Machine Translation Systems
Our origins as a machine translation company date back to the second release of the statistical Moses translator in 2009. Pangeanic was the first company in the world to implement Moses commercially, as reported by the EU FP7 Research Programme Euromatrixplus. Our system was designed from the very beginning to serve open-standards workflows (inputs and outputs in TMX and XLIFF). Nowadays, all our engines are built with neural networks and offer near-human machine translation quality.
If you translate over 100,000 words a year in a specific domain, we recommend that you consider adopting and growing your own machine translation system. You probably own enough data to create your own language model, and we can train a specific engine for you. All you need is machine translation consultancy.
You may think this is an expensive development exercise compared to standard CAT or out-of-the-box rule-based systems. It needn’t be.
Consider the hidden costs and known inefficiencies of CAT solutions:
- Production will always be dependent on TM matches and similarities.
- No quick turn-around, not even for gisting.
- Human production limit.
- Built-in translator-dependent inefficiencies (typing, thinking, terminology researching).
- License costs.
- 3rd party technological dependency.
Compare these with what a customized machine translation system offers:
- A machine translation system can provide results where a TM/CAT tool returns a 0% match (and you end up paying for a new translation).
- Higher “per capita” language output as a result of fast initial output and post-editing.
- No human time wasted typing translations.
- No human time wasted thinking up translations from scratch.
- Human time is spent improving translations instead.
- Much higher per capita production.
- One-time fee: you will own your customized system. Engine updates are simple and cost very little in terms of time and money. We can train your own personnel (at computer programmer level) to re-train the engine, or we can re-train it for you for a very small fee.
In our experience, old CAT tools have been converted into post-editing tools, leveraging Translation Memory results that are combined with machine translation input.
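The combination of Translation Memory results and machine translation input can be pictured with a minimal sketch. It is an illustration only, not the way any particular CAT tool or PangeaMT works: the function names, the 0.75 threshold and the `machine_translate` callable are assumptions introduced here, and similarity is approximated with the `difflib` module from the Python standard library.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple


def best_tm_match(segment: str, tm: Dict[str, str]) -> Tuple[float, str]:
    """Return (similarity, stored translation) for the closest TM source segment."""
    best_score, best_target = 0.0, ""
    for source, target in tm.items():
        # ratio() is a character-level similarity between 0.0 and 1.0
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def pretranslate(
    segment: str,
    tm: Dict[str, str],
    machine_translate: Callable[[str], str],  # hypothetical stand-in for an MT engine
    threshold: float = 0.75,  # assumed cut-off, tune it for your own content
) -> Tuple[str, str]:
    """Use the TM when the match is good enough, otherwise fall back to MT."""
    score, target = best_tm_match(segment, tm)
    if score >= threshold:
        return target, "TM"
    return machine_translate(segment), "MT"
```

In either case the segment reaches the post-editor with a proposal attached, so a low or 0% TM match no longer means translating from scratch.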
Rule-based systems may be suitable for applications with very specific and controlled language, in language combinations that are not too remote and where analogies can be drawn. Rule-based systems are most beneficial when you can establish A = B equivalents. They are not recommended in applications where a certain degree of flexibility is required or in mixed environments.
Please ask our consultancy team for details, the data required and a quote.
The saying goes “Garbage in, Garbage out”. Noisy data is one of the biggest problems for all machine learning tasks, and machine translation is no different. When we talk about “dirty data” or “noise” in machine translation, we do not just mean bad alignments (sentences that do not mean exactly the same), but also poor translations, misspellings, sentences with missing items and other inconsistencies that affect parallel learning from the data set used to train the systems. Statistical machine translation (SMT) systems are good at memorizing and are less affected by a couple of bad alignments, as statistics take over and prioritize what is provided as “good” several times. In this sense, they are more “bullet-proof” and robust, and can cope with up to 10% noise in the training data without significant impact on translation quality.
Therefore, in the case of statistical machine translation systems, it can be said that the more data, the better, even if it is a bit noisy. However, according to a recent paper (Khayrallah and Koehn, 2018), the same cannot be said of neural machine translation, which is much more sensitive to “dirty segments” or noise as it tries to mimic human speech much more closely.
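Some of the noise types listed above can be removed with simple rules before training. The sketch below is a minimal example under stated assumptions, not Pangeanic's cleaning pipeline: the function name and the `max_length_ratio` and `max_tokens` defaults are illustrative choices, and it only catches empty segments, untranslated copies, overlong segments, suspicious length ratios and exact duplicates. Poor translations and subtle misalignments need more than rules like these.

```python
from typing import Iterable, List, Tuple


def clean_parallel_corpus(
    pairs: Iterable[Tuple[str, str]],
    max_length_ratio: float = 3.0,  # assumed threshold, not a published value
    max_tokens: int = 200,          # assumed threshold, not a published value
) -> List[Tuple[str, str]]:
    """Drop obviously noisy (source, target) pairs from a parallel corpus."""
    seen = set()
    kept = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # one side is missing
        if source == target:
            continue  # likely untranslated copy (may also drop names or numbers)
        src_len, tgt_len = len(source.split()), len(target.split())
        if src_len > max_tokens or tgt_len > max_tokens:
            continue  # overlong segments are often alignment errors
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue  # very different lengths suggest a bad alignment
        if (source, target) in seen:
            continue  # exact duplicate
        seen.add((source, target))
        kept.append((source, target))
    return kept
```

The thresholds depend on the language pair: languages with very different word counts per sentence need a looser length ratio.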
Industries that apply our Machine Translation Technologies
We work for the following industries:
- Electronics / Computer Hardware and Peripherals
- Computer Software
- Legal-Professional services
- Life Science / Medicine
PangeaMT’s Machine Translation System and Formats are modular. It is a widely implemented, successful machine translation system serving millions of words a day in completely private environments, processing documents into more than 300 language combinations, via API and our document translation app PangeaBox.
With PangeaMT, you can create and add new translation engines with your own material in the language pairs you have purchased, and also create language domains of your choice for your own engine. PangeaMT is constantly experimenting and adding more domain-specific engines. Thanks to its powerful API, translation is served in milliseconds to the environment of your choice.
One of the main benefits of PangeaMT is that your engines are held in a totally private environment: in your server. If you are looking for cloud solutions, do not hesitate to send an email to our sales team.
Neural machine translation is not perfect but, when well customized and used, it can produce amazing results.
Machine Translation Formats
The output from PangeaMT is often praised for its efficient handling of tags, unlike other plain-text solutions. This was already reported in 2010 and 2011 at Localization World and TAUS by several clients, such as Sybase and Sony Europe. Currently, PangeaMT is fully compatible with older versions of Trados and with current versions of SDL Studio, MemoQ and MemSource, as well as most formats supported by the Okapi Framework (XLIFF, InDesign IDML, FrameMaker MIF, HTML, all LibreOffice formats, etc.).
When you use our translation API, text can be extracted from your website, sent directly to one of our engines, returned to your CMS for publication and forwarded to a human post-editor. Contact PangeaMT to find out more about how we can set up a plugin and connector to your CMS.
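To see why tag handling matters, consider one common generic approach: replace inline tags with placeholders before translation and put them back afterwards. This is a simplified sketch and not a description of how PangeaMT handles tags internally. The function names, the `__TAGn__` placeholder format and the `translate` callable are assumptions made for the example, and a real engine has to be set up so that it leaves such placeholders untouched.

```python
import re
from typing import Callable, Dict, Tuple

# Matches simple inline tags such as <b>, </b> or <ph id="1"/>
TAG_PATTERN = re.compile(r"<[^<>]+>")


def protect_tags(segment: str) -> Tuple[str, Dict[str, str]]:
    """Swap each inline tag for a numbered placeholder and remember the mapping."""
    tags: Dict[str, str] = {}

    def _swap(match: "re.Match[str]") -> str:
        placeholder = f"__TAG{len(tags)}__"
        tags[placeholder] = match.group(0)
        return placeholder

    return TAG_PATTERN.sub(_swap, segment), tags


def restore_tags(text: str, tags: Dict[str, str]) -> str:
    """Put the original tags back in place of their placeholders."""
    for placeholder, tag in tags.items():
        text = text.replace(placeholder, tag)
    return text


def translate_with_tags(segment: str, translate: Callable[[str], str]) -> str:
    """Translate a tagged segment; `translate` is a hypothetical engine call."""
    protected, tags = protect_tags(segment)
    return restore_tags(translate(protected), tags)
```

A plain-text solution that strips the tags instead leaves the post-editor to re-insert every one of them by hand.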
Machine Translation Engine Training
We recommend that you prepare a terminology interface or consistency-checking application, together with terminology lists, for the first training. We can also mine data from our more than 1.2 billion aligned sentences. If post-editing takes place at Pangeanic, we will require an approved terminology list to ensure terminology consistency. If you translate over 100,000 words a year in a specific domain, we recommend that you consider developing your own neural machine translation engine, which can be run on any CPU server. Please ask our consultancy team for details, the data required and a quote.