Machine Translation Systems
If you translate over 100,000 words a year in a specific domain, we recommend that you consider adopting and growing your own machine translation system. You probably own enough data to create your own language model, and we can train a domain-specific engine for you. All you need is machine translation consultancy.
You may think this is an expensive development exercise compared to standard CAT or out-of-the-box rule-based systems. It needn’t be.
Consider the hidden costs and known inefficiencies of CAT solutions:
- Production will always be dependent on TM matches and similarities.
- No quick turn-around, not even for gisting.
- Human production limit.
- Built-in translator-dependent inefficiencies (typing, thinking, terminology research).
- License costs.
- 3rd party technological dependency.
Compare them with the benefits of a customized machine translation system:
- A machine translation system can provide results where a TM/CAT tool returns a 0% match (and you end up paying for a new translation).
- Higher “per capita” language output as a result of fast initial output and post-editing.
- No human time wasted typing translations.
- No human time wasted thinking up translations.
- Human time is spent improving translations instead.
- Much higher per capita production.
- One-time fee: you will own your customized system. Engine updates are simple and cost very little in terms of time and money. We can train your own personnel (at computer programmer level) to re-train the engine, or we can re-train it for you for a very small fee.
In our experience, old CAT tools have been converted into post-editing tools, leveraging Translation Memory results that are combined with machine translation input.
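To illustrate how a post-editing tool might combine the two sources, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular CAT product: the function names, the 0.75 fuzzy-match threshold and the `machine_translate` callable (a stand-in for whatever engine you use) are all hypothetical, and the similarity score comes from Python's standard `difflib` rather than from a real TM matching algorithm.

```python
from difflib import SequenceMatcher


def best_tm_match(source, tm):
    """Return (score, target) for the TM entry most similar to `source`.

    `tm` is a plain dict mapping stored source segments to their translations.
    The score is difflib's similarity ratio, between 0.0 and 1.0.
    """
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_score, best_target


def suggest(source, tm, machine_translate, threshold=0.75):
    """Offer the post-editor a draft: the TM match if it is close enough,
    otherwise the output of the machine translation engine.

    `machine_translate` is a hypothetical callable (source text -> draft).
    `threshold` is an assumed cut-off, not an industry standard.
    """
    score, target = best_tm_match(source, tm)
    if target is not None and score >= threshold:
        return {"origin": "TM", "score": round(score, 2), "draft": target}
    return {"origin": "MT", "score": round(score, 2), "draft": machine_translate(source)}


if __name__ == "__main__":
    tm = {"Press the start button.": "Pulse el botón de inicio."}
    fake_engine = lambda text: "[MT draft] " + text  # placeholder engine
    print(suggest("Press the start button.", tm, fake_engine))
    print(suggest("The warranty covers two years.", tm, fake_engine))
```

In this sketch, the second sentence has no usable TM match, so the translator receives a machine-translated draft to post-edit instead of starting from a blank segment.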
Rule-based systems may be suitable for applications with very specific and controlled language, in language combinations that are not too remote and where analogies can be drawn. Rule-based systems are most beneficial when you can establish A = B equivalents. They are not recommended in applications where a certain degree of flexibility is required or in mixed environments.
Please ask our consultancy team for details, the data required and a quote.
The saying goes “Garbage in, garbage out”. Noisy data is one of the biggest problems for all machine learning tasks, and machine translation is no different. When we talk about “dirty data” or “noise” in machine translation, we do not just mean bad alignments (sentences that do not mean exactly the same thing). We also mean poor translations, misspellings, sentences with missing items and other inconsistencies that affect parallel learning from the data set used to train the systems. Statistical machine translation (SMT) systems are good at memorizing and are less affected by a few bad alignments, as statistics take over and prioritize what has been provided as “good” several times. In this sense, they are more “bullet-proof” and robust, and can cope with up to 10% noise in the training data without significant impact on translation quality.
Therefore, in the case of statistical machine translation systems, it can be said that the more data, the better, even if it is a bit noisy. The same cannot be said of neural machine translation: according to a recent paper (Khayrallah and Koehn, 2018), neural systems are much more sensitive to “dirty segments” or noise, as they try to mimic human speech much more closely.
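As a rough illustration of what cleaning a parallel data set can look like before training, here is a minimal Python sketch. The three checks (a missing side, a target identical to the source, and a large length mismatch) and the 2.0 length-ratio limit are our own illustrative assumptions; they are not taken from the paper cited above, and a real cleaning pipeline would need checks tuned to the language pair and domain.

```python
def is_noisy(source, target, max_length_ratio=2.0):
    """Flag a segment pair as likely noise using three simple heuristics.

    These heuristics and the default ratio are illustrative assumptions.
    """
    src, tgt = source.strip(), target.strip()
    if not src or not tgt:
        return True  # one side is missing
    if src == tgt:
        return True  # target is a copy of the source, probably untranslated
    src_len, tgt_len = len(src.split()), len(tgt.split())
    # Very different word counts often point to a bad alignment.
    if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
        return True
    return False


def clean_corpus(pairs, max_length_ratio=2.0):
    """Return the pairs that pass the checks and the share that was removed."""
    kept = [(s, t) for s, t in pairs if not is_noisy(s, t, max_length_ratio)]
    noise_share = 1 - len(kept) / len(pairs) if pairs else 0.0
    return kept, noise_share


if __name__ == "__main__":
    corpus = [
        ("Press the start button.", "Pulse el botón de inicio."),
        ("Warranty", ""),
        ("Model X200", "Model X200"),
        ("Yes.", "Sí, por supuesto, en cuanto reciba la confirmación."),
    ]
    kept, noise_share = clean_corpus(corpus)
    print(f"Kept {len(kept)} of {len(corpus)} pairs; removed {noise_share:.0%}")
```

Heuristics like these only catch the most obvious noise. Identical source and target can be legitimate (product names, for example), so flagged pairs are best reviewed rather than discarded blindly.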