In an age of unparalleled digital transformation, Machine Translation is fast becoming the go-to choice for many companies and brands across the globe. As consumer demands grow, businesses seek to rival the competition with localised content and an optimal online presence. New computational advancements in Machine Learning promise improvements in quality and a boost in the productivity of post-editors, a key development in the shift towards automation taking place within the translation industry today. But what exactly do we mean when we refer to Machine Translation, and how has continued investment in Artificial Intelligence technologies shaped the landscape for translation service providers?

Put simply, Machine Translation (MT) is translation carried out by a computer: a process in which text or speech is rendered automatically from one language into another. An MT engine draws on samples from a bilingual data set and uses them to produce a translation. Early examples of Machine Translation date back to the 1950s, when a public demonstration, commonly known as the Georgetown experiment, exhibited a basic Russian-to-English machine translation system, perhaps one of the first non-numerical applications for computers. Today’s MT engines have far surpassed the rudimentary attempts of the Georgetown experiment. Continued investment and research in translation software have shown that, contrary to the belief of the project’s researchers, Machine Translation would not in fact be solved within three to five years. Instead, Machine Translation has expanded into a complex and continuously evolving sub-field of computational linguistics.

#sarcasm

Translation is never a mere word-for-word substitution. Every language is built upon its own set of rules and structures, each enriched with its own syntax, idioms and meaning. In effect, the intricacy of language is precisely what makes translating so difficult. An effective translation must carry the meaning of a text over from the source language to the target language. Take sarcasm, for instance: a bitter and cutting form of expression that, in the right context, can be considered quite humorous (or insulting). Many automated translation systems are baffled by sarcasm, precisely because a sarcastic turn of phrase says the opposite of what is actually meant. Translated literally, it results in an awkward, out-of-place sentence.
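To make the point about word-for-word substitution concrete, here is a minimal Python sketch. The three-word English-to-Spanish glossary and the function name are hypothetical, chosen only for illustration; no real MT engine works from a lookup table this small.

```python
# Hypothetical toy English-to-Spanish glossary, for illustration only.
GLOSSARY = {"the": "el", "red": "rojo", "car": "coche"}


def word_for_word(sentence):
    """Translate each word in isolation, keeping the source word order."""
    words = sentence.lower().split()
    # Words missing from the glossary are passed through unchanged.
    return " ".join(GLOSSARY.get(word, word) for word in words)


print(word_for_word("The red car"))  # prints "el rojo coche"
```

Every word is translated correctly, yet the result is not natural Spanish, which places the adjective after the noun ("el coche rojo"). Word order is only the simplest of the structural differences a real system has to handle.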

Researchers at the Technion-Israel Institute of Technology Faculty of Industrial Engineering and Management have developed a system for interpreting sarcastic statements on Twitter, called SIGN (Sarcasm Sentimental Interpretation GeNerator). SIGN was fed with a dataset of 3000 Tweets marked with the content tag #sarcasm, along with a parallel corpus of non-sarcastic versions of them, written by human comedy writers and literature paraphrasers. SIGN capitalises on sentiment words to detect sarcasm, e.g. in “I love Mondays, #sarcasm”, “love” is the indicator of sarcasm. It also borrows algorithms and evaluation measures from Machine Translation, ultimately producing interpretations that retain the intended meaning of the original text, e.g. “how I hate Mondays”. Although SIGN is the first system able to interpret sarcasm in written text, there is still scope for development if our intention is not only to detect sarcasm, but also to translate it with MT-based systems.
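The idea of leaning on sentiment words can be illustrated with a deliberately simple Python sketch. This is not SIGN's actual algorithm, which borrows from Machine Translation and is trained on the parallel corpus described above. The small lexicon of opposites and the function name are assumptions made up for this example.

```python
import re

# Hypothetical toy lexicon mapping sentiment words to opposite-polarity words.
# An illustrative assumption, not the lexicon or method used by SIGN.
OPPOSITES = {
    "love": "hate",
    "great": "terrible",
    "best": "worst",
    "enjoy": "dislike",
}

SARCASM_TAG = "#sarcasm"


def interpret(tweet):
    """Return a literal reading of a #sarcasm tweet by flipping sentiment words."""
    if SARCASM_TAG not in tweet.lower():
        return tweet  # untagged tweets are left as they are

    # Drop the hashtag, then tidy whitespace and any trailing comma.
    text = re.sub(re.escape(SARCASM_TAG), "", tweet, flags=re.IGNORECASE)
    text = " ".join(text.split()).rstrip(" ,")

    def flip(match):
        word = match.group(0)
        # Replace known sentiment words; keep every other word unchanged.
        return OPPOSITES.get(word.lower(), word)

    return re.sub(r"[A-Za-z']+", flip, text)


print(interpret("I love Mondays, #sarcasm"))  # prints "I hate Mondays"
```

A lookup like this breaks down as soon as the sarcasm does not hinge on a single sentiment word, which is one reason a learned, translation-style approach is attractive.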

Where we’re going we don’t need roads

Although still imperfect, Machine Translation is fast evolving into a powerful tool. Recent technological developments in Machine Translation can be attributed to a spike in demand, with the Machine Translation market estimated to reach close to a billion dollars by 2020. One explanation for the growth in demand lies in the drive to reduce customer information gaps, along with customers’ expectations of quality translations in their own language. A survey performed by Common Sense Advisory, a market research company, found that out of 3000 online customers across the globe, 75% said they preferred to buy products in their native language. 60% of these claimed they rarely or never bought from English-only websites. Another reason for the growth of the Machine Translation market is the lower price of MT relative to conventional human translation. As companies attempt to edge out the competition with more localised content, it is no surprise that translation buyers are shifting from human translation to cost-effective automated translation.

However, automated translation systems do not signify the end of human translation. MT systems tend not to offer consistent accuracy and are often unable to resolve ambiguity, meaning that a highly reliable translation output is likely to require the assistance of a human post-editor. That said, constant advancements in AI, data analysis and computational capacity are paving the way for new technologies and developments in MT. Neural Machine Translation (NMT), a deep-learning system with enhanced statistical analysis of text, reportedly reduces translation errors by an average of 60%. An NMT system processes information in a way that is more like our own biological nervous systems. The NMT paradigm uses an Artificial Neural Network that “learns” relationships between two language systems. Artificial neural networks more generally can also recognise patterns, such as handwriting, and be configured to classify data, shapes and images.

Race to the top

As neural technology tends to perform better than standard MT systems, its more efficient and idiomatic output means that post-editors benefit from a boost in productivity. PangeaMT, Pangeanic’s own Machine Translation and Artificial Intelligence technology division, creates hybrid Neural Machine Translation engines that offer cognitive companies, institutions and professional translators the ability to build their own Neural Machine Translation solution. Neural technology allows MT to be flexible and to adapt according to the context. As NMT becomes more widely used, industries across the globe will reap the benefits. Translation companies and providers of Neural Machine Translation technology, such as PangeaMT, are racing to develop the most advanced NMT systems, bridge language gaps and surpass power players like Google.

Overall, Machine Translation as we know it has undoubtedly come a long way from the Georgetown machine with its six grammatical rules and 250 words. Things are changing fast. As advancements within the field take off and Artificial Intelligence broadens what is possible, better training of NMT systems could eventually lead to an AI-powered translation system that delivers reliable, high-quality translations. Much research and many technological breakthroughs are still to come, but it is certainly an exciting time for the MT industry.

About Carolina Herranz-Carr

Carolina Herranz-Carr is a Data Analyst at Pangeanic and a member of the NEC TM Data consortium, a pan-European data sharing awareness program funded by the European Commission. Carolina is natively bilingual and has a keen interest in language, politics and business. Her previous experience includes participating in three international European Youth Parliament sessions and project managing 20,000-capacity events.