Q10 – Can you build any combination (for example Chinese or Japanese into Spanish or Russian)? What are the challenges?

This is the greatest advantage of statistical systems. All you need is data; you do not need linguistic knowledge of how language A relates to language B. If you are building “rules” between Japanese or Chinese and any European language, you are facing a tough task: the more distant the languages, the harder transfer rules become to write. With a statistical system, the engine instead learns from parallel data how a word or series of words in one language corresponds to expressions in the other. SMT systems also work very well with similar or “related” languages, as little reordering is needed. When we are dealing with very remote languages, the peripheral processes of pre-processing and post-processing become very important, as does word reordering (that is, putting the words in the order the target language expects so that the sentence flows). How the Language Model is built is also important, but the key is really a good set of pre-processing and post-processing steps.
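To make the "all you need is data" point concrete, here is a minimal sketch of how word correspondences can be estimated from sentence pairs alone. It uses IBM Model 1 with expectation-maximization, a textbook alignment model chosen here purely for illustration; the answer above does not say which model any particular engine uses. The three-sentence Spanish-English corpus and the names train_model1 and best_translation are hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) from tokenized sentence pairs.

    Illustrative IBM Model 1 training loop (no NULL word, no smoothing);
    an assumption for this sketch, not a description of a production engine.
    """
    target_vocab = {t for _, tgt in pairs for t in tgt}
    # Start with no linguistic knowledge: every pairing is equally likely.
    prob = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        for src, tgt in pairs:
            for t in tgt:
                # Share one unit of credit for t among the source words
                # of the same sentence pair, in proportion to current beliefs.
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # Re-estimate the probabilities from the expected counts.
        prob = defaultdict(
            float, {(t, s): c / total[s] for (t, s), c in count.items()}
        )
    return prob


def best_translation(prob, source_word):
    """Return the target word with the highest learned probability."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (source tokens, target tokens).
corpus = [
    ("la casa".split(), "the house".split()),
    ("la flor".split(), "the flower".split()),
    ("una flor".split(), "a flower".split()),
]
model = train_model1(corpus)
for word in ("la", "casa", "flor", "una"):
    print(word, "->", best_translation(model, word))
```

Nothing in the sketch is specific to Spanish or English: the same loop runs unchanged on any tokenized language pair. For distant pairs, the pre-processing (for example, segmenting Chinese or Japanese text into tokens) and the reordering steps carry much more of the burden.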

The answer is thus yes: any language combination can be built, and much faster and more efficiently than with rule-based systems.