Though Google Translate is already one of the most powerful language translation tools, the company still sees room for major improvement, and it has been working towards a model that can translate text from one language to another automatically. Much like with every other product, Google has been integrating machine learning techniques into this system as well. And today, we can finally see them in action.
The Google Neural Machine Translation system, or GNMT, which utilizes state-of-the-art training techniques for improved translations, has today been introduced for one of the most difficult language pairs: Chinese to English. Let's take a brief look at these complex machine learning models and how they're making complex translations easier.
Though phrase-based machine translation algorithms used to be the core of the translation service, the service has since grown more sophisticated. For those unaware, phrase-based machine translation (PBMT) breaks an input sentence into its constituent words and phrases, which are translated independently. Each word or phrase of the input sentence is first translated on its own, and the pieces are then joined to form the final output sentence, as sketched below.
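To make the idea concrete, here is a minimal Python sketch of how a phrase-based system stitches a translation together. The tiny phrase table and the example sentence are invented purely for illustration; real systems learn far larger tables from parallel text.

```python
# Toy phrase-based translation: split the input into phrases, look each one up
# independently in a phrase table, and join the pieces into the output.
# The phrase table below is made up for illustration only.
PHRASE_TABLE = {
    "knowledge is": "le savoir est",
    "power": "le pouvoir",
}

def phrase_based_translate(sentence, phrase_table, max_phrase_len=3):
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Greedily match the longest known phrase starting at position i.
        for length in range(min(max_phrase_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + length])
            if phrase in phrase_table:
                output.append(phrase_table[phrase])
                i += length
                break
        else:
            # Unknown words are passed through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)

print(phrase_based_translate("Knowledge is power", PHRASE_TABLE))
# le savoir est le pouvoir
```

Notice that each phrase is translated in isolation, with no view of the rest of the sentence; that locality is exactly what the neural approach described next does away with.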
But the research team has now started working with Recurrent Neural Networks (RNNs) to directly learn the mapping (or connection) between an input sequence and an output sequence, where the input sentence is in one language and the output is in another. This brings us to the next and most important technique, the one being employed in today's release: Neural Machine Translation (NMT).
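As a rough illustration of what such a model looks like, here is a minimal encoder-decoder sketch in PyTorch. It is not GNMT's actual architecture (which stacks many more layers and adds attention); the vocabulary sizes, dimensions, and random token IDs below are arbitrary placeholders.

```python
import torch
import torch.nn as nn

# Minimal sequence-to-sequence sketch: an encoder RNN reads the source
# sentence into hidden states, and a decoder RNN generates the target
# sentence conditioned on the encoder's final state.
SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1000, 64, 128  # arbitrary toy sizes

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence; keep only the final hidden state.
        _, hidden = self.encoder(self.src_emb(src_ids))
        # Decode the target sentence, conditioned on that hidden state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), hidden)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, SRC_VOCAB, (1, 7))   # a 7-token "source" sentence
tgt = torch.randint(0, TGT_VOCAB, (1, 9))   # a 9-token "target" sentence
print(model(src, tgt).shape)                # torch.Size([1, 9, 1000])
```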
We can all agree that parsing more complex sentences requires more computational power, and even more so for a complex language like Chinese. This is where Google's new and upgraded Neural Machine Translation (NMT) system comes into play.
NMT, as stated in the blog post, considers the entire input sentence as a single unit for translation, which requires fewer engineering design choices compared to PBMT. The research team has scaled NMT up to work on very large data sets while still providing sufficiently fast and accurate translations to Google's users and services.
The technical research paper gives us insight into several advances that reduce the computational overhead of the system as a whole. GNMT improves the handling of rare (or uncommon) words, which previously tripped it up, by breaking them into smaller sub-word units and translating those pieces, as the sketch below illustrates.
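The paper refers to these sub-units as "wordpieces". The sketch below shows the general idea with a greedy longest-match segmentation over a made-up sub-word vocabulary; a real system learns its vocabulary from data rather than using a hand-written list like this one.

```python
# Toy sketch of breaking a rare word into smaller, known sub-units.
# The vocabulary below is invented purely for illustration.
SUBWORD_VOCAB = {"trans", "lat", "ion", "un", "break", "able", "a", "b", "l", "e"}

def segment(word, vocab):
    pieces, i = [], 0
    while i < len(word):
        # Greedily take the longest vocabulary entry matching at position i.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                pieces.append(piece)
                i = j
                break
        else:
            # Fall back to a single character if nothing matches.
            pieces.append(word[i])
            i += 1
    return pieces

print(segment("unbreakable", SUBWORD_VOCAB))  # ['un', 'break', 'able']
print(segment("translation", SUBWORD_VOCAB))  # ['trans', 'lat', 'ion']
```

Because even an unseen word can be rebuilt from pieces the model does know, the system no longer has to give up on vocabulary it never encountered during training.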
Computation time has been reduced by using custom hardware, Tensor Processing Units (TPUs), for the neural network workloads. The input-output flow of the system is much the same as before, but translation now happens as a single, unified end-to-end process. GNMT is Google's most advanced and so far most effective application of machine learning to translation.
The image below offers a peek at how a Chinese sentence is translated into English using GNMT.
As the sentence enters the neural network, the Chinese sentence is broken down and its words are encoded as a list of vectors, where each vector represents a single word. Once the complete sentence has been read, the decoder starts generating the English sentence one word at a time, producing the most suitable translation by paying close attention to a weighted distribution over the encoded Chinese vectors most relevant to that word.
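That attention step can be sketched in a few lines of numpy: the decoder scores each encoded Chinese word vector, normalises the scores into a weighted distribution, and mixes the source vectors accordingly. Every number and dimension below is arbitrary and purely illustrative.

```python
import numpy as np

# Schematic attention step: score each encoded source-word vector against the
# decoder's current state, turn the scores into a weighted distribution, and
# use the weighted mix of source vectors to help pick the next English word.
rng = np.random.default_rng(0)

encoder_vectors = rng.normal(size=(5, 8))   # 5 Chinese words, 8-dim vectors each
decoder_state = rng.normal(size=(8,))       # decoder state for the current step

scores = encoder_vectors @ decoder_state          # one relevance score per source word
weights = np.exp(scores) / np.exp(scores).sum()   # softmax: the weighted distribution
context = weights @ encoder_vectors               # attention-weighted mix of source vectors

print(np.round(weights, 3))   # how much each Chinese word contributes
print(context.shape)          # (8,) context vector used to predict the next word
```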
GNMT also reduces translation errors by 55 to 85 per cent on several major language pairs, moving a step closer to human-level accuracy. With today's release of the new translation model, more than 10,000 language pairs are currently supported by Google Translate, and the team is working to roll the model out to many more pairs over the coming months.