If you have been following the voice assistants offered by Apple (Siri), Microsoft (Cortana) and Google (Google Now) lately (Facebook is also coming up with something soon), you probably have the impression that Google Now has fallen behind the others.
However, to shore up its lagging voice assistant and give users more accurate results, Google has announced an update, stating that its voice search is now more powerful, faster and more accurate.
Back in 2012, Google announced that its voice search service had adopted Deep Neural Networks (DNNs) as the core technology used to model the sounds of a language. These replaced the 30-year-old industry standard, the Gaussian Mixture Model (GMM). In today's update, Google says it has taken the technology behind those acoustic models a step further.
Today, we’re happy to announce we built even better neural network acoustic models using Connectionist Temporal Classification (CTC) and sequence discriminative training techniques. These models are a special extension of recurrent neural networks (RNNs) that are more accurate, especially in noisy environments, and they are blazingly fast!
According to Google’s detailed research post, Google Now’s improved acoustic models rely on Recurrent Neural Networks (RNNs). RNNs have feedback loops in their topology, allowing them to model temporal dependencies: when the user speaks the /u/ in “museum”, their articulatory apparatus is coming from a /j/ sound and, before that, from an /m/ sound.
Try saying it out loud – “museum” – it flows very naturally in one breath, and RNNs can capture that.
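To make the idea of a feedback loop concrete, here is a minimal sketch of a single-unit recurrence in plain Python. It is an illustration only, not Google's model: the function names, the weights (0.5 and 0.8) and the input values are all assumptions chosen for the example. The point is that the state at each step depends on the previous state, so the same final input produces a different result depending on what came before it.

```python
import math

def rnn_step(x, h_prev, w_in=0.5, w_rec=0.8, bias=0.0):
    """One step of a scalar recurrence: h_t = tanh(w_in*x_t + w_rec*h_{t-1} + bias).

    The weights are arbitrary illustrative values, not trained parameters.
    """
    return math.tanh(w_in * x + w_rec * h_prev + bias)

def run_rnn(inputs, h0=0.0):
    """Feed a sequence through the recurrence and return every hidden state."""
    states = []
    h = h0
    for x in inputs:
        h = rnn_step(x, h)  # the previous state feeds back into the next step
        states.append(h)
    return states

# Two made-up input sequences that end with the same value (0.5)
# but have different histories.
with_history = run_rnn([1.0, -1.0, 0.5])
without_history = run_rnn([0.0, 0.0, 0.5])

# The final states differ even though the final input is identical.
print(with_history[-1], without_history[-1])
```

A feed-forward network given only the last input would produce the same output in both cases; the recurrent term is what lets the earlier sounds influence the current one.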
With Connectionist Temporal Classification, the models are trained to output a sequence of “spikes” that reveals the sequence of sounds in the waveform. The exact timing of each spike is left to the model, as long as the order of the sounds is correct.
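The sketch below illustrates why the timing of the spikes does not matter. It applies the standard CTC collapsing rule (merge consecutive repeats, then drop the blank symbol) to per-frame labels. This is a simplified illustration, not Google's implementation: the `_` blank symbol, the function name and the two hand-written frame sequences are assumptions made up for the example.

```python
BLANK = "_"  # assumed symbol for "no sound emitted at this frame"

def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels into a sound sequence using the CTC rule:
    merge consecutive repeats, then remove blanks."""
    collapsed = []
    prev = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != prev and label != blank:
            collapsed.append(label)
        prev = label
    return collapsed

# Two hypothetical per-frame outputs for the start of "museum".
# The spikes fall on different frames, but the order of sounds is the same.
late_spikes = ["_", "m", "_", "_", "j", "j", "_", "u"]
early_spikes = ["m", "m", "_", "j", "_", "_", "u", "u"]

print(ctc_collapse(late_spikes))   # ['m', 'j', 'u']
print(ctc_collapse(early_spikes))  # ['m', 'j', 'u']
```

Because both alignments collapse to the same sequence, training does not need to specify on which frame each sound occurs. A blank between two identical labels keeps them separate, so a genuinely repeated sound is still recoverable.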
In addition to requiring far fewer computational resources, the new models are more accurate, more robust to noise, and faster to respond to voice search queries. Google says that its new acoustic models are now used for voice searches and commands in the Google app (on Android and iOS), and for dictation on Android devices.
For a more detailed explanation, see Google’s Research Blog post.