Speech recognition is huge. The technology has been growing at an explosive pace in recent years, and tech majors like Microsoft, Apple, Amazon and IBM are all hard at work developing and integrating voice technology into their systems. In a major breakthrough in the field, Microsoft has overtaken IBM Watson, achieving the lowest speech recognition error rate on record.
The results were obtained in a recent benchmark evaluation against the industry-standard Switchboard speech recognition task. Microsoft’s speech recognition system completed the test with a word error rate (WER) of just 6.3 percent.
WER, in case you are unfamiliar with it, is a commonly used measure of the performance of speech recognition and machine translation systems: the lower the figure, the fewer words the system gets wrong. Until now, IBM Watson held the lowest WER at 6.9 percent, although IBM claimed a 6.6 percent error rate at a recent Interspeech conference. Microsoft’s new system is better still.
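For readers who want to see how the metric works, here is a minimal sketch of a WER calculation in Python. It counts the fewest word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, then divides by the number of words in the reference. The function name `word_error_rate` and the simple whitespace tokenization are assumptions made for illustration; this is not the scoring tool used in the Switchboard evaluation, and benchmark scoring typically normalizes the text in ways this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch: words are split on whitespace, with no text
    normalization (case, punctuation), which is an assumption here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because inserted words count as errors too, WER can exceed 100 percent on very poor transcripts, which is why it is reported as an error rate and not simply as the inverse of accuracy.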
According to a research paper published by Microsoft scientists on the topic:
"Our best single system achieves an error rate of 6.9% on the NIST 2000 Switchboard set. We believe this is the best performance reported to date for a recognition system not based on system combination. An ensemble of acoustic models advances the state of the art to 6.3% on the Switchboard test data."
That is a huge improvement when you consider that only twenty years ago, the best published research system had a WER of more than 43 percent. The last few years in particular have seen a surge of interest in the technology, with everyone competing to develop the most accurate recognition systems in the world. The research done in the process contributes to the creation of new products and the improvement of those already on the market.
Who knows, Microsoft may well use the enhanced technology to further improve its own AI-powered personal assistant, Cortana.
However, these systems still have a long way to go before they can actually replace text as the primary mode of communication between humans and machines.
According to a report from Kleiner Perkins analyst Mary Meeker, speech recognition needs to reach roughly 99 percent accuracy before we can hope to deploy it as the primary means of human-AI communication. So even the best of our systems, at a 6.3 percent WER, still have roughly five percentage points to go. However, voice tech is growing rapidly, and the intense interest shown in the technology by some of the largest corporations in the world means that the day when voice finally replaces text could well come sooner than we expect.
Meanwhile, we watch as voice continues to slowly increase in both reach and usefulness. With Apple, Microsoft and Google just a few of those on the bandwagon, and with Amazon flooding the market with its voice-controlled Echo, Dot and Tap devices, voice is increasingly asserting itself as the next mode of human-machine interaction.