At the ongoing NEXT cloud conference in San Francisco, Google has made some pretty interesting announcements. Two of the most eye-catching are a new machine learning platform and full access to its speech recognition API.

For details on the former, see Mir's detailed article here. For more on the latter, read on.

Until now, developers have relied heavily on the likes of Nuance to add speech recognition to their apps. While these services have fared reasonably well, it was only a matter of time before Google opened up its own speech recognition API, which is technically stronger, affordably priced and backed by Google's massive data sets.

At NEXT today, Google confirmed that it is opening the API to developers. The Google Cloud Speech API, as it is called, has launched as a limited preview. It lets developers convert audio to text by applying Google's neural network models through an easy-to-use API.

Another, more significant edge that Google's Cloud Speech API holds over the current market leaders is that it recognizes over 80 languages and variants, helping apps serve a global user base. More importantly, the API will help regional developers, especially here in India, build apps that recognize the many languages and dialects their users actually speak.

Developers can transcribe what users dictate into an application's microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. At launch, the API recognizes audio uploaded in the request; upcoming releases will integrate with audio stored on Google Cloud Storage.
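
To make the request/response cycle a little more concrete, here is a minimal Python sketch of a synchronous recognition call with the audio uploaded inline. It assumes the field names of the current v1 REST API (the `speech:recognize` endpoint, `config.encoding`, `sampleRateHertz`, `languageCode` and base64 `audio.content`), which may differ from the limited-preview version announced here. The helper names `build_recognize_request` and `best_transcripts` are hypothetical, and authentication and the HTTP call itself are left out.

```python
import base64

# Assumption: endpoint of the current v1 REST API; the 2016 limited
# preview used an earlier beta version whose paths and fields may differ.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US",
                            sample_rate_hz=16000, encoding="LINEAR16"):
    """Build the JSON body for a synchronous recognize call (hypothetical helper).

    The raw audio is embedded as base64 text, i.e. the "audio uploaded
    in the request" mode rather than a Cloud Storage reference.
    """
    return {
        "config": {
            "encoding": encoding,            # e.g. 16-bit linear PCM
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,   # e.g. "hi-IN" for Hindi (India)
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def best_transcripts(response):
    """Return the top-ranked transcript of each result in a recognize response."""
    return [
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if result.get("alternatives")  # skip results with no hypotheses
    ]
```

To send the request, you would serialize the dict with `json.dumps` and POST it to the endpoint with an authenticated HTTP client. Targeting another supported language is then a matter of changing `languageCode`, which is what makes the multi-language support attractive for regional apps.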

The API also gives developers Google's real-time speech-to-text engine. The Speech API can stream text results, returning partial recognition results as they become available, so recognized text appears while the user is still speaking. Alternatively, it can return recognized text from audio stored in a file.
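
The difference between partial and final results is easiest to see in how a client might render live captions. The sketch below assumes responses shaped like the JSON form of the current v1 streaming response (`results`, `alternatives`, `transcript`, `isFinal`); the real streaming interface runs over gRPC and delivers these as protocol buffer messages, and `live_captions` is a hypothetical helper. Following a common client pattern, it reads only the first result of each response.

```python
def live_captions(responses):
    """Yield the caption text to display after each streaming response.

    Assumes each response is a dict mirroring the v1 streaming JSON
    shape. Interim (non-final) text is shown provisionally and replaced
    by the next update; final text is committed and never revised.
    """
    committed = []  # finalized utterances, in order
    for response in responses:
        results = response.get("results", [])
        if not results or not results[0].get("alternatives"):
            continue  # nothing to display for this update
        result = results[0]
        transcript = result["alternatives"][0]["transcript"].strip()
        if result.get("isFinal", False):
            committed.append(transcript)
            yield " ".join(committed)
        else:
            # Show committed text plus the current partial hypothesis.
            yield " ".join(committed + [transcript])
```

The design choice here is to treat interim text as disposable: each partial hypothesis overwrites the previous one, and only results flagged as final are appended permanently, which is what lets text appear "while speaking" without accumulating stale guesses.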

Behind all of this sit Google's machine learning engines. Developers can apply Google's most advanced deep learning neural network algorithms to their users' audio, and Google claims unparalleled accuracy for the results. According to Google, the API's accuracy also improves over time as new terms are introduced and usage grows.

The announcement, though, isn't entirely surprising. The search giant has been hinting for quite some time that it intends to make speech recognition available to developers in some form. For example, the company recently rolled out the ability to type and edit documents by voice in Google Docs, and at Google I/O 2015 it announced a Voice Interaction API that lets Android developers add voice interactions to their apps. Opening up unrestricted access to the full API was the last step, and it has now been taken.

As for pricing, Google is offering the API free of charge for now, though we expect eventual pricing to be highly competitive and substantially lower than that of current third-party services. Developers can sign up for the preview here.
