Polly is Amazon’s latest cloud service on AWS. It converts text into lifelike speech that developers can then build into their own tools and applications. The engine was announced at Amazon’s AWS re:Invent event, which recently took place in Las Vegas.
Like Rekognition, Polly is not your average, run-of-the-mill AI: its capabilities go well beyond what you would normally expect from a text-to-speech engine.
Polly was designed to address many of the more challenging aspects of speech generation. For example, consider the difference in pronunciation of the word “live” in the phrases “I live in Seattle” and “Live from New York.” Polly knows that this pair of homographs is spelled the same but pronounced quite differently. Or consider the abbreviation “St.” Depending on the language and the context, it could mean (and should be pronounced as) either “street” or “saint.” Again, Polly knows what to do here.
Polly can also take care of complications like units, fractions, abbreviations, currencies, dates, and times, any of which can leave an ordinary speech engine stumped. The tool currently supports 47 male and female voices across 24 languages, and Amazon plans to add more languages and voices soon.
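To give a feel for how a developer might hand text like the “live” examples above to Polly, here is a minimal sketch in Python. It assumes the AWS SDK for Python (boto3) is installed and that AWS credentials and a region are already configured. The helper names build_request and synthesize_to_file are hypothetical, and the voice "Joanna" and the MP3 output format are illustrative choices, not something this article specifies.

```python
def build_request(text, voice_id="Joanna", output_format="mp3"):
    """Build the keyword arguments for Polly's SynthesizeSpeech call.

    The default voice and format are assumptions made for illustration.
    """
    return {
        "Text": text,
        "TextType": "text",  # plain text input rather than SSML markup
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }


def synthesize_to_file(text, path, **options):
    """Send text to Polly and write the returned audio to a local file."""
    import boto3  # imported lazily so build_request works without the SDK

    client = boto3.client("polly")
    response = client.synthesize_speech(**build_request(text, **options))
    # The audio comes back as a streaming body under the "AudioStream" key.
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


if __name__ == "__main__":
    # Requires valid AWS credentials; each call produces one audio file.
    synthesize_to_file("I live in Seattle.", "live_verb.mp3")
    synthesize_to_file("Live from New York.", "live_adjective.mp3")
```

Playing the two resulting files side by side is a quick way to hear whether the homograph is handled as described.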
The engine also encrypts all data at rest and transfers the audio across SSL connections to keep it secure. Meanwhile, submitted text is disassociated from the submitter, stored in encrypted form for up to 6 months, and used to maintain and improve Polly.
Amazon has actually been pretty generous with the pricing scheme for Polly. Everyone gets 5 million characters per month at no charge, so you can convert up to 5 million characters of text into speech for free every month. After that, it’s $0.000004 per character, which adds up to only a couple of dollars for a normal-sized novel.
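The arithmetic behind those figures can be sketched in a few lines of Python. The free allowance and per-character rate are the ones quoted above. The 600,000-character novel is an assumed length chosen for illustration, and the function names are hypothetical.

```python
from decimal import Decimal

FREE_CHARS_PER_MONTH = 5_000_000        # free allowance quoted above
PRICE_PER_CHAR = Decimal("0.000004")    # dollars per character after that


def paid_cost(chars):
    """Cost in dollars of converting `chars` characters at the paid rate."""
    return PRICE_PER_CHAR * chars


def monthly_cost(total_chars):
    """Cost in dollars for one month, after subtracting the free allowance."""
    billable = max(0, total_chars - FREE_CHARS_PER_MONTH)
    return paid_cost(billable)


if __name__ == "__main__":
    novel_chars = 600_000  # assumed length of a "normal-sized" novel
    print("Novel at the paid rate:", paid_cost(novel_chars))
    print("Novel within the free allowance:", monthly_cost(novel_chars))
    print("Ten million characters in a month:", monthly_cost(10_000_000))
```

Under that assumed length, a novel costs $2.40 at the paid rate, and nothing at all if it fits within the month's free 5 million characters.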