Back in May, OpenAI found itself in hot water after one of its ChatGPT voices, known as Sky, bore a striking resemblance to the voice of actress Scarlett Johansson, who voiced the AI assistant in the science fiction film “Her.” Now, a few months down the line, the AI firm has initiated the alpha rollout of its much-anticipated Advanced Voice Mode for ChatGPT, giving a select group of ChatGPT Plus users early access to the feature. According to reports, it will gradually become available to all ChatGPT Plus users by fall 2024.
Traditional text-based interactions with ChatGPT, while effective, lack the immediacy and nuance of spoken communication. The new voice mode lets users converse with ChatGPT as if they were speaking with another person, making the interaction more intuitive and accessible. OpenAI just has to take care to avoid deepfake controversies; its situation was not helped by the “Her” debacle back in May, which subsequently delayed the rollout of the Advanced Voice Mode.
When it was first unveiled, the Advanced Voice Mode drew widespread attention for its quick, realistic responses that closely mimicked human speech. Unlike the existing voice mode in ChatGPT, the Advanced Voice Mode uses the multimodal capabilities of GPT-4o, handling speech-to-text, text-to-speech, and text processing within a single model. This integration results in significantly lower latency and more natural, real-time conversations. Furthermore, GPT-4o can detect and respond to emotional intonations in users’ voices, such as excitement, sadness, or even singing.
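To make the architectural difference concrete, here is a minimal, purely illustrative Python sketch contrasting the older three-model pipeline with a single multimodal model. Every function below is a stub invented for this comparison; none of these names correspond to OpenAI’s actual API.

```python
# Illustrative sketch only: all functions here are invented stubs,
# not OpenAI's API. The point is the difference in structure.

def speech_to_text(audio: bytes) -> str:
    """Stub for a transcription model; emotional tone is lost at this step."""
    return "hello there"

def generate_reply(text: str) -> str:
    """Stub for a text-only language model."""
    return f"You said: {text}"

def text_to_speech(text: str) -> bytes:
    """Stub for a speech-synthesis model."""
    return text.encode()

def pipeline_voice_turn(audio: bytes) -> bytes:
    # Legacy design: three separate inference hops, each adding latency,
    # with intonation flattened away once audio becomes plain text.
    return text_to_speech(generate_reply(speech_to_text(audio)))

def multimodal_voice_turn(audio: bytes) -> bytes:
    # GPT-4o-style design: one model maps audio directly to audio,
    # so cues like excitement or sadness can survive end to end,
    # and there is only a single inference hop.
    return b"(expressive audio reply to) " + audio

if __name__ == "__main__":
    print(pipeline_voice_turn(b"hi"))
    print(multimodal_voice_turn(b"hi"))
```

Collapsing the pipeline into one model is what removes the per-hop latency and preserves the paralinguistic signal the article describes.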
For now, OpenAI is adopting a phased approach to the rollout of Advanced Voice Mode, allowing close monitoring and continuous improvement based on real-world feedback. Initially, a small group of ChatGPT Plus users will receive access, with notifications delivered via the ChatGPT app and instructions sent by email. “We’re starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions,” OpenAI revealed in a post on X.
In addition, OpenAI has introduced four new preset voices – Juniper, Breeze, Cove, and Ember – developed in collaboration with paid voice actors. Restricting the AI to these presets ensures it cannot impersonate specific individuals or public figures, preventing misuse. OpenAI spokesperson Lindsay McCallum revealed that ChatGPT’s voice mode is designed to block outputs that deviate from these preset voices. Furthermore, OpenAI has implemented new filters to prevent the generation of copyrighted audio, such as music, to avoid potential legal issues. This measure is particularly relevant given the recent copyright-related legal challenges faced by AI companies.
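As an illustration of how such an output filter might work in principle, the following Python sketch embeds a generated voice sample, compares it against reference embeddings for the four presets, and blocks anything that does not closely match one of them. The embedding vectors, threshold, and function names are all assumptions for the sake of the example; OpenAI has not published its actual mechanism.

```python
import math

# Hypothetical output filter: accept a reply only if its voice embedding
# closely matches one of the four preset voices. All values below are
# invented for illustration, not OpenAI's implementation.

PRESET_EMBEDDINGS = {
    "Juniper": [0.9, 0.1, 0.0],
    "Breeze":  [0.1, 0.9, 0.0],
    "Cove":    [0.0, 0.1, 0.9],
    "Ember":   [0.5, 0.5, 0.0],
}
MATCH_THRESHOLD = 0.95  # invented value

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def allow_output(voice_embedding) -> bool:
    """Return True only if the generated audio sounds like a preset voice."""
    return any(
        cosine(voice_embedding, ref) >= MATCH_THRESHOLD
        for ref in PRESET_EMBEDDINGS.values()
    )

# An embedding close to Juniper passes; an off-preset voice is blocked.
print(allow_output([0.88, 0.12, 0.01]))  # True
print(allow_output([0.30, 0.30, 0.80]))  # False
```

A check of this shape, run on the model’s output rather than its input, is one plausible way to enforce the “preset voices only” guarantee the spokesperson describes, since it rejects any generated audio that drifts toward an unauthorized voice.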
OpenAI has also tested GPT-4o’s voice capabilities with more than 100 external red teamers who collectively speak 45 different languages, with the goal of identifying and addressing potential weaknesses and ensuring the robustness of the model. OpenAI plans to release a detailed report on these safety efforts in early August. Beyond the current voice capabilities, it also plans to introduce additional features, such as video and screen sharing, at a later date.