Google has unveiled Gemini Live, a cutting-edge voice interaction feature powered by its Gemini generative AI, as part of its ongoing efforts to enhance voice assistants. Announced at the Made by Google 2024 event, this feature aims to rival OpenAI’s Advanced Voice Mode for ChatGPT, which was launched in limited alpha earlier this year. Gemini Live offers users an immersive voice interaction experience, bringing advanced AI capabilities directly to their smartphones.
Gemini Live allows users to engage in “in-depth” voice conversations with Google’s AI-powered chatbot, Gemini, on their smartphones. Thanks to an upgraded speech engine, Gemini Live delivers more emotionally expressive and realistic multi-turn dialogues. Users can interrupt the chatbot while it’s speaking to ask follow-up questions, and the AI will adapt to their speech patterns in real time, creating a more natural conversational experience.
Google describes Gemini Live in a recent blog post, highlighting that users can choose from ten natural-sounding voices for the AI's responses, making the interaction feel more personalized. The feature is also hands-free: users can continue their conversation with Gemini even when their phone is locked or the app is running in the background. Conversations can be paused and resumed at any time, offering flexibility in how users interact with the AI.
One of the key advantages of Gemini Live over competitors like ChatGPT’s Advanced Voice Mode is its improved memory. The underlying generative AI models, Gemini 1.5 Pro and Gemini 1.5 Flash, feature a longer-than-average “context window,” enabling them to process and reason over a large amount of data in a single session. As a result, Gemini Live can handle extended conversations spanning hours while maintaining context and coherence throughout the interaction.
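Gemini Live itself has no public API, but the long-context behavior of the underlying models can be sketched with Google's publicly available `google-generativeai` Python SDK, which exposes Gemini 1.5 Flash and Pro. The API key, model choice, and prompts below are illustrative assumptions, not details from Google's announcement.

```python
# Illustrative sketch only: Gemini Live is a consumer feature with no public API.
# This uses Google's google-generativeai Python SDK to show how a long context
# window lets a model carry an extended multi-turn conversation. The API key,
# model choice, and prompts are assumptions made for this example.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Gemini 1.5 Flash / Pro accept very large contexts, so the full chat history
# can simply be kept and resent on every turn instead of being truncated.
model = genai.GenerativeModel("gemini-1.5-flash")
chat = model.start_chat(history=[])

for user_turn in [
    "Help me plan a two-week trip to Japan.",
    "Earlier you suggested Kyoto in week one; swap it to week two.",
    "Given everything we've discussed, summarize the final itinerary.",
]:
    response = chat.send_message(user_turn)
    print(f"User: {user_turn}\nGemini: {response.text[:80]}...\n")

# chat.history now holds every prior turn; with a context window measured in
# hundreds of thousands of tokens or more, hours of dialogue can stay in scope
# without earlier details being dropped.
```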
According to Google, Gemini Live’s architecture is specifically adapted for more conversational use, making it particularly effective in scenarios where lengthy, back-and-forth dialogue is necessary. However, the real-world performance of these features remains to be seen, as demos often differ from actual user experiences.
Despite its advanced capabilities, Gemini Live currently lacks one of the features that Google showcased at its I/O conference: multimodal input. Initially demonstrated with the ability to use the camera for contextual understanding — such as identifying parts of a broken bicycle or interpreting code on a computer screen — this feature has yet to be rolled out. Google has promised that multimodal input will be available later this year, along with support for additional languages and an iOS version of Gemini Live. For now, the feature is only available in English and on Android devices.
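Although Gemini Live does not yet accept camera input, the underlying Gemini models already handle multimodal prompts through the same public Python SDK. The sketch below shows what pairing an image with a question looks like; the file name and prompt are hypothetical, and this is not the mechanism Gemini Live itself will use.

```python
# Illustrative sketch only: the camera-based Gemini Live feature is not yet
# released, but the underlying Gemini models accept multimodal prompts via the
# public google-generativeai SDK. The image file and prompt are hypothetical.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

# e.g. a photo of a bicycle drivetrain, standing in for a live camera frame
frame = Image.open("bike_drivetrain.jpg")
response = model.generate_content(
    [frame, "Which component in this photo looks damaged, and how would I fix it?"]
)
print(response.text)
```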
Users will have to pay for Gemini Live, though: the company noted that it is exclusive to subscribers of Gemini Advanced, part of the Google One AI Premium Plan, which costs $20 per month. However, Google is offering Pixel 9 Pro owners free access to this plan for the first year, including all the benefits of Gemini Advanced and Gemini Live.
In addition to Gemini Live, Google is introducing other Gemini-enabled features. For example, Android users will soon be able to bring up Gemini’s overlay on top of any app to ask questions about what’s on the screen. This overlay can also generate images, which can be integrated into other apps like Gmail and Google Messages, although it currently lacks the ability to create images of people.