Just hours ahead of Google’s annual I/O event, OpenAI has lifted the veil on its newest AI model – GPT-4o. OpenAI’s Chief Technology Officer, Mira Murati, announced the launch of GPT-4o during a livestream event, revealing that it brings free access to powerful AI capabilities to a wider audience.

“In line with our mission, we are focused on advancing AI technology and ensuring it is accessible and beneficial to everyone. Today we are introducing our newest model, GPT-4o, and will be rolling out more intelligence and advanced tools to ChatGPT for free,” OpenAI announced in a blog post. “GPT-4o is our newest flagship model that provides GPT-4-level intelligence but is much faster and improves on its capabilities across text, voice, and vision. Today, GPT-4o is much better than any existing model at understanding and discussing the images you share.” The “o” in GPT-4o stands for “omni,” according to OpenAI, and the model can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds.

Unlike previous models focused on a single modality such as text, GPT-4o is a multimodal LLM. This means it can understand and respond to user input through various channels, including text prompts, voice commands, and even real-time video captured through smartphones. This capability allows for a more natural and intuitive user experience. Users can interact with GPT-4o by asking questions through voice commands while providing relevant visual information, such as an image captured on their phone, and GPT-4o can process all these inputs concurrently. Furthermore, the new model incorporates advanced object and image recognition capabilities, allowing for real-time analysis of visual information. This flexibility mimics natural human conversation, and GPT-4o can generate any combination of text, audio, and images in its output. This opens doors for various applications, such as providing real-time assistance during tasks. OpenAI also notes that GPT-4o matches GPT-4 Turbo performance on English text and code.

At the moment, OpenAI is making GPT-4o available through the free tier of its popular ChatGPT chatbot, while paid subscribers on the ChatGPT Plus and Team plans will receive additional benefits, including message limits up to five times higher than those on the free tier. The higher limits cater to more demanding tasks and workflows. OpenAI envisions GPT-4o as the next step towards a future where human-machine interaction is more natural and collaborative. The model’s ability to understand and respond in real time using multiple modalities paves the way for a more intuitive user experience. Users can interact with GPT-4o as if they were having a conversation with a knowledgeable assistant, seamlessly switching between text, voice, and even visual cues.

GPT-4o also cuts response lag sharply, replying in near real time. Its multilingual capabilities have been significantly enhanced as well, with improved performance in over 50 languages. In the coming weeks, OpenAI will roll out a new Voice Mode with expanded functionality that lets users interrupt GPT-4o mid-response and carry on the conversation. Users will be able to have more natural, real-time voice conversations with the chatbot, as well as converse with ChatGPT via real-time video. OpenAI notes that Plus users will get early access as part of the broader rollout.