OpenAI rolls out new real-time audio models

OpenAI has rolled out a new set of real-time audio models focused on making voice AI faster and more useful in live conversations. The release includes three systems – GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper – built to support everything from live multilingual translation and instant speech-to-text transcription to full conversational voice agents that can reason, use tools, and execute actions during ongoing dialogue.

At the center of the release is GPT-Realtime-2, the company’s most advanced voice reasoning model to date. The Sam Altman-led firm describes it as a high-performance system capable of handling complex spoken interactions with reasoning power comparable to its latest large language models. Unlike traditional voice assistants that rely on step-by-step processing, this model is designed to operate on a continuous stream, interpreting speech as it happens and responding without noticeable delay. It also supports a context window of up to 32,000 tokens, letting it maintain long conversations without losing earlier context.
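Even with a 32,000-token window, the application still has to keep a running conversation inside that budget. A minimal sketch of trimming the oldest turns to fit, using a crude character-based token estimate (an assumption for illustration, not OpenAI's actual tokenizer):

```python
# Sketch: keep a rolling conversation history under a fixed token budget.
CONTEXT_LIMIT = 32_000  # tokens supported by GPT-Realtime-2, per the article

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~1 token per 4 characters (assumption, not exact)."""
    return max(1, len(text) // 4)

def trim_history(turns: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    """Drop the oldest turns until the estimated total fits the limit."""
    kept: list[str] = []
    total = 0
    for turn in reversed(turns):      # walk newest-first
        cost = estimate_tokens(turn)
        if total + cost > limit:
            break                     # everything older is discarded
        kept.append(turn)
        total += cost
    return list(reversed(kept))       # restore chronological order

# A huge old turn gets dropped; the recent turns survive.
history = ["a" * 100_000, "recent question", "recent answer"]
print(trim_history(history, limit=100))
```

Real deployments would use the provider's tokenizer and might summarize dropped turns rather than discard them outright, but the budget-enforcement loop is the same shape.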

The model is also engineered for what OpenAI calls ‘agentic behaviour’, meaning it can perform actions during conversations. Through tool integration, GPT-Realtime-2 can interact with external systems such as calendars, booking platforms, databases, and enterprise APIs. Commercially, it is positioned as a premium enterprise product, priced at around $32 per million audio input tokens and $64 per million output tokens, with reduced rates for cached inputs.
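At those rates, a session's cost is straightforward to estimate. A quick sketch, ignoring the cached-input discount and using made-up illustrative token counts:

```python
# Sketch: estimate GPT-Realtime-2 audio cost from the published per-token rates.
INPUT_RATE = 32 / 1_000_000    # $32 per million audio input tokens
OUTPUT_RATE = 64 / 1_000_000   # $64 per million output tokens

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session, ignoring cached-input discounts."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Illustrative numbers only: 50k input + 20k output tokens
print(f"${session_cost(50_000, 20_000):.2f}")  # → $2.88
```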

The second model, GPT-Realtime-Translate, focuses entirely on live speech translation. It is designed to process spoken input continuously and generate translations in real time without requiring speakers to pause or complete full sentences. OpenAI reports support for over 70 input languages and around 13 output languages. GPT-Realtime-Translate is priced per minute of usage, at around $0.034 per minute of audio processed, making it significantly more accessible than the reasoning-heavy GPT-Realtime-2.

The third model, GPT-Realtime-Whisper, extends OpenAI’s earlier Whisper speech recognition technology into a real-time streaming system. Whisper was widely adopted in earlier AI applications for its strong multilingual transcription accuracy, and this new version is optimized for continuous speech-to-text conversion rather than post-recording analysis. It produces transcriptions live, as words are spoken, enabling near-instant captions and documentation.
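Streaming transcribers typically emit interim hypotheses that are later replaced by final segments. A minimal sketch of assembling a live caption from such a stream; the (text, is_final) event shape is an assumption for illustration, not GPT-Realtime-Whisper's actual wire format:

```python
# Sketch: assemble a live transcript from streamed segments.
# Each event is a (text, is_final) pair, a hypothetical shape.

def assemble(events: list[tuple[str, bool]]) -> str:
    """Committed text plus the latest interim segment, as a caption would show."""
    committed: list[str] = []
    interim = ""
    for text, is_final in events:
        if is_final:
            committed.append(text)   # final segments are appended permanently
            interim = ""
        else:
            interim = text           # interim segments overwrite each other
    return " ".join(committed + ([interim] if interim else []))

events = [("hel", False), ("hello", False), ("hello world", True), ("next", False)]
print(assemble(events))  # → hello world next
```

The overwrite-then-commit pattern is why live captions visibly "correct themselves" mid-sentence before the final text settles.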

Such capability makes GPT-Realtime-Whisper particularly useful for live meetings, newsroom transcription, courtroom documentation, accessibility tools for hearing-impaired users, and enterprise logging systems. Meanwhile, the pricing for GPT-Realtime-Whisper is the lowest among the three models, at about $0.017 per minute of audio processing.
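Because both streaming models bill by the minute, their costs are easy to compare directly. A quick sketch using the article's published rates:

```python
# Sketch: compare per-minute billing for the two streaming audio models.
RATES = {
    "GPT-Realtime-Translate": 0.034,  # dollars per audio minute, per the article
    "GPT-Realtime-Whisper": 0.017,
}

def cost(model: str, minutes: float) -> float:
    """Dollar cost of processing the given minutes of audio on a model."""
    return RATES[model] * minutes

# Illustrative: one hour of audio on each model
for model, rate in RATES.items():
    print(f"{model}: ${cost(model, 60):.2f}/hour")
```

An hour of transcription works out to roughly half the cost of an hour of translation at these rates.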

Technically, all three models mark a shift away from the traditional voice AI architecture, which relied on separate stages for speech recognition, language processing, and speech synthesis. A major focus of the new system is latency reduction. Another key advance is the models' ability to interact with external tools during conversation: instead of functioning only as language generators, they can actively retrieve data, perform operations, and trigger workflows in connected systems.
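On the application side, mid-conversation tool use reduces to routing model-issued calls to local handlers. A minimal sketch; the tool names and call shape here (check_calendar, book_slot) are invented for illustration and are not part of OpenAI's API:

```python
# Sketch: dispatch tool calls issued mid-conversation to local handlers.
# Tool names, arguments, and the call dict shape are hypothetical.

def check_calendar(date: str) -> str:
    return f"2 free slots on {date}"

def book_slot(date: str, time: str) -> str:
    return f"booked {date} {time}"

TOOLS = {"check_calendar": check_calendar, "book_slot": book_slot}

def dispatch(call: dict) -> str:
    """Route one model-issued tool call to its registered handler."""
    handler = TOOLS[call["name"]]
    return handler(**call["arguments"])

print(dispatch({"name": "check_calendar", "arguments": {"date": "2025-03-01"}}))
# → 2 free slots on 2025-03-01
```

The result string would be fed back into the live session so the model can speak it to the user without breaking the conversational stream.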

The Tech Portal is published by Blue Box Media Private Limited. Our investors have no influence over our reporting.