ChatGPT-maker OpenAI has introduced a suite of new voice intelligence models in its API, designed to make AI-powered voice interactions more natural, more responsive, and capable of handling complex tasks. The lineup includes GPT‑Realtime‑2, a live voice model with GPT‑5‑class reasoning; GPT‑Realtime‑Translate, which enables real-time multilingual conversations; and GPT‑Realtime‑Whisper, a streaming speech-to-text model for instant transcription.
“Voice is becoming one of the most natural ways for people to use software. A voice agent needs to understand what someone means, keep track of context, recover when a request changes, use tools while the conversation continues, and respond in a way that feels appropriate to the moment. Together, the models we are launching move realtime audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” said the company.
Key features of the new models
The new models expand voice AI beyond simple call-and-response in three ways:
- Voice-to-action: Users can describe tasks, and the system reasons through requests to complete them.
- Systems-to-voice: Software can proactively provide spoken guidance, such as travel apps updating passengers about delays.
- Voice-to-voice: Real-time translation allows seamless multilingual conversations, maintaining context across languages.
Companies such as Zillow, Deutsche Telekom, and Vimeo are already testing these features, which points to their potential in customer support and global communication.
OpenAI reports that GPT‑Realtime‑2 delivers stronger reasoning, scoring 15.2% higher on Big Bench Audio and 13.8% higher on Audio MultiChallenge than earlier versions. The model supports a 128K context window, enabling longer, coherent conversations. Developers can also adjust reasoning effort levels to balance latency against task complexity, while enhanced tone control allows empathetic, calm, or upbeat delivery depending on context.
To ensure responsible use, OpenAI has integrated active classifiers that detect harmful content and halt sessions when necessary. Developers can add custom guardrails via the Agents SDK. The API also supports EU data residency and enterprise-grade privacy commitments, making it suitable for regulated industries such as finance and healthcare.
Pricing and availability
The new models are available now in the Realtime API.
- GPT‑Realtime‑2: $32 per 1M audio input tokens, $64 per 1M audio output tokens.
- GPT‑Realtime‑Translate: $0.034 per minute.
- GPT‑Realtime‑Whisper: $0.017 per minute.
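To put the listed prices in perspective, here is a rough cost estimator in Python. It is only a sketch: the function names are hypothetical, the rates are copied from the list above, and the article does not say how many tokens a minute of audio consumes, so token counts for GPT‑Realtime‑2 must come from actual usage data.

```python
from decimal import Decimal

# Rates in USD, taken from the pricing list in this article.
REALTIME_2_INPUT_PER_M = Decimal("32")    # per 1M audio input tokens
REALTIME_2_OUTPUT_PER_M = Decimal("64")   # per 1M audio output tokens
TRANSLATE_PER_MIN = Decimal("0.034")      # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = Decimal("0.017")        # GPT-Realtime-Whisper, per minute

ONE_MILLION = Decimal(1_000_000)


def realtime_2_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate GPT-Realtime-2 audio cost from token counts (hypothetical helper)."""
    total = (input_tokens * REALTIME_2_INPUT_PER_M
             + output_tokens * REALTIME_2_OUTPUT_PER_M)
    return total / ONE_MILLION


def per_minute_cost(minutes: int, rate_per_minute: Decimal) -> Decimal:
    """Estimate cost for the per-minute models (hypothetical helper)."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Illustrative numbers only; they are not figures from OpenAI.
    print("Realtime-2, 500K in / 250K out:", realtime_2_cost(500_000, 250_000))
    print("Translate, 60 minutes:", per_minute_cost(60, TRANSLATE_PER_MIN))
    print("Whisper, 60 minutes:", per_minute_cost(60, WHISPER_PER_MIN))
```

At these rates, an hour of translation works out to about $2.04 and an hour of transcription to about $1.02. `Decimal` is used instead of floats so that small per-minute rates do not pick up rounding noise.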