OpenAI adds GPT-Realtime-2 + realtime translation + realtime transcription models to the API (May 2026)
OpenAI’s API documentation now lists GPT-Realtime-2 (a more capable realtime voice model), plus gpt-realtime-translate for streaming speech-to-speech translation and gpt-realtime-whisper for low-latency transcription.
Editorial Team
OpenAI announced new realtime voice models for developers:
- GPT-Realtime-2: OpenAI’s “most capable realtime voice model,” designed for speech-to-speech interactions with configurable reasoning effort. (OpenAI model docs: https://developers.openai.com/api/docs/models/gpt-realtime-2)
- gpt-realtime-translate: a streaming speech-to-speech translation model that returns translated audio plus transcript deltas while source audio is still arriving. (OpenAI model docs: https://developers.openai.com/api/docs/models/gpt-realtime-translate)
- gpt-realtime-whisper: a streaming speech-to-text model for low-latency transcript deltas from live audio. (OpenAI model docs: https://developers.openai.com/api/docs/models/gpt-realtime-whisper)
OpenAI’s developer announcement frames this as an upgrade to “voice agents,” highlighting GPT-Realtime-2 plus dedicated translation and transcription models. (OpenAI Developers on X: https://x.com/OpenAIDevs/status/2052440907933474954)
GPT-Realtime-2: higher capability for voice agents
OpenAI’s docs describe gpt-realtime-2 as a reasoning model for realtime voice interactions, supporting text + audio I/O and image input. (https://developers.openai.com/api/docs/models/gpt-realtime-2)
Notable specs shown in the docs:
- Context window: 128,000 tokens
- Max output tokens: 32,000 tokens
- Endpoints listed: includes v1/realtime plus the standard API endpoints.
(https://developers.openai.com/api/docs/models/gpt-realtime-2)
The docs also show token-based pricing for text/audio/image for this model. (https://developers.openai.com/api/docs/models/gpt-realtime-2)
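As a rough illustration of what the listed limits imply, the sketch below checks whether a prompt fits once room is reserved for the reply. It assumes that input and output tokens share the 128,000-token context window, which this article does not confirm; the function names `input_budget` and `fits` are hypothetical helpers, not part of any OpenAI SDK.

```python
# Limits as listed in the gpt-realtime-2 model docs cited above.
CONTEXT_WINDOW = 128_000     # tokens
MAX_OUTPUT_TOKENS = 32_000   # tokens


def input_budget(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Tokens left for input after holding back `reserved_output` for the reply.

    Assumption (not stated in the article): input and output tokens
    draw from the same context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT_TOKENS")
    return CONTEXT_WINDOW - reserved_output


def fits(input_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """True if `input_tokens` fits alongside the reserved output budget."""
    return 0 <= input_tokens <= input_budget(reserved_output)


if __name__ == "__main__":
    print(input_budget())        # budget when the full output limit is reserved
    print(fits(100_000, 8_000))  # a long prompt with a short reply reserved
```

Under that assumption, reserving the full output limit leaves 96,000 tokens for conversation history, instructions, and audio or image input.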
gpt-realtime-translate: streaming speech-to-speech translation
OpenAI’s docs describe gpt-realtime-translate as a streaming speech-to-speech translation model for live multilingual audio experiences. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)
OpenAI says it uses a dedicated realtime translation endpoint and streams translated audio and transcript deltas while source audio arrives. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)
The docs show per-minute audio pricing for this model. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)
gpt-realtime-whisper: low-latency transcription
OpenAI’s docs describe gpt-realtime-whisper as a streaming speech-to-text model for realtime transcription, designed for low-latency transcript deltas from live audio. (https://developers.openai.com/api/docs/models/gpt-realtime-whisper)
The docs show per-minute audio pricing and list the realtime transcription endpoint as v1/realtime/transcription_sessions. (https://developers.openai.com/api/docs/models/gpt-realtime-whisper)
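Per-minute pricing makes cost estimates a matter of simple arithmetic. The sketch below is a minimal example of that calculation; the article does not quote the actual rates, so `rate_per_minute` is supplied by the caller, the value in the usage example is a placeholder, and pro-rata billing by the second is an assumption rather than documented behavior.

```python
from decimal import Decimal


def estimate_audio_cost(duration_seconds: int, rate_per_minute: Decimal) -> Decimal:
    """Estimate the cost of processing `duration_seconds` of audio.

    Assumption: billing is pro-rata by the second. The real billing
    granularity and the real rate must be taken from OpenAI's pricing docs.
    """
    if duration_seconds < 0 or rate_per_minute < 0:
        raise ValueError("duration and rate must be non-negative")
    minutes = Decimal(duration_seconds) / Decimal(60)
    # Round to four decimal places for display purposes.
    return (minutes * rate_per_minute).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    placeholder_rate = Decimal("0.10")  # hypothetical rate, NOT an OpenAI price
    print(estimate_audio_cost(90, placeholder_rate))
```

Using `Decimal` rather than floats avoids rounding drift when many short sessions are summed.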
Why this matters
If you’re building realtime voice experiences, OpenAI is signaling a clearer split between:
- a general-purpose voice agent model (gpt-realtime-2) and
- specialized “pipe” models for translation and transcription.
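This can simplify production architectures that need translation and captions alongside agent reasoning: each audio stream can go to the model built for that job instead of one general-purpose agent handling everything.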
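One way to picture the split is a small routing table that maps each job to its model. The sketch below is illustrative only: the model names and the two endpoint paths come from the docs cited in this article, while the task labels (`agent`, `translate`, `transcribe`), the `Route` class, and `route_for` are hypothetical names. The translation endpoint is left as `None` because the article says only that a dedicated endpoint exists and does not give its path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    model: str
    endpoint: Optional[str]  # None where this article does not state the path


# Task labels are hypothetical; model names and paths are as cited above.
ROUTES = {
    "agent": Route("gpt-realtime-2", "v1/realtime"),
    "translate": Route("gpt-realtime-translate", None),
    "transcribe": Route("gpt-realtime-whisper", "v1/realtime/transcription_sessions"),
}


def route_for(task: str) -> Route:
    """Return the model and endpoint for a task, or raise on an unknown task."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(ROUTES)}"
        ) from None


if __name__ == "__main__":
    for task in ("agent", "translate", "transcribe"):
        print(task, route_for(task))
```

Keeping the mapping in one place means a caption or translation stream never has to pass through the agent model, and a model swap is a one-line change.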
Sources
- OpenAI Developers announcement: https://x.com/OpenAIDevs/status/2052440907933474954
- OpenAI model docs (gpt-realtime-2): https://developers.openai.com/api/docs/models/gpt-realtime-2
- OpenAI model docs (gpt-realtime-translate): https://developers.openai.com/api/docs/models/gpt-realtime-translate
- OpenAI model docs (gpt-realtime-whisper): https://developers.openai.com/api/docs/models/gpt-realtime-whisper