OpenAI adds GPT-Realtime-2 + realtime translation + realtime transcription models to the API (May 2026)
OpenAI’s API documentation now lists GPT-Realtime-2 (a more capable realtime voice model), plus gpt-realtime-translate for streaming speech-to-speech translation and gpt-realtime-whisper for low-latency transcription.
Editorial Team
OpenAI announced new realtime voice models for developers:
- GPT-Realtime-2: OpenAI’s “most capable realtime voice model,” designed for speech-to-speech interactions with configurable reasoning effort. (OpenAI model docs: https://developers.openai.com/api/docs/models/gpt-realtime-2)
- gpt-realtime-translate: a streaming speech-to-speech translation model that returns translated audio plus transcript deltas while source audio is still arriving. (OpenAI model docs: https://developers.openai.com/api/docs/models/gpt-realtime-translate)
- gpt-realtime-whisper: a streaming speech-to-text model for low-latency transcript deltas from live audio. (OpenAI model docs: https://developers.openai.com/api/docs/models/gpt-realtime-whisper)
OpenAI’s developer announcement frames this as an upgrade to “voice agents,” highlighting GPT-Realtime-2 plus dedicated translation and transcription models. (OpenAI Developers on X: https://x.com/OpenAIDevs/status/2052440907933474954)
GPT-Realtime-2: higher capability for voice agents
OpenAI’s docs describe gpt-realtime-2 as a reasoning model for realtime voice interactions, supporting text + audio I/O and image input. (https://developers.openai.com/api/docs/models/gpt-realtime-2)
Notable specs shown in the docs:
- Context window: 128,000 tokens
- Max output tokens: 32,000 tokens
- Endpoints listed: includes v1/realtime plus the standard API endpoints.
(https://developers.openai.com/api/docs/models/gpt-realtime-2)
The docs also show token-based pricing for text/audio/image for this model. (https://developers.openai.com/api/docs/models/gpt-realtime-2)
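To make the limits above concrete, here is a minimal Python sketch of a prompt-budget check. The two constants are the figures quoted from the model page. The helper names (`max_input_budget`, `fits_in_context`) are hypothetical, and the sketch assumes output tokens share the context window with input tokens, which the article does not state.

```python
# Limits as quoted in this article from OpenAI's gpt-realtime-2 model page.
CONTEXT_WINDOW_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 32_000


def max_input_budget(reserved_output_tokens: int = MAX_OUTPUT_TOKENS) -> int:
    """Return the input tokens left after reserving room for the reply.

    Assumption (not confirmed by the article): output tokens count
    against the same context window as input tokens.
    """
    if not 0 <= reserved_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(
            "reserved_output_tokens must be between 0 and MAX_OUTPUT_TOKENS"
        )
    return CONTEXT_WINDOW_TOKENS - reserved_output_tokens


def fits_in_context(
    input_tokens: int, reserved_output_tokens: int = MAX_OUTPUT_TOKENS
) -> bool:
    """Check whether an input of the given size fits the remaining budget."""
    return 0 <= input_tokens <= max_input_budget(reserved_output_tokens)


if __name__ == "__main__":
    print(max_input_budget())               # 96000
    print(fits_in_context(100_000))         # False
    print(fits_in_context(100_000, 8_000))  # True
```

Reserving the full 32,000 output tokens leaves 96,000 for input under this assumption; reserving less frees more room for conversation history.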
gpt-realtime-translate: streaming speech-to-speech translation
OpenAI’s docs describe gpt-realtime-translate as a streaming speech-to-speech translation model for live multilingual audio experiences. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)
OpenAI says it uses a dedicated realtime translation endpoint and streams translated audio and transcript deltas while source audio arrives. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)
The docs show per-minute audio pricing for this model. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)
gpt-realtime-whisper: low-latency transcription
OpenAI’s docs describe gpt-realtime-whisper as a streaming speech-to-text model for realtime transcription, designed for low-latency transcript deltas from live audio. (https://developers.openai.com/api/docs/models/gpt-realtime-whisper)
The docs show per-minute audio pricing and list the realtime transcription endpoint as v1/realtime/transcription_sessions. (https://developers.openai.com/api/docs/models/gpt-realtime-whisper)
Why this matters
If you’re building realtime voice experiences, OpenAI is signaling a clearer split between:
- a general-purpose voice agent model (gpt-realtime-2) and
- specialized “pipe” models for translation and transcription.
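This can simplify production architectures that need translation and captions alongside agent reasoning: each audio stream can go to the model built for that job rather than through a single general-purpose model.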
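The split described above could be captured in a small routing table. The following Python sketch is illustrative only. The model IDs, endpoint paths, and pricing styles come from the docs as summarized in this article. The task names (`agent`, `translate`, `transcribe`), the `RealtimeRoute` type, and `route_for` are hypothetical. The translation endpoint path is left as `None` because the article says only that a dedicated endpoint exists, without naming it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RealtimeRoute:
    model: str
    endpoint: Optional[str]  # None where this article does not name the path
    billing: str             # pricing style the docs show for the model


# Task names are hypothetical labels chosen for this sketch.
ROUTES = {
    "agent": RealtimeRoute("gpt-realtime-2", "v1/realtime", "per-token"),
    "translate": RealtimeRoute("gpt-realtime-translate", None, "per-minute"),
    "transcribe": RealtimeRoute(
        "gpt-realtime-whisper",
        "v1/realtime/transcription_sessions",
        "per-minute",
    ),
}


def route_for(task: str) -> RealtimeRoute:
    """Look up the model and endpoint to use for a given kind of audio job."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(ROUTES)}"
        ) from None


if __name__ == "__main__":
    for task in ROUTES:
        print(task, route_for(task))
```

Keeping the mapping in one place means a captions pipeline and a voice agent can share configuration code while still calling different models.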
Sources
- OpenAI Developers announcement: https://x.com/OpenAIDevs/status/2052440907933474954
- OpenAI model docs (gpt-realtime-2): https://developers.openai.com/api/docs/models/gpt-realtime-2
- OpenAI model docs (gpt-realtime-translate): https://developers.openai.com/api/docs/models/gpt-realtime-translate
- OpenAI model docs (gpt-realtime-whisper): https://developers.openai.com/api/docs/models/gpt-realtime-whisper