Product Updates

OpenAI adds GPT-Realtime-2 + realtime translation + realtime transcription models to the API (May 2026)

OpenAI’s API documentation now lists GPT-Realtime-2 (a more capable realtime voice model), plus gpt-realtime-translate for streaming speech-to-speech translation and gpt-realtime-whisper for low-latency transcription.

By AI Coding Tools Directory · 2026-05-12 · 10 min read
Last reviewed: 2026-05-12

OpenAI has announced a set of new realtime voice models for developers.

OpenAI’s developer announcement frames this as an upgrade to “voice agents,” highlighting GPT-Realtime-2 plus dedicated translation and transcription models. (OpenAI Developers on X: https://x.com/OpenAIDevs/status/2052440907933474954)

GPT-Realtime-2: higher capability for voice agents

OpenAI’s docs describe gpt-realtime-2 as a reasoning model for realtime voice interactions, supporting text + audio I/O and image input. (https://developers.openai.com/api/docs/models/gpt-realtime-2)

Notable specs shown in the docs:

  • Context window: 128,000 tokens
  • Max output tokens: 32,000 tokens
  • Endpoints: v1/realtime, plus the standard API endpoints.

(https://developers.openai.com/api/docs/models/gpt-realtime-2)

The docs also show token-based pricing for text/audio/image for this model. (https://developers.openai.com/api/docs/models/gpt-realtime-2)
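To make the setup concrete, here is a minimal sketch of configuring a gpt-realtime-2 session. The WebSocket URL pattern and the `session.update` event shape mirror OpenAI's existing Realtime API; the exact field names and supported values for gpt-realtime-2 are assumptions, so check the model docs cited above before relying on them.

```python
import json

# Hypothetical connection URL, following the existing Realtime API pattern
# of passing the model as a query parameter.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"

def build_session_update(instructions: str, voice: str = "alloy") -> str:
    """Serialize a session.update event enabling text + audio output.

    The event shape follows the current Realtime API; field names for
    gpt-realtime-2 may differ.
    """
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "instructions": instructions,
            "voice": voice,
        },
    }
    return json.dumps(event)

# This payload would be sent over the WebSocket after connecting.
payload = build_session_update("You are a concise voice assistant.")
```

In production you would send `payload` as the first message after opening the WebSocket connection, then stream audio input events.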

gpt-realtime-translate: streaming speech-to-speech translation

OpenAI’s docs describe gpt-realtime-translate as a streaming speech-to-speech translation model for live multilingual audio experiences. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)

OpenAI says it uses a dedicated realtime translation endpoint and streams translated audio and transcript deltas while source audio arrives. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)
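Since the model streams transcript deltas as source audio arrives, a client needs to accumulate those deltas into the final translated text. The sketch below assumes delta/done event names modeled on the Realtime API's general pattern; the actual event types for gpt-realtime-translate are not confirmed by the docs quoted here.

```python
# Hypothetical event names ("translation.transcript.delta",
# "translation.done") modeled on the Realtime API's delta/done pattern.
def collect_translated_transcript(events) -> str:
    """Accumulate streamed transcript deltas into the full translated text."""
    parts = []
    for event in events:
        if event.get("type") == "translation.transcript.delta":
            parts.append(event.get("delta", ""))
        elif event.get("type") == "translation.done":
            break
    return "".join(parts)

# Simulated event stream, as a WebSocket client might receive it.
sample_events = [
    {"type": "translation.transcript.delta", "delta": "Hola, "},
    {"type": "translation.transcript.delta", "delta": "mundo."},
    {"type": "translation.done"},
]
translated = collect_translated_transcript(sample_events)
```

The same accumulation pattern applies to the translated audio deltas, which would be buffered and played back instead of concatenated as text.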

The docs show per-minute audio pricing for this model. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)

gpt-realtime-whisper: low-latency transcription

OpenAI’s docs describe gpt-realtime-whisper as a streaming speech-to-text model for realtime transcription, designed for low-latency transcript deltas from live audio. (https://developers.openai.com/api/docs/models/gpt-realtime-whisper)

The docs show per-minute audio pricing and list the realtime transcription endpoint as v1/realtime/transcription_sessions. (https://developers.openai.com/api/docs/models/gpt-realtime-whisper)

Why this matters

If you’re building realtime voice experiences, OpenAI is signaling a clearer split between:

  • a general-purpose voice agent model (gpt-realtime-2) and
  • specialized “pipe” models for translation and transcription.

This can simplify production architectures where you need translation and captions alongside agent reasoning.
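One way this split shows up in practice is fanning a single microphone stream out to both a captioning pipe and the voice agent. The sketch below is a structural illustration only; the handler classes are placeholders that would each wrap their own realtime connection in a real system.

```python
# Placeholder consumers: in production, each would hold its own realtime
# connection (gpt-realtime-whisper for captions, gpt-realtime-2 for the agent).
class CaptionPipe:
    def __init__(self):
        self.chunks = []

    def send(self, chunk: bytes):
        self.chunks.append(chunk)  # would forward to the transcription session

class AgentSession:
    def __init__(self):
        self.chunks = []

    def send(self, chunk: bytes):
        self.chunks.append(chunk)  # would forward to the voice-agent session

def fan_out(audio_chunks, sinks):
    """Forward each incoming audio chunk to every downstream consumer."""
    for chunk in audio_chunks:
        for sink in sinks:
            sink.send(chunk)

captions, agent = CaptionPipe(), AgentSession()
fan_out([b"\x00\x01", b"\x02\x03"], [captions, agent])
```

Keeping translation and transcription in dedicated models means the agent session never has to multiplex caption output with its own reasoning and speech.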

Sources

  • OpenAI Developers announcement on X: https://x.com/OpenAIDevs/status/2052440907933474954
  • gpt-realtime-2 docs: https://developers.openai.com/api/docs/models/gpt-realtime-2
  • gpt-realtime-translate docs: https://developers.openai.com/api/docs/models/gpt-realtime-translate
  • gpt-realtime-whisper docs: https://developers.openai.com/api/docs/models/gpt-realtime-whisper

