Product Updates

OpenAI adds GPT-Realtime-2 + realtime translation + realtime transcription models to the API (May 2026)

OpenAI’s API documentation now lists GPT-Realtime-2 (a more capable realtime voice model), plus gpt-realtime-translate for streaming speech-to-speech translation and gpt-realtime-whisper for low-latency transcription.

By AI Coding Tools Directory · 2026-05-12 · 10 min read
Last reviewed: 2026-05-12

OpenAI has announced a set of new realtime voice models for developers.

OpenAI’s developer announcement frames this as an upgrade to “voice agents,” highlighting GPT-Realtime-2 plus dedicated translation and transcription models. (OpenAI Developers on X: https://x.com/OpenAIDevs/status/2052440907933474954)

GPT-Realtime-2: higher capability for voice agents

OpenAI’s docs describe gpt-realtime-2 as a reasoning model for realtime voice interactions, supporting text + audio I/O and image input. (https://developers.openai.com/api/docs/models/gpt-realtime-2)

Notable specs shown in the docs:

  • Context window: 128,000 tokens
  • Max output tokens: 32,000 tokens
  • Endpoints: v1/realtime, plus the standard API endpoints.

(https://developers.openai.com/api/docs/models/gpt-realtime-2)

The docs also show token-based pricing for text/audio/image for this model. (https://developers.openai.com/api/docs/models/gpt-realtime-2)
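To make the setup concrete, here is a minimal sketch of configuring a gpt-realtime-2 session. The WebSocket URL pattern and the `session.update` event shape mirror OpenAI's existing Realtime API; the exact field names and supported values for gpt-realtime-2 are assumptions, so check the model docs cited above before relying on them.

```python
import json

# Hypothetical connection URL, following the existing Realtime API pattern
# of passing the model as a query parameter.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"

def build_session_update(instructions: str, voice: str = "alloy") -> str:
    """Serialize a session.update event enabling text + audio output.

    The event shape follows the current Realtime API; field names for
    gpt-realtime-2 may differ.
    """
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "instructions": instructions,
            "voice": voice,
        },
    }
    return json.dumps(event)

# This payload would be sent over the WebSocket after connecting.
payload = build_session_update("You are a concise voice assistant.")
```

In production you would send `payload` as the first message after opening the WebSocket connection, then stream audio input events.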

gpt-realtime-translate: streaming speech-to-speech translation

OpenAI’s docs describe gpt-realtime-translate as a streaming speech-to-speech translation model for live multilingual audio experiences. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)

OpenAI says it uses a dedicated realtime translation endpoint and streams translated audio and transcript deltas while source audio arrives. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)
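Since the model streams transcript deltas as source audio arrives, a client needs to accumulate those deltas into the final translated text. The sketch below assumes delta/done event names modeled on the Realtime API's general pattern; the actual event types for gpt-realtime-translate are not confirmed by the docs quoted here.

```python
# Hypothetical event names ("translation.transcript.delta",
# "translation.done") modeled on the Realtime API's delta/done pattern.
def collect_translated_transcript(events) -> str:
    """Accumulate streamed transcript deltas into the full translated text."""
    parts = []
    for event in events:
        if event.get("type") == "translation.transcript.delta":
            parts.append(event.get("delta", ""))
        elif event.get("type") == "translation.done":
            break
    return "".join(parts)

# Simulated event stream, as a WebSocket client might receive it.
sample_events = [
    {"type": "translation.transcript.delta", "delta": "Hola, "},
    {"type": "translation.transcript.delta", "delta": "mundo."},
    {"type": "translation.done"},
]
translated = collect_translated_transcript(sample_events)
```

The same accumulation pattern applies to the translated audio deltas, which would be buffered and played back instead of concatenated as text.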

The docs show per-minute audio pricing for this model. (https://developers.openai.com/api/docs/models/gpt-realtime-translate)

gpt-realtime-whisper: low-latency transcription

OpenAI’s docs describe gpt-realtime-whisper as a streaming speech-to-text model for realtime transcription, designed for low-latency transcript deltas from live audio. (https://developers.openai.com/api/docs/models/gpt-realtime-whisper)

The docs show per-minute audio pricing and list the realtime transcription endpoint as v1/realtime/transcription_sessions. (https://developers.openai.com/api/docs/models/gpt-realtime-whisper)

Why this matters

If you’re building realtime voice experiences, OpenAI is signaling a clearer split between:

  • a general-purpose voice agent model (gpt-realtime-2) and
  • specialized “pipe” models for translation and transcription.

This can simplify production architectures where you need translation and captions alongside agent reasoning.
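One way this split shows up in practice is fanning a single microphone stream out to both a captioning pipe and the voice agent. The sketch below is a structural illustration only; the handler classes are placeholders that would each wrap their own realtime connection in a real system.

```python
# Placeholder consumers: in production, each would hold its own realtime
# connection (gpt-realtime-whisper for captions, gpt-realtime-2 for the agent).
class CaptionPipe:
    def __init__(self):
        self.chunks = []

    def send(self, chunk: bytes):
        self.chunks.append(chunk)  # would forward to the transcription session

class AgentSession:
    def __init__(self):
        self.chunks = []

    def send(self, chunk: bytes):
        self.chunks.append(chunk)  # would forward to the voice-agent session

def fan_out(audio_chunks, sinks):
    """Forward each incoming audio chunk to every downstream consumer."""
    for chunk in audio_chunks:
        for sink in sinks:
            sink.send(chunk)

captions, agent = CaptionPipe(), AgentSession()
fan_out([b"\x00\x01", b"\x02\x03"], [captions, agent])
```

Keeping translation and transcription in dedicated models means the agent session never has to multiplex caption output with its own reasoning and speech.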

Sources

  • OpenAI Developers announcement on X: https://x.com/OpenAIDevs/status/2052440907933474954
  • gpt-realtime-2 docs: https://developers.openai.com/api/docs/models/gpt-realtime-2
  • gpt-realtime-translate docs: https://developers.openai.com/api/docs/models/gpt-realtime-translate
  • gpt-realtime-whisper docs: https://developers.openai.com/api/docs/models/gpt-realtime-whisper

