OpenAI Realtime - Vision Agents

OpenAI provides native speech-to-speech over WebRTC with built-in STT/TTS. No separate speech services required.

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

OpenAI also provides a traditional LLM (Responses API and ChatCompletions) and standalone text-to-speech.

Installation

uv add "vision-agents[openai]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream

agent = Agent(
    # Stream handles the real-time transport
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful voice assistant.",
    # Speech-to-speech model; fps is the number of video frames sent per second
    llm=openai.Realtime(model="gpt-realtime", voice="marin", fps=1),
)

Parameters

Name	Type	Default	Description
`model`	`str`	`"gpt-realtime-2"`	OpenAI realtime model
`voice`	`str`	`"marin"`	Voice (`"marin"`, `"alloy"`, `"echo"`, etc.)
`fps`	`int`	`1`	Video frames per second
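
As an illustration of how these parameters might be wired up in a deployment, the sketch below collects them from environment variables and falls back to the defaults in the table above. The `REALTIME_MODEL`, `REALTIME_VOICE`, and `REALTIME_FPS` variable names and the `realtime_kwargs` helper are hypothetical, invented for this example; they are not part of Vision Agents. Rejecting a negative `fps` is likewise an assumption about what counts as a sensible value.

```python
import os

# Defaults copied from the Parameters table above.
DEFAULTS = {"model": "gpt-realtime-2", "voice": "marin", "fps": 1}


def realtime_kwargs(env=None):
    """Build keyword arguments for openai.Realtime from a mapping.

    The REALTIME_* names are hypothetical, chosen only for this sketch.
    """
    env = os.environ if env is None else env
    # Environment values are strings, so fps needs an explicit conversion.
    fps = int(env.get("REALTIME_FPS", DEFAULTS["fps"]))
    if fps < 0:
        # Assumption: a negative frame rate is never meaningful.
        raise ValueError("fps must not be negative")
    return {
        "model": env.get("REALTIME_MODEL", DEFAULTS["model"]),
        "voice": env.get("REALTIME_VOICE", DEFAULTS["voice"]),
        "fps": fps,
    }


# Usage, assuming the vision-agents OpenAI plugin is installed:
#   from vision_agents.plugins import openai
#   llm = openai.Realtime(**realtime_kwargs())
```

Keeping the defaults in one dictionary means that a deployment overriding only the voice still picks up the documented `model` and `fps` values.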

Next Steps

OpenAI LLM

Responses API and ChatCompletions

Build a Voice Agent

Get started with voice
