OpenAI provides native speech-to-speech over WebRTC with built-in STT/TTS. No separate speech services required.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
OpenAI also provides a traditional LLM (Responses API and ChatCompletions) and standalone text-to-speech.

Installation

uv add "vision-agents[openai]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream

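agent = Agent(
    # Stream provides the real-time transport
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful voice assistant.",
    # Speech-to-speech model; fps is the number of video frames per second
    llm=openai.Realtime(model="gpt-realtime", voice="marin", fps=1),
)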

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "gpt-realtime" | OpenAI realtime model |
| voice | str | "marin" | Voice ("marin", "alloy", "echo", etc.) |
| fps | int | 1 | Video frames per second |
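
The three parameters above can be kept in one place and passed to `openai.Realtime` as keyword arguments. The sketch below is one possible way to do that. `RealtimeSettings` and `as_kwargs` are hypothetical names and not part of vision-agents, and the check that rejects a negative `fps` is an assumption, not documented library behavior.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RealtimeSettings:
    """Hypothetical helper (not part of vision-agents) holding the documented parameters."""

    model: str = "gpt-realtime"  # documented default
    voice: str = "marin"         # documented default
    fps: int = 1                 # documented default: video frames per second

    def __post_init__(self) -> None:
        # Assumed sanity checks; the library may accept or reject other values.
        if not self.model or not self.voice:
            raise ValueError("model and voice must be non-empty strings")
        if isinstance(self.fps, bool) or not isinstance(self.fps, int) or self.fps < 0:
            raise ValueError("fps must be a non-negative integer")

    def as_kwargs(self) -> dict:
        """Return the settings as keyword arguments named as in the table."""
        return asdict(self)


# Override only the voice; model and fps keep their documented defaults.
settings = RealtimeSettings(voice="alloy")
realtime_kwargs = settings.as_kwargs()

# With vision-agents installed, the dictionary can be unpacked into the constructor:
#   llm = openai.Realtime(**realtime_kwargs)
```

Keeping the values in a frozen dataclass means a typo in a parameter name fails when the settings are built instead of when the agent is constructed.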

Next Steps

OpenAI LLM: Responses API and ChatCompletions

Build a Voice Agent: Get started with voice