OpenAI provides native speech-to-speech over WebRTC with built-in STT/TTS. No separate speech services required.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
Quick Start
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "gpt-realtime-2" | OpenAI realtime model |
| voice | str | "marin" | Voice ("marin", "alloy", "echo", etc.) |
| fps | int | 1 | Video frames per second |
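
As a rough illustration of how the defaults in the table combine with overrides, the sketch below models the three parameters as a small dataclass. `RealtimeSettings` and `as_kwargs` are hypothetical names introduced only for this example and are not part of Vision Agents or the OpenAI SDK. The check that `fps` is at least 1 is likewise an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RealtimeSettings:
    """Hypothetical container mirroring the parameter table above.

    This is not a Vision Agents class; it only illustrates the defaults.
    """

    model: str = "gpt-realtime-2"  # default model name from the table
    voice: str = "marin"           # default voice from the table
    fps: int = 1                   # video frames per second

    def __post_init__(self) -> None:
        # Assumption: a frame rate below 1 is treated as invalid here.
        if self.fps < 1:
            raise ValueError("fps must be at least 1")

    def as_kwargs(self) -> dict:
        """Return the settings as keyword arguments keyed by the table's names."""
        return {"model": self.model, "voice": self.voice, "fps": self.fps}


# Defaults only, then an override of voice and frame rate.
defaults = RealtimeSettings()
custom = RealtimeSettings(voice="alloy", fps=2)
```

Any parameter left unspecified keeps the default listed in the table, so `custom` above still uses the default model.
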
Next Steps
- OpenAI LLM: Responses API and ChatCompletions
- Build a Voice Agent: Get started with voice

