ElevenLabs provides highly realistic and expressive text-to-speech voices, with support for multiple languages and voice styles.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
ElevenLabs also provides real-time speech-to-text via Scribe with built-in turn detection. You can use both in the same agent.
Installation
uv add "vision-agents[elevenlabs]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import elevenlabs, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
)
Set ELEVENLABS_API_KEY in your environment or pass api_key directly.
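The two ways of supplying the key can be combined in a small helper. The sketch below is only an illustration: `resolve_api_key` is a hypothetical function, not part of Vision Agents, and it simply prefers an explicitly passed key over the `ELEVENLABS_API_KEY` environment variable. Failing early with a clear message is a design choice of this sketch, not documented plugin behavior.

```python
import os


def resolve_api_key(explicit_key=None, env_var="ELEVENLABS_API_KEY"):
    """Hypothetical helper: return the explicit key if given, else the env value."""
    key = explicit_key or os.environ.get(env_var)
    if not key:
        # Raise before any agent is constructed so the cause is obvious.
        raise RuntimeError(f"Set {env_var} or pass api_key explicitly.")
    return key


# Intended use (requires the plugin to be installed):
#   tts = elevenlabs.TTS(api_key=resolve_api_key())
```

If the environment variable is already set, calling `elevenlabs.TTS()` with no arguments should be sufficient, and a helper like this is only useful when keys come from more than one place.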
Parameters
tts = elevenlabs.TTS(
    voice_id="VR6AewLTigWG4xSOukaG",
    model_id="eleven_multilingual_v2",
)
| Name | Type | Default | Description |
|---|---|---|---|
| voice_id | str | "VR6AewLTigWG4xSOukaG" | ElevenLabs voice ID |
| model_id | str | "eleven_multilingual_v2" | TTS model |
| api_key | str | None | API key (defaults to ELEVENLABS_API_KEY env var) |
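Because ElevenLabs covers both speech directions, one agent can use it for transcription and synthesis. The following is a minimal sketch rather than a confirmed recipe: it assumes the plugin exposes an `elevenlabs.STT()` constructor that works with default arguments (this page only documents `elevenlabs.TTS`), and `build_elevenlabs_agent` is a hypothetical wrapper name. Everything else mirrors the Quick Start.

```python
def build_elevenlabs_agent():
    """Sketch: an agent that uses ElevenLabs for both STT and TTS."""
    # Imported lazily so this module loads even before the plugin is installed.
    from vision_agents.core import Agent, User
    from vision_agents.plugins import elevenlabs, gemini, getstream

    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You are a helpful assistant.",
        llm=gemini.LLM("gemini-3-flash-preview"),
        # Assumption: elevenlabs.STT() exists and accepts no required arguments.
        stt=elevenlabs.STT(),
        tts=elevenlabs.TTS(),
    )
```

Check the ElevenLabs STT page for the actual constructor parameters before relying on this.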
Next Steps
ElevenLabs STT: Real-time speech-to-text via Scribe
Build a Voice Agent: Get started with voice