ElevenLabs TTS - Vision Agents

ElevenLabs provides highly realistic, expressive text-to-speech voices and supports multiple languages and voice styles.

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

ElevenLabs also provides real-time speech-to-text via Scribe with built-in turn detection. You can use both in the same agent.

Installation

uv add "vision-agents[elevenlabs]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import elevenlabs, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
)

Set ELEVENLABS_API_KEY in your environment or pass api_key directly.
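The two ways of supplying the key can be combined in a small helper. The following is a minimal sketch, not part of the plugin: `resolve_elevenlabs_key` and `build_tts` are hypothetical names introduced here for illustration, and the only plugin behavior assumed is what this page states, namely that `elevenlabs.TTS` accepts an `api_key` argument.

```python
import os


def resolve_elevenlabs_key(explicit_key=None):
    """Return the explicit key if given, else ELEVENLABS_API_KEY from the environment.

    Hypothetical helper for illustration; it is not part of vision-agents.
    """
    key = explicit_key or os.environ.get("ELEVENLABS_API_KEY")
    if not key:
        # Fail early with a clear message instead of at the first TTS request.
        raise RuntimeError("Set ELEVENLABS_API_KEY or pass api_key explicitly.")
    return key


def build_tts(explicit_key=None):
    """Construct the TTS plugin with a key resolved by the helper above."""
    # Imported lazily so resolve_elevenlabs_key works without the plugin installed.
    from vision_agents.plugins import elevenlabs

    return elevenlabs.TTS(api_key=resolve_elevenlabs_key(explicit_key))
```

Calling `build_tts()` with no argument relies on the environment variable, while `build_tts("your-key")` passes the key directly. Checking for the key up front surfaces a missing credential at startup rather than in the middle of a call.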

Parameters

tts = elevenlabs.TTS(
    voice_id="VR6AewLTigWG4xSOukaG",
    model_id="eleven_multilingual_v2",
)

Name	Type	Default	Description
`voice_id`	`str`	`"VR6AewLTigWG4xSOukaG"`	ElevenLabs voice ID
`model_id`	`str`	`"eleven_multilingual_v2"`	TTS model
`api_key`	`str`	`None`	API key (defaults to `ELEVENLABS_API_KEY` env var)
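If the voice and model need to vary between deployments, the parameters in the table can be collected in one place. This is a sketch under stated assumptions: `ElevenLabsTTSConfig` and `to_kwargs` are hypothetical names, not part of the plugin, and the defaults simply mirror the table above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ElevenLabsTTSConfig:
    """Hypothetical container for the documented TTS parameters."""

    voice_id: str = "VR6AewLTigWG4xSOukaG"    # default from the table above
    model_id: str = "eleven_multilingual_v2"  # default from the table above
    api_key: Optional[str] = None             # None: fall back to ELEVENLABS_API_KEY

    def to_kwargs(self):
        """Return keyword arguments, omitting api_key when it is unset."""
        kwargs = asdict(self)
        if kwargs["api_key"] is None:
            del kwargs["api_key"]
        return kwargs


# Keep the default voice but name the model explicitly.
config = ElevenLabsTTSConfig(model_id="eleven_multilingual_v2")
tts_kwargs = config.to_kwargs()
```

The resulting dictionary can then be unpacked into the constructor, for example `elevenlabs.TTS(**tts_kwargs)`. Leaving `api_key` out when it is `None` keeps the environment-variable fallback described earlier in effect.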

Next Steps

ElevenLabs STT: Real-time speech-to-text via Scribe

Build a Voice Agent: Get started with voice
