ElevenLabs is a voice AI platform that offers advanced Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities with highly realistic and expressive voices.
It supports multiple languages and voices, making it ideal for real-time conversational agents, narrated content, accessibility tools, and voice-enabled applications.
The ElevenLabs plugin for the Stream Python AI SDK allows you to add both TTS and STT functionality to your project.

Installation

Install the Stream ElevenLabs plugin with:
uv add vision-agents[elevenlabs]
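If you use pip rather than uv, the equivalent install is the standard extras syntax. Quoting the package name avoids shells (such as zsh) expanding the square brackets:

```shell
# With uv, as above (quotes are optional in most shells but safe everywhere)
uv add "vision-agents[elevenlabs]"

# Or with pip
pip install "vision-agents[elevenlabs]"
```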

Example

Check out our ElevenLabs example to see a practical implementation of the plugin and get inspiration for your own projects, or read on for the key details.

Text-to-Speech (TTS)

Initialisation

The ElevenLabs TTS plugin exists in the form of the TTS class:
from vision_agents.plugins import elevenlabs

tts = elevenlabs.TTS()
To initialise without passing in the API key, make sure the ELEVENLABS_API_KEY is available as an environment variable. You can do this either by defining it in a .env file or exporting it directly in your terminal.

Parameters

These are the parameters available in the ElevenLabs TTS plugin for you to customise:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str or None | None | Your ElevenLabs API key. If not provided, the plugin will look for the ELEVENLABS_API_KEY environment variable. |
| voice_id | str | "VR6AewLTigWG4xSOukaG" | The ID of the voice to use for TTS. You can use any voice from your ElevenLabs account. |
| model_id | str | "eleven_multilingual_v2" | The ID of the ElevenLabs TTS model to use. Controls the language and tone model for synthesis. |
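For example, to pick a specific voice and model, pass the parameters from the table above to the constructor. The IDs shown here are the documented defaults; substitute your own. Since this fragment only configures the plugin, it assumes the package is installed:

```python
from vision_agents.plugins import elevenlabs

tts = elevenlabs.TTS(
    voice_id="VR6AewLTigWG4xSOukaG",    # default voice; use any from your account
    model_id="eleven_multilingual_v2",  # default multilingual model
)
```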

Functionality

Send text to convert to speech

The send() method submits the given text to the service for synthesis. The resulting audio is then played through the configured output track.
tts.send("Demo text you want AI voice to say")

Speech-to-Text (STT)

ElevenLabs provides real-time speech-to-text capabilities through their Scribe v2 model, which offers low latency (~150ms) transcription with support for 99 languages.

Initialisation

The ElevenLabs STT plugin uses the STT class:
from vision_agents.plugins import elevenlabs

stt = elevenlabs.STT()
To initialise without passing in the API key, make sure the ELEVENLABS_API_KEY is available as an environment variable. You can do this either by defining it in a .env file or exporting it directly in your terminal.

Parameters

These are the parameters available in the ElevenLabs STT plugin for you to customise:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str or None | None | Your ElevenLabs API key. If not provided, the plugin will look for the ELEVENLABS_API_KEY environment variable. |
| model_id | str | "scribe_v2_realtime" | The model to use for transcription. Defaults to the Scribe v2 realtime model. |
| language_code | str | "en" | Language code for transcription (e.g., "en", "es", "fr"). Supports 99 languages. |
| vad_silence_threshold_secs | float | 1.5 | VAD silence threshold in seconds before committing a transcript. |
| vad_threshold | float | 0.4 | VAD threshold for speech detection (0.0-1.0). |
| min_speech_duration_ms | int | 100 | Minimum speech duration in milliseconds to trigger transcription. |
| min_silence_duration_ms | int | 100 | Minimum silence duration in milliseconds to detect speech boundaries. |
| audio_chunk_duration_ms | int | 100 | Duration of audio chunks to send (100-1000ms recommended). |
| client | AsyncElevenLabs or None | None | Optional pre-configured AsyncElevenLabs client instance. |
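As an example, you can tune the VAD behaviour from the table above at construction time. The values here are illustrative, not recommendations, and the fragment assumes the package is installed:

```python
from vision_agents.plugins import elevenlabs

stt = elevenlabs.STT(
    language_code="es",              # transcribe Spanish
    vad_silence_threshold_secs=1.0,  # commit transcripts after 1s of silence
    vad_threshold=0.5,               # require stronger evidence of speech
)
```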

Features

  • Real-time transcription: Low latency (~150ms) speech recognition
  • Multi-language support: 99 languages supported
  • VAD-based commit strategy: Automatic transcript segmentation based on voice activity detection
  • Automatic reconnection: Built-in exponential backoff for connection failures
  • Audio resampling: Automatically resamples audio to 16kHz mono for optimal quality
The Scribe v2 model does not support turn detection. The turn_detection property is set to False for this implementation.
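To make the VAD parameters concrete, here is a toy segmenter over per-frame speech probabilities, showing how the threshold and minimum-duration settings interact. This is an illustration written for this page, not the plugin's actual implementation:

```python
def segment_speech(
    frame_probs: list[float],
    vad_threshold: float = 0.4,
    frame_ms: int = 100,
    min_speech_duration_ms: int = 100,
    min_silence_duration_ms: int = 100,
) -> list[tuple[int, int]]:
    """Return half-open (start, end) frame ranges judged to contain speech."""
    segments: list[tuple[int, int]] = []
    start: int | None = None
    silence_ms = 0
    for i, p in enumerate(frame_probs):
        if p >= vad_threshold:
            if start is None:
                start = i          # speech onset
            silence_ms = 0         # brief silences inside speech are bridged
        elif start is not None:
            silence_ms += frame_ms
            if silence_ms >= min_silence_duration_ms:
                # Enough silence: close the segment, dropping very short blips.
                end = i - silence_ms // frame_ms + 1
                if (end - start) * frame_ms >= min_speech_duration_ms:
                    segments.append((start, end))
                start = None
                silence_ms = 0
    if start is not None:
        # Speech ran to the end of the buffer.
        end = len(frame_probs)
        if (end - start) * frame_ms >= min_speech_duration_ms:
            segments.append((start, end))
    return segments
```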