# ElevenLabs STT

[ElevenLabs](https://www.elevenlabs.io) provides real-time speech-to-text via Scribe v2, with \~150ms latency, support for 99 languages, and built-in turn detection based on voice activity detection (VAD). No separate turn detection plugin is needed.

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account
  for real-time transport. Most providers offer free tiers to get started.
</Info>

<Tip>
  ElevenLabs also provides highly realistic [text-to-speech](/integrations/tts/elevenlabs). You can use both in the same agent.
</Tip>

## Installation

```sh
uv add "vision-agents[elevenlabs]"
```

## Quick Start

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import elevenlabs, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=elevenlabs.STT(),
    tts=elevenlabs.TTS(),
)
```

<Warning>
  Set `ELEVENLABS_API_KEY` in your environment, or pass `api_key` directly to `elevenlabs.STT()`.
</Warning>
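
If you would rather fail fast than discover a missing key on the first audio frame, you can resolve the key yourself before constructing the plugin. The following is a minimal sketch: `resolve_elevenlabs_api_key` and `make_stt_with_key` are hypothetical helper names, not part of Vision Agents. Only the `ELEVENLABS_API_KEY` variable and the `api_key` parameter come from this page.

```python
import os


def resolve_elevenlabs_api_key(explicit_key=None):
    """Return the explicit key if given, otherwise the ELEVENLABS_API_KEY env var."""
    key = explicit_key or os.environ.get("ELEVENLABS_API_KEY")
    if not key:
        raise RuntimeError(
            "No ElevenLabs API key found: set ELEVENLABS_API_KEY or pass api_key."
        )
    return key


def make_stt_with_key(explicit_key=None):
    """Hypothetical helper: build the STT plugin with a key that was checked up front."""
    # Imported lazily so the key check above works even before the plugin is installed.
    from vision_agents.plugins import elevenlabs

    return elevenlabs.STT(api_key=resolve_elevenlabs_api_key(explicit_key))
```

Calling `make_stt_with_key()` at startup surfaces a missing key as a clear error instead of a failure in the middle of a call.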

## Parameters

```python
stt = elevenlabs.STT(
    model_id="scribe_v2_realtime",
    language_code="en",
)
```

| Name                         | Type    | Default                | Description                                        |
| ---------------------------- | ------- | ---------------------- | -------------------------------------------------- |
| `model_id`                   | `str`   | `"scribe_v2_realtime"` | Scribe model                                       |
| `language_code`              | `str`   | `"en"`                 | Language code                                      |
| `api_key`                    | `str`   | `None`                 | API key (defaults to `ELEVENLABS_API_KEY` env var) |
| `vad_silence_threshold_secs` | `float` | `0.3`                  | Silence duration (seconds) before VAD commits      |
| `vad_threshold`              | `float` | `0.4`                  | VAD sensitivity threshold for speech detection     |
| `min_speech_duration_ms`     | `int`   | `100`                  | Minimum speech duration in milliseconds            |
| `min_silence_duration_ms`    | `int`   | `100`                  | Minimum silence duration in milliseconds           |
| `audio_chunk_duration_ms`    | `int`   | `100`                  | Audio chunk size sent to the server (100-1000ms)   |
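
The VAD parameters above are easier to manage when grouped and validated in one place. The sketch below assumes the parameter names, defaults, and the 100-1000 ms chunk range from the table. `ScribeVadSettings` and `make_tuned_stt` are hypothetical names introduced for illustration, and the "patient" values are illustrative guesses, not recommended settings.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScribeVadSettings:
    """Hypothetical container mirroring the VAD-related parameters in the table."""

    vad_silence_threshold_secs: float = 0.3
    vad_threshold: float = 0.4
    min_speech_duration_ms: int = 100
    min_silence_duration_ms: int = 100
    audio_chunk_duration_ms: int = 100

    def __post_init__(self):
        # The table documents a 100-1000 ms range for the chunk duration.
        if not 100 <= self.audio_chunk_duration_ms <= 1000:
            raise ValueError("audio_chunk_duration_ms must be between 100 and 1000")

    def to_kwargs(self):
        """Return the settings as keyword arguments for elevenlabs.STT."""
        return asdict(self)


# Illustrative preset: wait longer before committing, for speakers who pause mid-sentence.
PATIENT = ScribeVadSettings(vad_silence_threshold_secs=0.8)


def make_tuned_stt(settings, language_code="en"):
    """Hypothetical helper: build the STT plugin from a settings object."""
    # Imported lazily so the settings can be validated without the plugin installed.
    from vision_agents.plugins import elevenlabs

    return elevenlabs.STT(
        model_id="scribe_v2_realtime",
        language_code=language_code,
        **settings.to_kwargs(),
    )
```

For example, `make_tuned_stt(PATIENT)` would build an STT instance that tolerates longer pauses, at the cost of slower turn hand-off.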

<Tip>
  ElevenLabs STT includes built-in turn detection via VAD, so you do not need to add a separate turn detection plugin. When you use `elevenlabs.STT`, the `Agent` automatically ignores any external `TurnDetector` plugin to prevent conflicts.
</Tip>
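
To build intuition for how `vad_threshold` and `vad_silence_threshold_secs` interact, here is a toy simulation of silence-based turn commitment. It is an illustration under simplifying assumptions, not ElevenLabs' actual VAD: the per-chunk speech probabilities are made up, and `first_commit_index` is a hypothetical function.

```python
def first_commit_index(
    speech_probs,
    chunk_ms=100,
    vad_threshold=0.4,
    vad_silence_threshold_secs=0.3,
):
    """Return the index of the chunk at which a turn would be committed, or None.

    A chunk counts as speech when its probability is at or above vad_threshold.
    After speech has been heard, the turn commits once consecutive silent
    chunks add up to at least vad_silence_threshold_secs.
    """
    needed_ms = round(vad_silence_threshold_secs * 1000)
    heard_speech = False
    silence_ms = 0
    for index, prob in enumerate(speech_probs):
        if prob >= vad_threshold:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += chunk_ms
            if silence_ms >= needed_ms:
                return index
    return None


# Made-up probabilities: silence, two chunks of speech, a pause, then more speech.
probs = [0.1, 0.9, 0.8, 0.2, 0.1, 0.1, 0.9]
print(first_commit_index(probs))  # commits on the third silent chunk (index 5)
print(first_commit_index(probs, vad_silence_threshold_secs=0.5))  # pause too short: None
```

In this model, raising `vad_silence_threshold_secs` makes the agent more tolerant of pauses, while lowering it makes responses feel snappier but risks cutting the speaker off.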

## Next Steps

<CardGroup cols={2}>
  <Card title="ElevenLabs TTS" icon="volume-high" href="/integrations/tts/elevenlabs">
    Expressive text-to-speech
  </Card>

  <Card title="Build a Voice Agent" icon="microphone" href="/introduction/voice-agents">
    Get started with voice
  </Card>
</CardGroup>
