# AssemblyAI

[AssemblyAI](https://www.assemblyai.com) provides real-time streaming speech-to-text (STT) with built-in, punctuation-based turn detection and sub-300 ms latency.

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account
  for real-time transport. Most providers offer free tiers to get started.
</Info>

## Installation

```sh theme={null}
uv add "vision-agents[assemblyai]"
```

## Quick start

```python theme={null}
from vision_agents.core import Agent, User
from vision_agents.plugins import assemblyai, cartesia, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM(),
    stt=assemblyai.STT(),
    tts=cartesia.TTS(),
)
```

<Warning>
  Set `ASSEMBLYAI_API_KEY` in your environment or pass `api_key` directly.
</Warning>
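
The lookup order described in the warning can be sketched in plain Python. This is a minimal illustration, not the plugin's internal code: `resolve_api_key` is a hypothetical helper, and it assumes an explicit `api_key` takes precedence over the environment variable.

```python
import os


def resolve_api_key(api_key=None, env_var="ASSEMBLYAI_API_KEY"):
    """Hypothetical helper: prefer an explicit key, else read the environment."""
    # Assumption: an explicitly passed key wins over the environment variable.
    key = api_key or os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of at connection time.
        raise RuntimeError(f"Set {env_var} or pass api_key explicitly")
    return key
```

The result can then be passed along as `assemblyai.STT(api_key=resolve_api_key())`, which surfaces a missing key before the agent starts.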

## STT

Real-time transcription using AssemblyAI's Universal-3 Pro model (`u3-rt-pro`) with built-in turn detection. The values shown here match the defaults listed under Parameters.

```python theme={null}
stt = assemblyai.STT(
    speech_model="u3-rt-pro",
    sample_rate=16000,
)
```

### With keyterms boosting

Boost recognition accuracy for specific terms:

```python theme={null}
stt = assemblyai.STT(
    keyterms_prompt=["AssemblyAI", "Vision Agents"],
)
```
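
The Parameters table notes that `keyterms_prompt` cannot be combined with `prompt`. One way to catch that mistake before constructing the STT instance is a small guard like the one below. `build_stt_kwargs` is a hypothetical helper written for illustration; how the plugin itself reacts to both arguments being set is not shown here.

```python
def build_stt_kwargs(prompt=None, keyterms_prompt=None):
    """Hypothetical helper: validate that only one prompting option is set."""
    if prompt is not None and keyterms_prompt is not None:
        raise ValueError("Pass either prompt or keyterms_prompt, not both")
    kwargs = {}
    if prompt is not None:
        kwargs["prompt"] = prompt
    if keyterms_prompt is not None:
        # Copy into a list so callers can pass any iterable of terms.
        kwargs["keyterms_prompt"] = list(keyterms_prompt)
    return kwargs
```

The returned dictionary can be unpacked into the constructor, for example `assemblyai.STT(**build_stt_kwargs(keyterms_prompt=["AssemblyAI"]))`.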

### With custom turn silence thresholds

Configure turn detection timing:

```python theme={null}
stt = assemblyai.STT(
    min_turn_silence=100,   # ms of silence before a speculative end-of-turn (EOT) check
    max_turn_silence=1200,  # ms of silence before the turn is forced to end
)
```
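
To make the relationship between the two thresholds concrete, the sketch below classifies a silence duration against them. It is a conceptual illustration of the parameter descriptions only: the actual turn detection runs on AssemblyAI's side, and the function name and return labels are made up for this example.

```python
def classify_silence(silence_ms, min_turn_silence=100, max_turn_silence=1200):
    """Illustrative only: map a silence duration to the action it would trigger."""
    if silence_ms >= max_turn_silence:
        # Long silence: the turn ends regardless of what was said.
        return "force_end"
    if silence_ms >= min_turn_silence:
        # Short pause: a speculative end-of-turn check may run.
        return "check_end_of_turn"
    # Below the minimum: treated as part of ongoing speech.
    return "keep_listening"
```

Under this reading, a lower `min_turn_silence` makes the agent respond sooner after a pause, while a higher `max_turn_silence` gives speakers more time to continue before the turn is closed.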

## Parameters

| Name                          | Type        | Default       | Description                                                               |
| ----------------------------- | ----------- | ------------- | ------------------------------------------------------------------------- |
| `api_key`                     | `str`       | `None`        | API key (defaults to `ASSEMBLYAI_API_KEY` env var)                        |
| `speech_model`                | `str`       | `"u3-rt-pro"` | Model identifier                                                          |
| `sample_rate`                 | `int`       | `16000`       | Audio sample rate in Hz                                                   |
| `min_turn_silence`            | `int`       | API default   | Silence (ms) before speculative end-of-turn check                         |
| `max_turn_silence`            | `int`       | API default   | Maximum silence (ms) before forcing turn end                              |
| `prompt`                      | `str`       | `None`        | Custom transcription prompt (cannot be combined with `keyterms_prompt`)   |
| `keyterms_prompt`             | `list[str]` | `None`        | List of terms to boost recognition for (cannot be combined with `prompt`) |
| `max_reconnect_attempts`      | `int`       | `3`           | Maximum reconnect attempts on transient failures                          |
| `reconnect_backoff_initial_s` | `float`     | `0.5`         | Initial backoff delay in seconds                                          |
| `reconnect_backoff_max_s`     | `float`     | `4.0`         | Maximum backoff delay in seconds                                          |
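
The three reconnect parameters together define a retry schedule. The sketch below shows what that schedule could look like, assuming the delay doubles after each attempt and is capped at the maximum. The doubling factor is an assumption made for illustration; the plugin may use a different multiplier or add jitter.

```python
def backoff_delays(max_attempts=3, initial_s=0.5, max_s=4.0, factor=2.0):
    """Illustrative schedule: exponential growth capped at max_s."""
    delays = []
    delay = initial_s
    for _ in range(max_attempts):
        # Never wait longer than the configured maximum.
        delays.append(min(delay, max_s))
        # Assumption: the delay grows by a constant factor per attempt.
        delay *= factor
    return delays


print(backoff_delays())                # [0.5, 1.0, 2.0]
print(backoff_delays(max_attempts=5))  # [0.5, 1.0, 2.0, 4.0, 4.0]
```

With the defaults, the cap of `4.0` seconds is never reached. It only matters once `max_reconnect_attempts` is raised above four under this doubling assumption.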

## Next steps

<CardGroup cols={2}>
  <Card title="Build a Voice Agent" icon="microphone" href="/introduction/voice-agents">
    Get started with voice
  </Card>

  <Card title="Build a Video Agent" icon="video" href="/introduction/video-agents">
    Add video processing
  </Card>
</CardGroup>
