# xAI TTS

> Text-to-speech using xAI's Grok voices with speech tag support.

[xAI](https://x.ai/) provides text-to-speech with five expressive voices, inline speech tags for delivery control, and multiple output codecs.

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account
  for real-time transport. Most providers offer free tiers to get started.
</Info>

<Tip>
  xAI also provides an [LLM](/integrations/llm/xai) and [Realtime speech-to-speech](/integrations/realtime/xai). You can use all three in the same agent.
</Tip>

## Installation

```sh
uv add "vision-agents[xai]"
```

## Quick start

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import xai, getstream, deepgram

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=xai.LLM(model="grok-4.1"),
    stt=deepgram.STT(),
    tts=xai.TTS(),
)
```

<Warning>
  Set `XAI_API_KEY` in your environment, or pass `api_key` directly to `xai.TTS()`.
</Warning>
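
If you prefer to fail fast when no key is configured, one option is to resolve it yourself before building the agent. The sketch below uses only the standard library. `resolve_xai_api_key` is a hypothetical helper written for this page, not part of Vision Agents or the xAI SDK.

```python
import os
from typing import Optional


def resolve_xai_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise XAI_API_KEY from the environment.

    Hypothetical helper for illustration; not part of Vision Agents.
    """
    key = explicit_key or os.environ.get("XAI_API_KEY")
    if not key:
        # Raise early with a clear message instead of failing on the first TTS call.
        raise RuntimeError("Set XAI_API_KEY or pass api_key explicitly")
    return key
```

The result can then be passed along as `xai.TTS(api_key=resolve_xai_api_key())`.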

## Parameters

```python
tts = xai.TTS(voice="eve", language="en", codec="pcm", sample_rate=24000)
```

| Name          | Type  | Default | Description                                                           |
| ------------- | ----- | ------- | --------------------------------------------------------------------- |
| `api_key`     | `str` | `None`  | API key (defaults to `XAI_API_KEY` env var)                           |
| `voice`       | `str` | `"eve"` | Voice (`"eve"`, `"ara"`, `"leo"`, `"rex"`, `"sal"`)                   |
| `language`    | `str` | `"en"`  | BCP-47 language code (e.g. `"en"`, `"zh"`, `"pt-BR"`) or `"auto"`     |
| `codec`       | `str` | `"pcm"` | Output codec (`"pcm"`, `"wav"`, `"mp3"`, `"mulaw"`, `"alaw"`)         |
| `sample_rate` | `int` | `24000` | Output sample rate in Hz (8000, 16000, 22050, 24000, 44100, or 48000) |
| `bit_rate`    | `int` | `None`  | MP3 bit rate (only used when codec is `"mp3"`)                        |
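
When these options come from user configuration, it can help to check them against the table above before constructing the TTS. The following is a minimal sketch under that assumption. `build_tts_kwargs` is a hypothetical helper, not a Vision Agents API, and the allowed values are copied from the table rather than queried from xAI. It checks each value independently and does not verify whether every codec and sample-rate combination is accepted.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the parameter table on this page.
VOICES = {"eve", "ara", "leo", "rex", "sal"}
CODECS = {"pcm", "wav", "mp3", "mulaw", "alaw"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def build_tts_kwargs(
    voice: str = "eve",
    language: str = "en",
    codec: str = "pcm",
    sample_rate: int = 24000,
    bit_rate: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate options and return keyword arguments for xai.TTS (hypothetical helper)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if codec not in CODECS:
        raise ValueError(f"unknown codec: {codec!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")

    kwargs: Dict[str, Any] = {
        "voice": voice,
        "language": language,  # not validated here: any BCP-47 code or "auto"
        "codec": codec,
        "sample_rate": sample_rate,
    }
    # bit_rate is only used for MP3 output, so it is omitted for other codecs.
    if codec == "mp3" and bit_rate is not None:
        kwargs["bit_rate"] = bit_rate
    return kwargs
```

For example, `xai.TTS(**build_tts_kwargs(voice="rex", codec="mulaw", sample_rate=8000))` would pass only values that appear in the table.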

### Voices

| Voice | Description                                                         |
| ----- | ------------------------------------------------------------------- |
| `eve` | Energetic, upbeat — engaging and enthusiastic (default)             |
| `ara` | Warm, friendly — balanced and conversational                        |
| `leo` | Authoritative, strong — commanding, great for instructional content |
| `rex` | Confident, clear — professional, ideal for business                 |
| `sal` | Smooth, balanced — versatile for a wide range of contexts           |

## Speech tags

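Embed speech tags directly in the text you send to the TTS for fine-grained delivery control. Inline tags are placed at a single point in the text, while wrapping tags enclose the words they affect.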

**Inline tags:** `[pause]` `[long-pause]` `[laugh]` `[chuckle]` `[giggle]` `[cry]` `[tsk]` `[tongue-click]` `[lip-smack]` `[breath]` `[inhale]` `[exhale]` `[sigh]` `[hum-tune]`

**Wrapping tags:** `<whisper>`, `<shout>`, `<slow>`, `<fast>`, `<soft>`, `<loud>`, `<high-pitch>`, `<low-pitch>`, `<sing>`
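
The tags listed above can be assembled with plain string formatting. The sketch below is illustrative only: `inline` and `wrap` are hypothetical helpers, and it assumes wrapping tags are closed XML-style (for example `</whisper>`), which this page does not spell out. Check xAI's documentation for the exact syntax.

```python
# Tag names copied from the lists on this page.
INLINE_TAGS = {
    "pause", "long-pause", "laugh", "chuckle", "giggle", "cry", "tsk",
    "tongue-click", "lip-smack", "breath", "inhale", "exhale", "sigh", "hum-tune",
}
WRAPPING_TAGS = {
    "whisper", "shout", "slow", "fast", "soft", "loud",
    "high-pitch", "low-pitch", "sing",
}


def inline(tag: str) -> str:
    """Return an inline tag such as [pause] (hypothetical helper)."""
    if tag not in INLINE_TAGS:
        raise ValueError(f"unknown inline tag: {tag!r}")
    return f"[{tag}]"


def wrap(tag: str, text: str) -> str:
    """Wrap text in a tag pair such as <whisper>...</whisper>.

    Assumption: wrapping tags use an XML-style closing tag.
    """
    if tag not in WRAPPING_TAGS:
        raise ValueError(f"unknown wrapping tag: {tag!r}")
    return f"<{tag}>{text}</{tag}>"


# Build a line that pauses, whispers a phrase, then laughs.
line = (
    f"I have a secret. {inline('pause')} "
    f"{wrap('whisper', 'It is a surprise party.')} {inline('laugh')}"
)
```

The resulting string is ordinary text, so it can be used wherever the agent produces speech, for example in the LLM instructions or in text handed to the TTS.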

## Next steps

<CardGroup cols={2}>
  <Card title="xAI LLM" icon="brain" href="/integrations/llm/xai">
    Advanced reasoning with Grok
  </Card>

  <Card title="xAI Realtime" icon="bolt" href="/integrations/realtime/xai">
    Speech-to-speech over WebSocket
  </Card>
</CardGroup>
