Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
Quick start
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "grok-4-1-fast-non-reasoning" | Grok realtime model |
| voice | str | "Ara" | Voice ("Ara", "Rex", "Sal", "Eve", "Leo") |
| api_key | str | None | API key (defaults to XAI_API_KEY env var) |
| turn_detection | str or None | "server_vad" | Turn detection mode ("server_vad" or None for manual) |
| vad_interrupt_response | bool | False | Allow VAD to auto-cancel the assistant response on detected speech |
| web_search | bool | True | Enable web search tool |
| x_search | bool | True | Enable X (Twitter) search tool |
| x_search_allowed_handles | list[str] | None | Restrict X search to specific handles |
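The defaults in the table can be summarized as a small settings object. The sketch below is only an illustration: `RealtimeSettings` and `VOICES` are hypothetical names introduced here and are not part of the Vision Agents or xAI API. It mirrors the documented names, types, and defaults, and shows one plausible way the `XAI_API_KEY` fallback could behave.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Voices listed in the parameter table.
VOICES = ("Ara", "Rex", "Sal", "Eve", "Leo")


@dataclass
class RealtimeSettings:
    """Hypothetical container mirroring the table; not a Vision Agents class."""

    model: str = "grok-4-1-fast-non-reasoning"
    voice: str = "Ara"
    api_key: Optional[str] = None
    turn_detection: Optional[str] = "server_vad"
    vad_interrupt_response: bool = False
    web_search: bool = True
    x_search: bool = True
    x_search_allowed_handles: Optional[list[str]] = None

    def __post_init__(self) -> None:
        # Reject values outside the options the table documents.
        if self.voice not in VOICES:
            raise ValueError(f"voice must be one of {VOICES}, got {self.voice!r}")
        if self.turn_detection not in ("server_vad", None):
            raise ValueError('turn_detection must be "server_vad" or None')
        # Assumed fallback: read the key from the environment when not passed.
        if self.api_key is None:
            self.api_key = os.environ.get("XAI_API_KEY")


# Manual turn handling with a non-default voice; everything else keeps its default.
settings = RealtimeSettings(voice="Rex", turn_detection=None)
print(settings.model, settings.voice, settings.turn_detection)
```

Validating at construction time is a design choice of this sketch; the actual plugin may validate differently or defer validation to the server.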
vad_interrupt_response defaults to False because speaker-to-mic echo can cause the server to cancel the agent’s own response mid-sentence. Set to True only if your audio setup avoids echo feedback.
Function calling
Next steps
- xAI LLM: Advanced reasoning with Grok
- xAI TTS: Text-to-speech with expressive voices

