Fish Audio provides speech-to-text with automatic language detection. The plugin buffers audio per participant (at least 1 second) before sending it to the API, which gives the service enough context for accurate transcription.
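The per-participant buffering described above can be pictured with a small sketch. This is not the plugin's actual implementation: the class name `ParticipantAudioBuffer`, the 16 kHz sample rate, and the 16-bit mono PCM format are assumptions made for illustration. Only the 1 second minimum comes from the description.

```python
from collections import defaultdict

MIN_BUFFER_SECONDS = 1.0  # minimum buffer length stated in the description


class ParticipantAudioBuffer:
    """Hypothetical helper: accumulates raw PCM bytes per participant."""

    def __init__(self, sample_rate=16000, bytes_per_sample=2,
                 min_seconds=MIN_BUFFER_SECONDS):
        # Assumed audio format: 16 kHz, 16-bit mono PCM.
        self.min_bytes = int(sample_rate * bytes_per_sample * min_seconds)
        self._buffers = defaultdict(bytearray)

    def add(self, participant_id, pcm_chunk):
        """Append a chunk. Return the buffered audio once it reaches the
        minimum duration, otherwise return None."""
        buf = self._buffers[participant_id]
        buf.extend(pcm_chunk)
        if len(buf) < self.min_bytes:
            return None
        ready = bytes(buf)
        buf.clear()  # start a fresh buffer for this participant
        return ready
```

Each participant gets an independent buffer, so a short chunk from one speaker never triggers a transcription request for another. In a real pipeline the bytes returned by `add` would be what gets sent to the speech-to-text API.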
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
Quick Start
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| language | str | None | Language code ("en", "zh", etc.) or None for auto-detect |
| api_key | str | None | API key (defaults to FISH_API_KEY env var) |
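The defaults in the table can be read as the following resolution logic. This is a rough sketch under stated assumptions, not the plugin's code: `resolve_stt_options` and the `auto_detect` key are hypothetical names introduced here. Only the `language` and `api_key` parameters and the `FISH_API_KEY` environment variable come from the table.

```python
import os


def resolve_stt_options(language=None, api_key=None):
    """Hypothetical helper mirroring the documented parameter defaults."""
    # An explicit api_key wins; otherwise fall back to the FISH_API_KEY env var.
    key = api_key or os.environ.get("FISH_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set FISH_API_KEY")
    return {
        "api_key": key,
        "language": language,
        # language=None means the service detects the language automatically.
        "auto_detect": language is None,
    }
```

Passing `language="en"` pins transcription to English, while leaving it as `None` relies on automatic language detection.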
Next Steps
- Fish Audio TTS: Text-to-speech with prosody control
- Build a Voice Agent: Get started with voice

