Fish Audio provides high-quality STT and TTS with automatic language detection, voice cloning support, and fine-grained prosody control. Ideal for multilingual applications.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
uv add "vision-agents[fish]"
Quick start
from vision_agents.core import Agent, User
from vision_agents.plugins import fish, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=fish.STT(),
    tts=fish.TTS(),  # Uses S2-Pro model by default
)
Set FISH_API_KEY in your environment or pass api_key directly.
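The fallback described above can be sketched in plain Python. This is only an illustration of the lookup order (explicit argument first, then the environment); `resolve_api_key` is a hypothetical helper and not part of the plugin.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise fall back to FISH_API_KEY."""
    key = api_key or os.environ.get("FISH_API_KEY")
    if not key:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError("Set FISH_API_KEY or pass api_key explicitly")
    return key
```

The resolved value could then be passed as `api_key=` to `fish.TTS` or `fish.STT`.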
TTS
Fish Audio TTS supports multiple backend models and fine-grained prosody control with the S2-Pro model.
Basic usage
tts = fish.TTS(reference_id="your_voice_id") # Optional voice cloning
Using prosody control
The S2-Pro model (default) supports inline control tags for natural prosody:
tts = fish.TTS() # Uses s2-pro by default
# Include prosody tags in your text
text = "[whisper] This is a secret. [super happy] But this is great news!"
text = "Hello! [laugh] That's so funny."
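When tagged text is generated in code, small helpers for adding and removing tags can keep things tidy. The sketch below assumes tags always take the bracketed form shown above; `tagged` and `strip_tags` are hypothetical helpers, not part of the plugin.

```python
import re

# Matches an inline control tag such as "[whisper]" or "[super happy]",
# plus any whitespace that follows it.
TAG_PATTERN = re.compile(r"\[[^\[\]]+\]\s*")


def tagged(tag: str, sentence: str) -> str:
    """Prefix a sentence with an inline control tag."""
    return f"[{tag}] {sentence}"


def strip_tags(text: str) -> str:
    """Remove control tags, e.g. before showing the text in a transcript."""
    return TAG_PATTERN.sub("", text).strip()


text = " ".join([
    tagged("whisper", "This is a secret."),
    tagged("super happy", "But this is great news!"),
])
plain = strip_tags(text)
```

Here `text` is the tagged string to send to TTS, and `plain` is the same content without tags for display or logging.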
Selecting a model
# Use the latest S2-Pro model with prosody control
tts = fish.TTS(model="s2-pro")
# Use legacy models if needed
tts = fish.TTS(model="speech-1.5")
tts = fish.TTS(model="speech-1.6")
# Use fast models for lower latency
tts = fish.TTS(model="s1")
tts = fish.TTS(model="s1-mini")
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "s2-pro" | Backend model: "s2-pro", "speech-1.5", "speech-1.6", "s1", "s1-mini" |
| reference_id | str | None | Voice ID for voice cloning |
| api_key | str | None | API key (defaults to FISH_API_KEY env var) |
| base_url | str | None | Custom API endpoint |
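If the model name comes from configuration, it may help to validate it against the values listed in the table before constructing the TTS instance. This is a minimal sketch; `check_model` is a hypothetical helper, and the tuple simply mirrors the model names documented above.

```python
# Model names as listed in the parameters table.
SUPPORTED_MODELS = ("s2-pro", "speech-1.5", "speech-1.6", "s1", "s1-mini")


def check_model(model: str = "s2-pro") -> str:
    """Return the model name unchanged, or raise if it is not a listed value."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unknown model {model!r}; expected one of {', '.join(SUPPORTED_MODELS)}"
        )
    return model
```

The checked value could then be passed as `fish.TTS(model=check_model(name))`.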
STT
Fish Audio STT buffers audio per participant (at least 1 second) before sending it to the API, which improves transcription accuracy.
stt = fish.STT(language="en") # Or None for auto-detection
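To illustrate the per-participant buffering idea, here is a rough standalone sketch. It is not the plugin's implementation: the audio format (16 kHz, 16-bit mono PCM) and the `ParticipantBuffers` class are assumptions made for the example; only the 1-second minimum comes from the description above.

```python
from collections import defaultdict
from typing import Optional

SAMPLE_RATE = 16_000   # assumed sample rate (Hz)
BYTES_PER_SAMPLE = 2   # assumed 16-bit mono PCM
MIN_SECONDS = 1.0      # minimum buffered duration before sending
MIN_BYTES = int(SAMPLE_RATE * BYTES_PER_SAMPLE * MIN_SECONDS)


class ParticipantBuffers:
    """Accumulate audio per participant and release it once long enough."""

    def __init__(self, min_bytes: int = MIN_BYTES) -> None:
        self.min_bytes = min_bytes
        self._buffers = defaultdict(bytearray)  # participant id -> pending audio

    def push(self, participant_id: str, pcm: bytes) -> Optional[bytes]:
        """Append audio; return the buffered chunk once it reaches the minimum."""
        buf = self._buffers[participant_id]
        buf.extend(pcm)
        if len(buf) < self.min_bytes:
            return None  # not enough audio yet, keep buffering
        chunk = bytes(buf)
        buf.clear()
        return chunk


buffers = ParticipantBuffers()
half_second = b"\x00" * (MIN_BYTES // 2)
first = buffers.push("alice", half_second)   # still below the minimum
second = buffers.push("alice", half_second)  # a full second is now available
```

Each participant has an independent buffer, so short utterances from one speaker are never mixed with another speaker's audio.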
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| language | str | None | Language code ("en", "zh", etc.) or None for auto-detect |
| api_key | str | None | API key (defaults to FISH_API_KEY env var) |
Next Steps