Fish Audio provides high-quality speech-to-text (STT) and text-to-speech (TTS) with automatic language detection, voice cloning, and fine-grained prosody control, which makes it well suited to multilingual applications.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[fish]"

Quick start

from vision_agents.core import Agent, User
from vision_agents.plugins import fish, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=fish.STT(),
    tts=fish.TTS(),  # Uses S2-Pro model by default
)
Set FISH_API_KEY in your environment, or pass api_key directly to fish.STT() and fish.TTS().
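A minimal sketch of that lookup order follows, assuming an explicitly passed `api_key` takes precedence over `FISH_API_KEY`. The helper `resolve_fish_api_key` is hypothetical and not part of the plugin; it only shows how you might fail fast at startup when neither source provides a key.

```python
import os
from typing import Mapping, Optional


def resolve_fish_api_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else FISH_API_KEY from the environment.

    Hypothetical helper: it mirrors the documented lookup order,
    not the plugin's actual code.
    """
    if api_key:
        return api_key
    # Allow injecting a mapping for tests; default to the real environment.
    source = os.environ if env is None else env
    key = source.get("FISH_API_KEY")
    if not key:
        raise RuntimeError("Set FISH_API_KEY or pass api_key explicitly.")
    return key
```

Checking once at startup surfaces a missing key before the agent joins a call, rather than on the first transcription or synthesis request.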

TTS

Fish Audio TTS supports multiple backend models; the default S2-Pro model adds fine-grained prosody control.

Basic usage

tts = fish.TTS(reference_id="your_voice_id")  # Optional voice cloning

Using prosody control

The S2-Pro model (default) supports inline control tags for natural prosody:
tts = fish.TTS()  # Uses s2-pro by default

# Prosody tags go inline in the text the TTS speaks
secret = "[whisper] This is a secret. [super happy] But this is great news!"
joke = "Hello! [laugh] That's so funny."
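If you also show the agent's replies as captions or logs, you may want the text without its tags. The sketch below assumes tags are square-bracketed phrases like the examples above; `find_prosody_tags` and `strip_prosody_tags` are hypothetical helpers, not plugin functions.

```python
import re
from typing import List

# Assumption: a tag is a bracketed phrase such as "[whisper]" or "[super happy]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_prosody_tags(text: str) -> List[str]:
    """Return the tag names in the order they appear."""
    return [name.strip() for name in TAG_PATTERN.findall(text)]


def strip_prosody_tags(text: str) -> str:
    """Remove tags and collapse the leftover whitespace, e.g. for captions."""
    without_tags = TAG_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


print(find_prosody_tags("[whisper] This is a secret. [super happy] But this is great news!"))
print(strip_prosody_tags("Hello! [laugh] That's so funny."))
```

Any legitimate square-bracketed text would be stripped as well, so treat this as a starting point rather than a complete parser.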

Selecting a model

# Use the latest S2-Pro model with prosody control
tts = fish.TTS(model="s2-pro")

# Use legacy models if needed
tts = fish.TTS(model="speech-1.5")
tts = fish.TTS(model="speech-1.6")

# Use fast models for lower latency
tts = fish.TTS(model="s1")
tts = fish.TTS(model="s1-mini")
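Because the model is passed as a plain string, a typo only surfaces when the plugin talks to the API. A small check like the following can catch it earlier. This is a sketch: `check_fish_model` is hypothetical, and the model list is copied from the parameter table on this page. It warns rather than fails when prosody tags are used with a model other than S2-Pro, since this page documents inline tags only for S2-Pro and does not say how other models treat them.

```python
import warnings

# Model names documented in the parameter table on this page.
SUPPORTED_MODELS = ("s2-pro", "speech-1.5", "speech-1.6", "s1", "s1-mini")


def check_fish_model(model: str, uses_prosody_tags: bool = False) -> str:
    """Normalize and validate a model name before passing it to fish.TTS(model=...)."""
    name = model.strip().lower()
    if name not in SUPPORTED_MODELS:
        raise ValueError(f"Unknown model {model!r}; expected one of {SUPPORTED_MODELS}")
    if uses_prosody_tags and name != "s2-pro":
        # Tag behaviour for other models is not documented here, so only warn.
        warnings.warn(f"Prosody tags are documented for 's2-pro', not {name!r}")
    return name
```

For example, `fish.TTS(model=check_fish_model("s1-mini"))` passes the normalized name through unchanged.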

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "s2-pro" | Backend model: "s2-pro", "speech-1.5", "speech-1.6", "s1", or "s1-mini" |
| reference_id | str | None | Voice ID for voice cloning |
| api_key | str | None | API key (defaults to the FISH_API_KEY environment variable) |
| base_url | str | None | Custom API endpoint |
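To switch models or voices per deployment without code changes, you could gather these parameters from configuration. `FishTTSSettings` and the `FISH_TTS_MODEL`, `FISH_REFERENCE_ID`, and `FISH_BASE_URL` variable names are assumptions for this sketch; only `FISH_API_KEY` is read by the plugin itself, so it is left out here.

```python
import os
from dataclasses import asdict, dataclass
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class FishTTSSettings:
    """Hypothetical settings holder mirroring the documented fish.TTS parameters."""

    model: Optional[str] = None
    reference_id: Optional[str] = None
    api_key: Optional[str] = None
    base_url: Optional[str] = None

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "FishTTSSettings":
        # These variable names are assumptions; FISH_API_KEY is left to the plugin.
        source = os.environ if env is None else env
        return cls(
            model=source.get("FISH_TTS_MODEL"),
            reference_id=source.get("FISH_REFERENCE_ID"),
            base_url=source.get("FISH_BASE_URL"),
        )

    def to_kwargs(self) -> Dict[str, Any]:
        """Keep only the options that were set so plugin defaults apply to the rest."""
        return {name: value for name, value in asdict(self).items() if value is not None}
```

Passing only what was set, for example `fish.TTS(**FishTTSSettings.from_env().to_kwargs())`, leaves the documented defaults ("s2-pro" and the FISH_API_KEY fallback) in effect for everything else.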

STT

Fish Audio STT buffers at least 1 second of audio per participant before sending it to the API, which improves transcription accuracy.
stt = fish.STT(language="en")  # Or None for auto-detection
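The per-participant buffering can be pictured as follows. This illustrates the idea, not the plugin's implementation: `ParticipantAudioBuffer` is hypothetical, and it assumes 16 kHz, 16-bit mono PCM, so one second of audio is 32,000 bytes.

```python
from collections import defaultdict
from typing import Dict, Optional

# Assumed audio format for this sketch: 16 kHz, 16-bit (2-byte), mono PCM.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
MIN_SECONDS = 1.0
MIN_BYTES = int(SAMPLE_RATE * BYTES_PER_SAMPLE * MIN_SECONDS)  # 32,000 bytes


class ParticipantAudioBuffer:
    """Accumulate PCM per participant and release it once MIN_SECONDS is buffered."""

    def __init__(self, min_bytes: int = MIN_BYTES) -> None:
        self.min_bytes = min_bytes
        self._buffers: Dict[str, bytearray] = defaultdict(bytearray)

    def add(self, participant_id: str, pcm: bytes) -> Optional[bytes]:
        """Append audio; return a chunk ready to transcribe, or None if still too short."""
        buf = self._buffers[participant_id]
        buf.extend(pcm)
        if len(buf) < self.min_bytes:
            return None
        chunk = bytes(buf)
        buf.clear()
        return chunk

    def flush(self, participant_id: str) -> Optional[bytes]:
        """Return and drop whatever is left, e.g. when a participant stops speaking."""
        buf = self._buffers.pop(participant_id, None)
        return bytes(buf) if buf else None
```

Keying the buffers by participant ID keeps one speaker's audio out of another's transcription request; `flush` is where a sketch like this would handle a final tail shorter than one second.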

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| language | str | None | Language code ("en", "zh", etc.), or None for auto-detection |
| api_key | str | None | API key (defaults to the FISH_API_KEY environment variable) |
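Client code often carries locale tags such as "en-US", while the examples above use bare codes. Assuming Fish Audio expects the bare form (this page does not say whether region subtags are accepted), a hypothetical `to_fish_language` helper could normalize the value before it reaches `fish.STT(language=...)`.

```python
from typing import Optional


def to_fish_language(tag: Optional[str]) -> Optional[str]:
    """Map a locale tag such as "en-US" or "zh_CN" to a bare code like "en".

    None or an empty string becomes None, which this page documents as
    auto-detection.
    """
    if tag is None or not tag.strip():
        return None
    # Accept both "-" and "_" separators and keep only the primary subtag.
    primary = tag.strip().replace("_", "-").split("-")[0]
    return primary.lower()
```

Returning None for empty input means a missing locale falls back to auto-detection instead of sending an invalid code.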
