Fish Audio TTS - Vision Agents

Fish Audio provides high-quality text-to-speech with fine-grained prosody control, voice cloning support, and multiple backend models. It is well suited to multilingual applications.

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Fish Audio also provides speech-to-text with automatic language detection. You can use both in the same agent.

Installation

uv add "vision-agents[fish]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import fish, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=fish.STT(),
    tts=fish.TTS(),  # Uses S2-Pro model by default
)

Set `FISH_API_KEY` in your environment or pass `api_key` directly.
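The key lookup described above can be sketched as follows. This is a minimal illustration, not the plugin's actual implementation: `resolve_fish_api_key` is a hypothetical helper, and the assumption that an explicit `api_key` takes precedence over the `FISH_API_KEY` environment variable is inferred only from the wording on this page.

```python
import os
from typing import Optional


def resolve_fish_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to FISH_API_KEY."""
    # Assumption: an explicitly passed key wins over the environment variable.
    key = api_key or os.environ.get("FISH_API_KEY")
    if not key:
        # Fail early with a clear message instead of at the first TTS request.
        raise RuntimeError(
            "No Fish Audio API key found: set FISH_API_KEY or pass api_key."
        )
    return key
```

The returned value could then be passed along as `fish.TTS(api_key=...)`, which surfaces a missing key at startup rather than mid-conversation.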

Basic Usage

tts = fish.TTS(reference_id="your_voice_id")  # Optional voice cloning

Prosody Control

The S2-Pro model (default) supports inline control tags for natural prosody:

tts = fish.TTS()  # Uses s2-pro by default

# Include prosody tags directly in the text sent to the TTS.
# Separate variable names keep both examples available.
whisper_text = "[whisper] This is a secret. [super happy] But this is great news!"
laugh_text = "Hello! [laugh] That's so funny."
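When tagged text is assembled programmatically, a small helper keeps the bracket syntax consistent. The sketch below is plain string handling and makes no calls to Fish Audio; `with_tag` is a hypothetical name, and it does not check whether a given tag is actually recognized by the model. Only the tags shown on this page (`[whisper]`, `[super happy]`, `[laugh]`) are taken from the source.

```python
def with_tag(tag: str, text: str) -> str:
    """Hypothetical helper: prefix text with an inline control tag like [whisper]."""
    # Accept either "whisper" or "[whisper]" and normalize to the bare name.
    name = tag.strip().strip("[]").strip()
    if not name:
        raise ValueError("tag must not be empty")
    return f"[{name}] {text.strip()}"


# Rebuild the first example from this page out of two tagged segments.
line = " ".join(
    [
        with_tag("whisper", "This is a secret."),
        with_tag("super happy", "But this is great news!"),
    ]
)
print(line)
# [whisper] This is a secret. [super happy] But this is great news!
```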

Selecting a Model

# Use the latest S2-Pro model with prosody control
tts = fish.TTS(model="s2-pro")

# Use legacy models if needed
tts = fish.TTS(model="speech-1.5")
tts = fish.TTS(model="speech-1.6")

# Use fast models for lower latency
tts = fish.TTS(model="s1")
tts = fish.TTS(model="s1-mini")
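If the model name comes from configuration, it can help to check it against the names listed on this page before constructing the TTS. The following is a sketch under that assumption: `FISH_TTS_MODELS` and `check_model` are hypothetical, the grouping labels are informal summaries of the comments above, and the plugin itself may accept other names or perform its own validation.

```python
# Model names copied from this page; the labels are informal groupings.
FISH_TTS_MODELS = {
    "s2-pro": "latest, prosody control (default)",
    "speech-1.5": "legacy",
    "speech-1.6": "legacy",
    "s1": "fast, lower latency",
    "s1-mini": "fast, lower latency",
}


def check_model(model: str = "s2-pro") -> str:
    """Hypothetical helper: return the model name if it is listed on this page."""
    if model not in FISH_TTS_MODELS:
        valid = ", ".join(sorted(FISH_TTS_MODELS))
        raise ValueError(
            f"Unknown Fish Audio model {model!r}; expected one of: {valid}"
        )
    return model
```

A validated name could then be passed as `fish.TTS(model=check_model(name))`, so a typo in configuration fails with a readable message.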

Parameters

Name	Type	Default	Description
`model`	`str`	`"s2-pro"`	Backend model: `"s2-pro"`, `"speech-1.5"`, `"speech-1.6"`, `"s1"`, `"s1-mini"`
`reference_id`	`str`	`None`	Voice ID for voice cloning
`api_key`	`str`	`None`	API key (defaults to `FISH_API_KEY` env var)
`base_url`	`str`	`None`	Custom API endpoint

Next Steps

Fish Audio STT: speech-to-text with auto language detection

Build a Voice Agent: get started with voice
