Pocket TTS is a lightweight Text-to-Speech model from Kyutai that runs efficiently on CPU. It offers low latency (~200ms to first audio), voice cloning support, and multiple built-in voices, making it ideal for local voice synthesis without requiring a GPU or external API. The Pocket TTS plugin for the Stream Python AI SDK allows you to add local TTS functionality to your project.

Installation

Install the Stream Pocket TTS plugin with uv (the quotes stop shells such as zsh from treating the square brackets as a glob pattern):
uv add "vision-agents[pocket]"

Example

Check out our Pocket TTS example to see a working code sample using the plugin, or read on for some key details.

Initialisation

The Pocket TTS plugin is exposed through the TTS class:
from vision_agents.plugins import pocket

# Create TTS with default voice
tts = pocket.TTS()

# Or specify a built-in voice
tts = pocket.TTS(voice="marius")

# Or use a custom voice for cloning
tts = pocket.TTS(voice="path/to/your/voice.wav")

Parameters

The TTS class accepts the following parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| voice | str | "alba" | Built-in voice name or path to a custom wav file for voice cloning. Built-in options: alba, marius, javert, jean, fantine, cosette, eponine, azelma. |
| client | TTSModel or None | None | Optional pre-initialized TTSModel instance for advanced use cases. |

Built-in voices

Pocket TTS includes eight built-in voices:
  • alba - Default voice
  • marius
  • javert
  • jean
  • fantine
  • cosette
  • eponine
  • azelma
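
Because the voice parameter accepts either a built-in name or a path to a wav file, it can be useful to check which of the two a given string is before constructing the TTS. The following is a minimal sketch of such a check, not the plugin's own logic: classify_voice and BUILTIN_VOICES are hypothetical names introduced here, and the voice list is simply copied from this page.

```python
# Hypothetical helper, not part of the Pocket TTS plugin.
# The names below are copied from the built-in voice list on this page.
BUILTIN_VOICES = (
    "alba", "marius", "javert", "jean",
    "fantine", "cosette", "eponine", "azelma",
)


def classify_voice(voice: str) -> str:
    """Return "builtin" for a listed voice name, otherwise "custom".

    Assumption: any string that is not a built-in name is meant as a
    wav file location for voice cloning.
    """
    if not voice:
        raise ValueError("voice must be a non-empty string")
    return "builtin" if voice in BUILTIN_VOICES else "custom"


if __name__ == "__main__":
    for candidate in ("marius", "path/to/your/voice.wav"):
        print(candidate, "is", classify_voice(candidate))
```

A check like this mainly helps catch typos in a built-in name early, since a misspelt name would otherwise be treated as a file path.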

Voice cloning

You can clone a voice by passing either a path to a local wav file or an hf:// URL that points to a HuggingFace-hosted wav file:
from vision_agents.plugins import pocket

# Use a local wav file
tts = pocket.TTS(voice="path/to/your/voice.wav")

# Or use a HuggingFace-hosted voice
tts = pocket.TTS(voice="hf://kyutai/tts-voices/alba-mackenna/casual.wav")
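
Before passing a local recording for cloning, it may be worth confirming that the file opens as a wav file and seeing how long it is. The sketch below uses only Python's standard wave module; describe_wav is a hypothetical helper, and this page does not state which sample rate, channel count, or duration Pocket TTS expects, so the function only reports what the file contains.

```python
import wave


def describe_wav(path: str) -> dict:
    """Report basic properties of an uncompressed PCM wav file.

    Hypothetical helper, not part of the Pocket TTS plugin.
    wave.open raises wave.Error if the file is not a wav file
    that the standard library can read.
    """
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    import sys

    # Usage: python describe_wav.py path/to/your/voice.wav
    if len(sys.argv) > 1:
        print(describe_wav(sys.argv[1]))
```

If the file cannot be read this way, converting it to plain PCM wav with an audio tool first is a reasonable next step.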

Features

  • CPU-only - No GPU required, runs efficiently on standard hardware
  • Low latency - ~200ms to first audio
  • Small model size - 100M parameters
  • Voice cloning - Use custom wav files for voice cloning
  • Built-in voices - 8 pre-configured voices available