Pocket TTS is a lightweight local text-to-speech (TTS) model from Kyutai that runs on CPU. It offers ~200 ms latency, voice cloning, and 8 built-in voices without requiring a GPU or an external API.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
    # Use a local wav file
    tts = pocket.TTS(voice="path/to/your/voice.wav")

    # Or a HuggingFace-hosted voice
    tts = pocket.TTS(voice="hf://kyutai/tts-voices/alba-mackenna/casual.wav")
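Because a mistyped voice path may only surface as an error once the agent starts speaking, it can help to check the reference up front. The sketch below is a hypothetical helper, not part of Pocket TTS or Vision Agents: `check_voice_ref` is an invented name, and the rule that both forms end in `.wav` is an assumption drawn only from the two examples above. It does not handle built-in voice names and does no network access.

```python
from pathlib import Path


def check_voice_ref(voice: str) -> str:
    """Classify a voice reference as "hf" or "local".

    Hypothetical helper for illustration; not a Pocket TTS API.
    """
    if voice.startswith("hf://"):
        # Hosted voice: only sanity-check the shape of the reference.
        # Assumption: hosted voices point at a .wav file, as in the docs example.
        if not voice.lower().endswith(".wav"):
            raise ValueError(f"expected a .wav file in hf:// reference: {voice}")
        return "hf"

    # Anything else is treated as a local file path.
    path = Path(voice)
    if path.suffix.lower() != ".wav":
        raise ValueError(f"expected a .wav file: {voice}")
    if not path.is_file():
        raise FileNotFoundError(f"voice file not found: {voice}")
    return "local"
```

A caller might run `check_voice_ref(voice)` first and only then pass the same string as `voice=` to `pocket.TTS`, so a wrong path fails at startup instead of mid-call.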