Sarvam AI provides streaming text-to-speech using the Bulbul model, with configurable speaker, pace, and language support for Indian languages.
Vision Agents requires a Stream account for real-time transport. Get your Sarvam API key from the Sarvam dashboard.
Sarvam also provides speech-to-text and an LLM. You can use all three in the same agent.

Installation

uv add "vision-agents[sarvam]"

Quick start

from vision_agents.core import Agent, User
from vision_agents.plugins import sarvam, getstream, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Sarvam Agent", id="agent"),
    instructions="Reply in the same language the user speaks.",
    llm=sarvam.LLM(model="sarvam-m"),
    stt=sarvam.STT(language="hi-IN"),
    tts=sarvam.TTS(speaker="shubh"),
    turn_detection=smart_turn.TurnDetection(),
)
Set SARVAM_API_KEY in your environment, or pass api_key directly to the constructor, for example sarvam.TTS(api_key="...").
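The two ways of supplying the key can be sketched as a small lookup that fails early when neither is present. This is only an illustration: `resolve_sarvam_api_key` is a hypothetical helper, not part of the plugin, and it assumes an explicit key should take precedence over the environment variable.

```python
import os
from typing import Optional


def resolve_sarvam_api_key(explicit: Optional[str] = None) -> str:
    """Return the explicit key if given, else SARVAM_API_KEY from the environment.

    Hypothetical helper for illustration; the plugin performs its own lookup.
    """
    key = explicit or os.environ.get("SARVAM_API_KEY")
    if not key:
        raise RuntimeError(
            "No Sarvam API key found: set SARVAM_API_KEY or pass api_key."
        )
    return key


# Usage with the plugin (requires vision-agents[sarvam] to be installed):
# tts = sarvam.TTS(api_key=resolve_sarvam_api_key())
```

Failing at startup with a clear message is usually easier to debug than a rejected request once the agent is already in a call.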

Parameters

tts = sarvam.TTS(
    model="bulbul:v3",
    language="hi-IN",
    speaker="anushka",
    pace=1.0,
)
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "bulbul:v3" | TTS model (bulbul:v2 or bulbul:v3) |
| language | str | "hi-IN" | Target language code (e.g. hi-IN, en-IN) |
| speaker | str | "anushka" | Speaker voice id (e.g. shubh, anushka) |
| sample_rate | int | 24000 | Output sample rate in Hz |
| pace | float | None | Speech pace (bulbul:v3 supports 0.5–2.0) |
| pitch | float | None | Speech pitch (bulbul:v2 only) |
| loudness | float | None | Speech loudness (bulbul:v2 only) |
| temperature | float | None | Sampling temperature (bulbul:v3 only) |
| enable_preprocessing | bool | True | Normalize mixed-language and numeric text |
| api_key | str | None | API key (defaults to SARVAM_API_KEY env var) |
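Because several parameters apply to only one model version, it can help to check a configuration before constructing the TTS object. The sketch below encodes only the constraints listed in the table above. `build_tts_kwargs` is a hypothetical helper, not part of the plugin, and whether the plugin itself rejects or ignores mismatched parameters is not stated here.

```python
from typing import Any, Dict, Optional

# Constraints taken from the parameter table above.
V2_ONLY = ("pitch", "loudness")
V3_ONLY = ("temperature",)
V3_PACE_RANGE = (0.5, 2.0)


def build_tts_kwargs(
    model: str = "bulbul:v3",
    language: str = "hi-IN",
    speaker: str = "anushka",
    pace: Optional[float] = None,
    pitch: Optional[float] = None,
    loudness: Optional[float] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate model-specific options and return kwargs for sarvam.TTS.

    Hypothetical helper for illustration only.
    """
    optional = {
        "pace": pace,
        "pitch": pitch,
        "loudness": loudness,
        "temperature": temperature,
    }
    # Reject options that belong to the other model version.
    disallowed = V2_ONLY if model == "bulbul:v3" else V3_ONLY
    for name in disallowed:
        if optional[name] is not None:
            raise ValueError(f"{name} is not supported by {model}")
    # The documented pace range applies to bulbul:v3.
    low, high = V3_PACE_RANGE
    if model == "bulbul:v3" and pace is not None and not low <= pace <= high:
        raise ValueError(f"pace must be between {low} and {high} for {model}")

    kwargs: Dict[str, Any] = {
        "model": model,
        "language": language,
        "speaker": speaker,
    }
    # Only forward options that were actually set, so defaults stay untouched.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


# Usage with the plugin (requires vision-agents[sarvam] to be installed):
# tts = sarvam.TTS(**build_tts_kwargs(pace=1.2, temperature=0.6))
```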

Next steps

Sarvam STT: Streaming speech-to-text for Indian languages

Sarvam LLM: Chat completions with Sarvam models