Anam

Anam provides real-time interactive avatar video with automatic lip-sync. Add a video avatar to your agent that speaks with natural movements synchronized to your agent’s voice output.

Vision Agents requires a Stream account for real-time transport. Anam API keys and avatar IDs are available from the Anam dashboard.

Installation

uv add "vision-agents[anam]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import anam, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    tts=deepgram.TTS(),
    stt=deepgram.STT(),
    avatar=anam.Avatar(),
)

Set ANAM_API_KEY and ANAM_AVATAR_ID in your environment, or pass them directly to anam.Avatar(...).
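The fallback described above (an explicit argument first, then the environment variable) can be sketched with the standard library alone. This is an illustration, not the plugin's actual code: `resolve_setting` is a hypothetical helper, and raising `ValueError` when neither source provides a value is an assumption about how a missing credential might be handled.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: prefer the explicit value, else read the env var."""
    if explicit is not None:
        return explicit
    value = env.get(env_name)
    if not value:
        # Assumption: a missing credential is treated as a configuration error.
        raise ValueError(f"{env_name} is not set and no value was passed")
    return value
```

With this pattern, `resolve_setting(None, "ANAM_API_KEY")` reads the environment, while `resolve_setting("my-key", "ANAM_API_KEY")` ignores it. Passing `api_key` and `avatar_id` to `anam.Avatar(...)` corresponds to the explicit path.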

Parameters

Name	Type	Default	Description
`avatar_id`	`str`	`None`	Anam avatar ID (defaults to `ANAM_AVATAR_ID` env var)
`api_key`	`str`	`None`	API key (defaults to `ANAM_API_KEY` env var)
`client_options`	`ClientOptions`	`None`	Advanced Anam client configuration
`connect_timeout`	`float`	`None`	Seconds to wait for connection (`None` = wait indefinitely)
`session_ready_timeout`	`float`	`None`	Seconds to wait for session ready (`None` = wait indefinitely)
`width`	`int`	`720`	Output video width in pixels
`height`	`int`	`480`	Output video height in pixels
`fps`	`int`	`30`	Output video frame rate. Must be `> 0`.
`buffer_seconds`	`float`	`1.0`	Max video buffer depth in seconds ahead of audio playback. Must be `> 0`.
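To make the constraints on `fps` and `buffer_seconds` concrete, here is a minimal sketch of how the video parameters might be validated and how a buffer depth in seconds translates into a frame count. `VideoSettings` and `max_buffered_frames` are hypothetical names used for illustration only; the frame-count formula (`fps * buffer_seconds`) is an assumption about how a time-based buffer limit maps to frames, not a description of the plugin's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoSettings:
    """Hypothetical container mirroring the video parameters in the table."""

    width: int = 720
    height: int = 480
    fps: int = 30
    buffer_seconds: float = 1.0

    def __post_init__(self) -> None:
        # Both values must be strictly positive, as stated in the table.
        if self.fps <= 0:
            raise ValueError("fps must be > 0")
        if self.buffer_seconds <= 0:
            raise ValueError("buffer_seconds must be > 0")

    @property
    def max_buffered_frames(self) -> int:
        # Assumption: frames that fit in the buffer window at this frame rate.
        return int(self.fps * self.buffer_seconds)
```

Under these assumptions, the defaults allow roughly 30 frames of video to be buffered ahead of audio playback; lowering `buffer_seconds` reduces that window proportionally.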

How It Works

Agent TTS audio is resampled to 24 kHz mono and streamed to Anam
Anam generates lip-synced avatar video and audio from the input
Avatar video and audio frames are streamed back to call participants via Stream Edge
When a user starts speaking, the avatar is automatically interrupted
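The first step above, converting TTS audio to 24 kHz mono, can be illustrated with a small pure-Python sketch. It is not the plugin's implementation: `to_mono` and `resample_linear` are hypothetical helpers, and linear interpolation is used only because it is easy to follow. A production resampler would normally apply a low-pass filter to avoid aliasing when downsampling.

```python
from typing import List, Sequence

TARGET_RATE = 24_000  # 24 kHz, the rate named in the steps above


def to_mono(samples: Sequence[float], channels: int) -> List[float]:
    """Average interleaved channel samples into a single mono channel."""
    if channels < 1:
        raise ValueError("channels must be >= 1")
    frames = len(samples) // channels
    return [
        sum(samples[i * channels:(i + 1) * channels]) / channels
        for i in range(frames)
    ]


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int = TARGET_RATE
) -> List[float]:
    """Resample mono audio by linear interpolation (no anti-alias filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be > 0")
    if not samples:
        return []
    out_len = int(len(samples) * dst_rate / src_rate)
    out: List[float] = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate        # position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the last sample
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out
```

For example, 48 kHz stereo input would first pass through `to_mono(samples, 2)` and then `resample_linear(mono, 48_000)`, halving the number of samples.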

With Realtime LLMs

Anam also works with realtime speech-to-speech models. It subscribes to both TTS audio events and realtime audio output, so you can swap in a realtime LLM without any changes to the avatar setup.

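from vision_agents.core import Agent
from vision_agents.plugins import anam, gemini

agent = Agent(
    llm=gemini.Realtime(),   # realtime speech-to-speech model
    avatar=anam.Avatar(),    # same avatar setup as in Quick Start
    # ... remaining Agent arguments (e.g. edge, agent_user, instructions) as in Quick Start
)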

Next Steps

Build a Voice Agent

Get started with voice

Build a Video Agent

Add video processing

Build Your Own Avatar

Subclass the Avatar base class
