Anam provides real-time interactive avatar video with automatic lip-sync. Add a video avatar to your agent that speaks with natural movements synchronized to your agent’s voice output.
Vision Agents uses Stream Video for real-time WebRTC transport by default. External WebRTC transports are supported as well. Most AI providers offer free tiers to get started.
Anam provides API keys and avatar IDs through their dashboard.

Installation

uv add "vision-agents[anam]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import anam, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    tts=deepgram.TTS(),
    stt=deepgram.STT(),
    avatar=anam.Avatar(),
)
Set ANAM_API_KEY and ANAM_AVATAR_ID in your environment, or pass them directly to anam.Avatar(...) as api_key and avatar_id.
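The environment-variable fallback can be pictured with a minimal sketch. `resolve_anam_credentials` is a hypothetical helper written for illustration, not part of the Anam plugin. It assumes explicit arguments take precedence over the environment and that a missing value is an error, which may not match how `anam.Avatar` actually reports misconfiguration.

```python
import os
from typing import Optional, Tuple


def resolve_anam_credentials(
    api_key: Optional[str] = None,
    avatar_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: explicit arguments win, env vars are the fallback."""
    key = api_key or os.environ.get("ANAM_API_KEY")
    avatar = avatar_id or os.environ.get("ANAM_AVATAR_ID")
    # Collect the names of any values that are still missing after the fallback.
    missing = [
        name
        for name, value in (("ANAM_API_KEY", key), ("ANAM_AVATAR_ID", avatar))
        if not value
    ]
    if missing:
        raise ValueError(f"Missing Anam configuration: {', '.join(missing)}")
    return key, avatar
```

For example, with ANAM_API_KEY set in the environment, `resolve_anam_credentials(avatar_id="my-avatar")` would return the environment key together with the explicitly passed avatar ID.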

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| avatar_id | str | None | Anam avatar ID (defaults to the ANAM_AVATAR_ID env var) |
| api_key | str | None | API key (defaults to the ANAM_API_KEY env var) |
| client_options | ClientOptions | None | Advanced Anam client configuration |
| connect_timeout | float | None | Seconds to wait for the connection (None = wait indefinitely) |
| session_ready_timeout | float | None | Seconds to wait for the session to become ready (None = wait indefinitely) |
| width | int | 720 | Output video width in pixels |
| height | int | 480 | Output video height in pixels |
| fps | int | 30 | Output video frame rate; must be > 0 |
| buffer_seconds | float | 1.0 | Maximum video buffer depth, in seconds ahead of audio playback; must be > 0 |
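As a rough sketch of how the documented defaults and constraints fit together, the hypothetical `AvatarVideoSettings` class below mirrors the video parameters, rejects non-positive `fps` and `buffer_seconds`, and converts the buffer depth into an approximate frame count. The hypothetical `wait_with_optional_timeout` helper illustrates the "None = wait indefinitely" convention with `asyncio.wait_for`, which blocks until completion when its timeout is `None`. Neither is the plugin's actual implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class AvatarVideoSettings:
    """Hypothetical mirror of the documented anam.Avatar video parameters."""

    width: int = 720
    height: int = 480
    fps: int = 30
    buffer_seconds: float = 1.0

    def __post_init__(self) -> None:
        # The docs state both values must be strictly positive.
        if self.fps <= 0:
            raise ValueError("fps must be > 0")
        if self.buffer_seconds <= 0:
            raise ValueError("buffer_seconds must be > 0")

    @property
    def max_buffered_frames(self) -> int:
        # Approximate number of frames that fit in the buffer window.
        return round(self.buffer_seconds * self.fps)


async def wait_with_optional_timeout(aw: Awaitable[T], timeout: Optional[float]) -> T:
    # asyncio.wait_for waits indefinitely when timeout is None and
    # raises a timeout error once a finite timeout elapses.
    return await asyncio.wait_for(aw, timeout)
```

At the defaults (fps=30, buffer_seconds=1.0), the buffer window corresponds to roughly 30 frames of video ahead of the audio.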

How It Works

  1. Agent TTS audio is resampled to 24 kHz mono and streamed to Anam
  2. Anam generates lip-synced avatar video and audio from the input
  3. Avatar video and audio frames are streamed back to call participants via Stream Edge
  4. When a user starts speaking, the avatar is automatically interrupted
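Steps 1 and 4 can be sketched in plain Python, assuming interleaved int16 PCM samples and an asyncio event that fires when the user starts speaking. `to_mono_24k`, `stream_to_avatar`, and the `send` callback are hypothetical names. The linear interpolation is a simple stand-in for a proper resampler, and the plugin's real resampling and interruption logic may work differently.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
TARGET_RATE = 24_000  # Input rate for Anam stated in the docs


def to_mono_24k(samples: Sequence[int], rate: int, channels: int) -> List[int]:
    """Downmix interleaved int16 PCM to mono, then linearly resample to 24 kHz."""
    # Average the channels of each complete interleaved frame into one sample.
    mono = [
        sum(samples[i : i + channels]) / channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]
    if rate == TARGET_RATE or not mono:
        return [int(round(s)) for s in mono]
    out_len = int(len(mono) * TARGET_RATE / rate)
    step = rate / TARGET_RATE  # input samples advanced per output sample
    out: List[int] = []
    for n in range(out_len):
        pos = n * step
        i = int(pos)
        frac = pos - i
        nxt = mono[min(i + 1, len(mono) - 1)]  # clamp at the last sample
        out.append(int(round(mono[i] + (nxt - mono[i]) * frac)))
    return out


async def stream_to_avatar(
    chunks: Iterable[T],
    send: Callable[[T], Awaitable[None]],
    interrupted: asyncio.Event,
) -> int:
    """Send audio chunks in order until a user-speech interrupt is signalled."""
    sent = 0
    for chunk in chunks:
        if interrupted.is_set():
            break  # user started speaking: stop feeding the avatar
        await send(chunk)
        sent += 1
    return sent
```

Checking the event before each chunk keeps the interruption latency to at most one chunk in this sketch.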
With Realtime LLMs

Anam also works with realtime speech-to-speech models. It subscribes to both TTS audio events and realtime audio output, so you can swap in a realtime LLM without any changes to the avatar setup.
from vision_agents.core import Agent
from vision_agents.plugins import anam, gemini

agent = Agent(
    llm=gemini.Realtime(),
    avatar=anam.Avatar(),
    ...
)
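The dual subscription described above can be illustrated with a toy event bus. `EventBus`, `SketchAvatar`, and the event names `tts_audio` and `realtime_audio` are hypothetical and do not reflect the real Vision Agents event classes. The point is only that one audio handler is registered for both sources, so the avatar does not depend on which kind of LLM produced the audio.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[bytes], None]


class EventBus:
    """Hypothetical minimal pub/sub bus, not the Vision Agents event system."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: bytes) -> None:
        # Deliver the payload to every handler registered for this event type.
        for handler in self._handlers[event_type]:
            handler(payload)


class SketchAvatar:
    """Collects audio from either source through a single handler."""

    def __init__(self, bus: EventBus) -> None:
        self.received: List[bytes] = []
        # Register the same handler for both audio sources.
        for event_type in ("tts_audio", "realtime_audio"):
            bus.subscribe(event_type, self._on_audio)

    def _on_audio(self, pcm: bytes) -> None:
        # A real avatar would stream this audio to Anam instead of storing it.
        self.received.append(pcm)
```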

Next Steps

  - Build a Voice Agent: get started with voice
  - Build a Video Agent: add video processing
  - Build Your Own Avatar: subclass the Avatar base class