Anam provides real-time interactive avatar video with automatic lip-sync. Add a video avatar to your agent that speaks with natural movements synchronized to your agent’s voice output.
Vision Agents requires a Stream account for real-time transport. Anam provides API keys and avatar IDs through its dashboard.

Installation

uv add "vision-agents[anam]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import anam, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    tts=deepgram.TTS(),
    stt=deepgram.STT(),
    avatar=anam.Avatar(),
)
Set ANAM_API_KEY and ANAM_AVATAR_ID in your environment, or pass them directly to anam.Avatar() as api_key and avatar_id.
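The fallback described above (an explicit argument wins, otherwise the environment variable is used) can be sketched in plain Python. The helper name `resolve_setting` is hypothetical and this is not the plugin's actual code. Raising an error when neither value is present is also an assumption about how a missing credential might surface.

```python
import os
from typing import Optional


def resolve_setting(explicit: Optional[str], env_var: str) -> str:
    """Hypothetical helper: a non-empty explicit value wins, else the env var."""
    if explicit:
        return explicit
    value = os.environ.get(env_var)
    if not value:
        # Assumption: a missing credential is reported as an error up front.
        raise ValueError(f"{env_var} is not set and no value was passed explicitly")
    return value


# Mirrors the documented defaults of anam.Avatar(api_key=..., avatar_id=...):
# api_key = resolve_setting(None, "ANAM_API_KEY")
# avatar_id = resolve_setting(None, "ANAM_AVATAR_ID")
```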

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| avatar_id | str | None | Anam avatar ID (defaults to ANAM_AVATAR_ID env var) |
| api_key | str | None | API key (defaults to ANAM_API_KEY env var) |
| client_options | ClientOptions | None | Advanced Anam client configuration |
| connect_timeout | float | None | Seconds to wait for connection (None = wait indefinitely) |
| session_ready_timeout | float | None | Seconds to wait for session ready (None = wait indefinitely) |
| width | int | 720 | Output video width in pixels |
| height | int | 480 | Output video height in pixels |
| fps | int | 30 | Output video frame rate. Must be > 0. |
| buffer_seconds | float | 1.0 | Max video buffer depth in seconds ahead of audio playback. Must be > 0. |
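The constraints in the table can be mirrored in a small, hypothetical settings class. It assumes, without the plugin confirming it, that a buffer depth of `buffer_seconds` corresponds to roughly `fps * buffer_seconds` frames (30 frames with the defaults), and it uses a bounded `collections.deque` as a stand-in for whatever buffer the plugin really uses. The `wait_for_step` helper illustrates the "None = wait indefinitely" timeout convention with `asyncio.wait_for`, which waits without limit when `timeout=None`.

```python
import asyncio
import math
from collections import deque
from dataclasses import dataclass
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class AvatarVideoSettings:
    """Hypothetical mirror of the documented video parameters and constraints."""
    width: int = 720
    height: int = 480
    fps: int = 30
    buffer_seconds: float = 1.0

    def __post_init__(self) -> None:
        # Both constraints come from the table: fps > 0 and buffer_seconds > 0.
        if self.fps <= 0:
            raise ValueError("fps must be > 0")
        if self.buffer_seconds <= 0:
            raise ValueError("buffer_seconds must be > 0")

    @property
    def max_buffered_frames(self) -> int:
        # Assumption: seconds of buffer map to fps * buffer_seconds frames.
        return max(1, math.ceil(self.fps * self.buffer_seconds))

    def new_frame_buffer(self) -> deque:
        # A bounded deque drops the oldest frame once it is full.
        return deque(maxlen=self.max_buffered_frames)


async def wait_for_step(aw: Awaitable[T], timeout: Optional[float]) -> T:
    # timeout=None waits indefinitely, matching the connect_timeout and
    # session_ready_timeout defaults; a number raises asyncio.TimeoutError.
    return await asyncio.wait_for(aw, timeout=timeout)
```

A bounded deque keeps latency capped: if video runs ahead of audio, old frames are discarded rather than letting the backlog grow.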

How It Works

  1. Agent TTS audio is resampled to 24 kHz mono and streamed to Anam
  2. Anam generates lip-synced avatar video and audio from the input
  3. Avatar video and audio frames are streamed back to call participants via Stream Edge
  4. When a user starts speaking, the avatar is automatically interrupted
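Step 1 can be illustrated with a dependency-free sketch that downmixes interleaved 16-bit samples to mono and resamples them to 24 kHz. Linear interpolation is used here only for clarity. The resampler the plugin actually uses is not documented on this page, and a production pipeline would normally use a filtered resampler to avoid aliasing. All function names are hypothetical.

```python
def to_mono(samples: list[int], channels: int) -> list[int]:
    """Average (rounded down) each group of interleaved channel samples."""
    if channels == 1:
        return list(samples)
    return [
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]


def resample_linear(samples: list[int], src_rate: int, dst_rate: int = 24_000) -> list[int]:
    """Resample mono PCM to dst_rate by linear interpolation between neighbours."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for n in range(out_len):
        pos = n * step
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(round(samples[i] * (1 - frac) + nxt * frac))
    return out


def prepare_for_anam(samples: list[int], src_rate: int, channels: int) -> list[int]:
    """Hypothetical step 1: downmix to mono, then resample to 24 kHz."""
    return resample_linear(to_mono(samples, channels), src_rate)
```

For example, 10 ms of 48 kHz stereo audio (960 interleaved samples) becomes 240 mono samples at 24 kHz.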

With Realtime LLMs

Anam also works with realtime speech-to-speech models. It subscribes to both TTS audio events and realtime audio output, so you can swap in a realtime LLM without any changes to the avatar setup.
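from vision_agents.core import Agent, User
from vision_agents.plugins import anam, gemini, getstream

# A realtime speech-to-speech LLM produces audio directly,
# so no separate stt/tts plugins are configured here.
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.Realtime(),
    avatar=anam.Avatar(),  # same avatar setup as in the Quick Start
)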
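To see why swapping the LLM needs no avatar changes, consider a toy publish/subscribe model: the avatar listens for two kinds of audio events, forwards both into the same outgoing queue, and clears that queue when the user starts speaking (step 4 above). All class and event names here are hypothetical stand-ins, not the Vision Agents event API, and the queue stands in for the stream sent to Anam.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class TTSAudio:  # hypothetical: audio emitted by a TTS plugin
    pcm: bytes


@dataclass
class RealtimeAudio:  # hypothetical: audio emitted by a realtime LLM
    pcm: bytes


@dataclass
class UserStartedSpeaking:  # hypothetical: turn-detection signal
    pass


class EventBus:
    """Minimal publish/subscribe bus keyed by event type."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = {}

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event: object) -> None:
        for handler in self._handlers.get(type(event), []):
            handler(event)


class ToyAvatar:
    """Forwards audio from either source to one queue; clears it on interruption."""

    def __init__(self, bus: EventBus) -> None:
        self.outgoing: deque = deque()
        # Subscribing to both sources is what makes the LLM swappable.
        bus.subscribe(TTSAudio, self._on_audio)
        bus.subscribe(RealtimeAudio, self._on_audio)
        bus.subscribe(UserStartedSpeaking, self._on_interrupt)

    def _on_audio(self, event) -> None:
        self.outgoing.append(event.pcm)

    def _on_interrupt(self, _event) -> None:
        # Drop anything not yet spoken so the avatar stops talking.
        self.outgoing.clear()
```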

Next Steps

Build a Voice Agent: get started with voice
Build a Video Agent: add video processing
Build Your Own Avatar: subclass the Avatar base class