

LiveAvatar (by HeyGen) provides real-time interactive avatars with lip-sync driven by your agent’s audio. Pass liveavatar.Avatar() to the agent’s avatar parameter to stream synchronized video and audio into the call.
Vision Agents requires a Stream account for real-time transport. Get a LiveAvatar API key and avatar ID from the LiveAvatar dashboard.

Installation

uv add "vision-agents[liveavatar]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import deepgram, gemini, getstream, liveavatar

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    tts=deepgram.TTS(),
    stt=deepgram.STT(),
    avatar=liveavatar.Avatar(),
)
Set LIVEAVATAR_API_KEY and LIVEAVATAR_AVATAR_ID in your environment, or pass api_key and avatar_id directly to Avatar().
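The fallback order described above (an explicit argument wins, otherwise the environment variable is read) can be sketched as a small helper. This is a hypothetical illustration, not the plugin's source: the function name `resolve_liveavatar_credentials` and the choice to raise `ValueError` when a value is missing are assumptions.

```python
import os
from typing import Optional, Tuple


def resolve_liveavatar_credentials(
    api_key: Optional[str] = None,
    avatar_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: prefer explicit arguments, fall back to env vars."""
    # An explicitly passed value wins; otherwise read the documented env var.
    resolved_key = api_key or os.environ.get("LIVEAVATAR_API_KEY")
    resolved_avatar = avatar_id or os.environ.get("LIVEAVATAR_AVATAR_ID")

    # Assumption: missing settings are reported eagerly with ValueError.
    missing = [
        name
        for name, value in (
            ("LIVEAVATAR_API_KEY", resolved_key),
            ("LIVEAVATAR_AVATAR_ID", resolved_avatar),
        )
        if not value
    ]
    if missing:
        raise ValueError(f"Missing LiveAvatar settings: {', '.join(missing)}")
    return resolved_key, resolved_avatar
```

With both variables exported, `resolve_liveavatar_credentials(avatar_id="...")` keeps the key from the environment but overrides the avatar, which is handy when one deployment serves several avatars.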

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| avatar_id | str | None | LiveAvatar avatar UUID (defaults to the LIVEAVATAR_AVATAR_ID env var) |
| api_key | str | None | LiveAvatar API key (defaults to the LIVEAVATAR_API_KEY env var) |
| base_url | str | None | Override for the LiveAvatar API base URL |
| is_sandbox | bool | True | Sandbox sessions don’t burn credits but are duration-capped |
| max_session_duration | int | None | Session length cap in seconds; None uses the API default |
| video_quality | str | "high" | "low", "medium", "high", or "very_high" |
| video_encoding | str | "H264" | "H264" or "VP8" |
| width | int | 1280 | Output video width in pixels |
| height | int | 720 | Output video height in pixels |
| fps | int | 30 | Output video frame rate (frames per second) |
| buffer_seconds | float | 1.0 | Maximum video buffer depth, in seconds, ahead of audio playback |
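The table's allowed values and defaults can be captured in a small validated settings object, which also makes the link between buffer_seconds and fps concrete. This is a hypothetical sketch: the class name `LiveAvatarVideoSettings`, the positivity checks, and the assumption that the buffer cap works out to whole frames (`buffer_seconds * fps`) are illustrative, not the plugin's actual implementation.

```python
from dataclasses import dataclass

# Allowed values as listed in the parameter table.
VIDEO_QUALITIES = {"low", "medium", "high", "very_high"}
VIDEO_ENCODINGS = {"H264", "VP8"}


@dataclass(frozen=True)
class LiveAvatarVideoSettings:
    """Hypothetical mirror of the documented video parameters and defaults."""

    video_quality: str = "high"
    video_encoding: str = "H264"
    width: int = 1280
    height: int = 720
    fps: int = 30
    buffer_seconds: float = 1.0

    def __post_init__(self) -> None:
        # Reject values outside the documented options early.
        if self.video_quality not in VIDEO_QUALITIES:
            raise ValueError(f"unsupported video_quality: {self.video_quality!r}")
        if self.video_encoding not in VIDEO_ENCODINGS:
            raise ValueError(f"unsupported video_encoding: {self.video_encoding!r}")
        # Assumed sanity checks; the real plugin may validate differently.
        if self.width <= 0 or self.height <= 0 or self.fps <= 0:
            raise ValueError("width, height, and fps must be positive")
        if self.buffer_seconds <= 0:
            raise ValueError("buffer_seconds must be positive")

    @property
    def max_buffered_frames(self) -> int:
        # Assumption: the buffer cap translates to buffer_seconds * fps frames
        # that may be queued ahead of audio playback.
        return int(self.buffer_seconds * self.fps)
```

With the defaults, `max_buffered_frames` is 30, meaning at most one second of 30 fps video may run ahead of the audio. Lowering buffer_seconds tightens that bound at the cost of less headroom for network jitter.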

How It Works

LiveAvatar runs in LITE mode with the custom-agent integration path:
  1. Your agent’s TTS (or Realtime LLM) audio is streamed to LiveAvatar over WebSocket
  2. LiveAvatar generates lip-synced avatar video and audio
  3. Synchronized A/V is published to call participants via Stream Edge
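Step 1 streams audio in small pieces rather than as whole utterances, which lets lip-sync start before the full reply is synthesized. The sketch below shows one common way to frame 16-bit mono PCM into fixed-duration chunks before sending them over a WebSocket. The 24 kHz sample rate, 20 ms chunk size, and the function name `pcm_chunks` are assumptions for illustration; the audio format LiveAvatar actually expects is handled by the plugin and is not specified on this page.

```python
from collections.abc import Iterator

BYTES_PER_SAMPLE = 2  # assumed 16-bit signed PCM, mono


def pcm_chunks(
    pcm: bytes, sample_rate: int = 24_000, chunk_ms: int = 20
) -> Iterator[bytes]:
    """Split raw PCM into fixed-duration chunks (the last one may be shorter)."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start : start + chunk_bytes]


# One second of silence at 24 kHz: 24_000 samples * 2 bytes = 48_000 bytes.
one_second = bytes(24_000 * BYTES_PER_SAMPLE)
chunks = list(pcm_chunks(one_second))
print(len(chunks), len(chunks[0]))  # 50 960
```

Each 20 ms chunk at 24 kHz holds 480 samples (960 bytes), so one second of audio becomes 50 messages.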
With standard LLMs
  1. LLM generates text → TTS converts to audio → Audio sent to LiveAvatar → LiveAvatar returns synchronized avatar video and audio
With Realtime LLMs
  1. Realtime LLM generates audio → Audio sent to LiveAvatar → LiveAvatar returns synchronized avatar video and audio
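Both paths hand LiveAvatar the same thing, a stream of audio; only the producer differs. The toy asyncio sketch below makes that explicit. `fake_tts`, `fake_realtime_llm`, and `send_to_avatar` are hypothetical stand-ins for a TTS plugin, a Realtime LLM, and the WebSocket hand-off, so it runs without any LiveAvatar or Stream credentials and does not reflect the plugin's internal code.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_tts(text: str) -> AsyncIterator[bytes]:
    """Stand-in for a TTS plugin: one fake audio chunk per word."""
    for word in text.split():
        yield word.encode()


async def fake_realtime_llm() -> AsyncIterator[bytes]:
    """Stand-in for a Realtime LLM that emits audio directly."""
    for chunk in (b"hello", b"there"):
        yield chunk


async def send_to_avatar(audio: AsyncIterator[bytes]) -> list[bytes]:
    """Stand-in for the WebSocket hand-off; collects what would be sent."""
    return [chunk async for chunk in audio]


async def main() -> tuple[list[bytes], list[bytes]]:
    # Standard LLM path: text -> TTS -> audio -> avatar.
    standard = await send_to_avatar(fake_tts("hello there"))
    # Realtime path: the LLM's own audio goes straight to the avatar.
    realtime = await send_to_avatar(fake_realtime_llm())
    return standard, realtime


standard, realtime = asyncio.run(main())
print(standard == realtime)  # True
```

Because the avatar only consumes audio, swapping a standard LLM plus TTS for a Realtime LLM changes the producer without changing how the avatar is configured.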
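# With Gemini Realtime: the model produces audio itself, so no separate TTS plugin is needed
from vision_agents.core import Agent, User
from vision_agents.plugins import gemini, getstream, liveavatar

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    llm=gemini.Realtime(),
    avatar=liveavatar.Avatar(is_sandbox=False),
)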
Set is_sandbox=False in production. Sandbox sessions are free but duration-capped.
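One way to avoid shipping a sandbox session to production (or burning credits during local testing) is to derive is_sandbox from your deployment environment. This is a minimal sketch under assumptions: `APP_ENV` is a hypothetical, app-defined variable that LiveAvatar itself does not read, and `liveavatar_sandbox_enabled` is an illustrative helper name.

```python
import os
from typing import Optional


def liveavatar_sandbox_enabled(env: Optional[str] = None) -> bool:
    """Hypothetical helper: use sandbox sessions everywhere except production."""
    # APP_ENV is an assumed, app-defined variable; LiveAvatar does not read it.
    if env is None:
        env = os.environ.get("APP_ENV", "development")
    return env.lower() != "production"
```

You would then pass `is_sandbox=liveavatar_sandbox_enabled()` to `liveavatar.Avatar()`, so development runs stay in the free, duration-capped sandbox and only production sessions consume credits.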

Next Steps

Build a Voice Agent: get started with voice
Build a Video Agent: add video processing
Build Your Own Avatar: subclass the Avatar base class