LiveAvatar

LiveAvatar (by HeyGen) provides real-time interactive avatars with lip-sync driven by your agent’s audio. Pass liveavatar.Avatar() to the agent’s avatar parameter to stream synchronized video and audio into the call.

Vision Agents requires a Stream account for real-time transport. Get a LiveAvatar API key and avatar ID from the LiveAvatar dashboard.

Installation

uv add "vision-agents[liveavatar]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import deepgram, gemini, getstream, liveavatar

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    tts=deepgram.TTS(),
    stt=deepgram.STT(),
    avatar=liveavatar.Avatar(),
)

Set LIVEAVATAR_API_KEY and LIVEAVATAR_AVATAR_ID in your environment, or pass api_key and avatar_id directly to Avatar().
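If you prefer to pass the credentials explicitly, it can help to fail early when a variable is missing. The following is a minimal sketch using only the standard library. The helper name `liveavatar_credentials` is hypothetical and not part of the plugin; only the two environment variable names and the `api_key` / `avatar_id` parameter names come from this page.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable name -> Avatar() keyword argument, as documented on this page.
REQUIRED_VARS = {
    "LIVEAVATAR_API_KEY": "api_key",
    "LIVEAVATAR_AVATAR_ID": "avatar_id",
}


def liveavatar_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: collect Avatar() credentials or raise a clear error."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in REQUIRED_VARS.items()}
```

The resulting dictionary could then be unpacked into the constructor, for example `liveavatar.Avatar(**liveavatar_credentials())`.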

Parameters

Name	Type	Default	Description
`avatar_id`	`str`	`None`	LiveAvatar avatar UUID (defaults to `LIVEAVATAR_AVATAR_ID` env var)
`api_key`	`str`	`None`	API key (defaults to `LIVEAVATAR_API_KEY` env var)
`base_url`	`str`	`None`	Override the LiveAvatar API base URL
`is_sandbox`	`bool`	`True`	Sandbox sessions don’t burn credits but are duration-capped
`max_session_duration`	`int`	`None`	Session length cap in seconds; `None` uses the API default
`video_quality`	`str`	`"high"`	`"low"`, `"medium"`, `"high"`, or `"very_high"`
`video_encoding`	`str`	`"H264"`	`"H264"` or `"VP8"`
`width`	`int`	`1280`	Output video width in pixels
`height`	`int`	`720`	Output video height in pixels
`fps`	`int`	`30`	Output video frame rate
`buffer_seconds`	`float`	`1.0`	Max video buffer depth in seconds ahead of audio playback
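As an illustration of how the parameters above fit together, here is a small sketch that validates a set of options before they are handed to `Avatar()`. The `AvatarSettings` class is hypothetical and not part of the plugin; the field names, defaults, and allowed values are taken from the table. The frame estimate assumes the buffer holds output frames at the configured `fps`, which is an assumption rather than documented behavior.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

VIDEO_QUALITIES = {"low", "medium", "high", "very_high"}
VIDEO_ENCODINGS = {"H264", "VP8"}


@dataclass(frozen=True)
class AvatarSettings:
    """Hypothetical container mirroring the documented Avatar() options."""

    is_sandbox: bool = True
    max_session_duration: Optional[int] = None  # None uses the API default
    video_quality: str = "high"
    video_encoding: str = "H264"
    width: int = 1280
    height: int = 720
    fps: int = 30
    buffer_seconds: float = 1.0

    def __post_init__(self) -> None:
        if self.video_quality not in VIDEO_QUALITIES:
            raise ValueError(f"video_quality must be one of {sorted(VIDEO_QUALITIES)}")
        if self.video_encoding not in VIDEO_ENCODINGS:
            raise ValueError(f"video_encoding must be one of {sorted(VIDEO_ENCODINGS)}")
        if min(self.width, self.height, self.fps) <= 0:
            raise ValueError("width, height, and fps must be positive")
        if self.buffer_seconds < 0:
            raise ValueError("buffer_seconds must not be negative")

    def max_buffered_frames(self) -> int:
        # Rough estimate (assumption): frames that fit in the buffer at the output rate.
        return int(self.fps * self.buffer_seconds)

    def to_kwargs(self) -> Dict[str, Any]:
        # Plain dict that could be unpacked into Avatar(**kwargs).
        return asdict(self)
```

With the defaults, `AvatarSettings().max_buffered_frames()` gives 30, that is, roughly one second of video at 30 fps held ahead of audio playback.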

How It Works

LiveAvatar runs in LITE mode with the custom-agent integration path:

1. Your agent’s TTS (or Realtime LLM) audio is streamed to LiveAvatar over WebSocket.
2. LiveAvatar generates lip-synced avatar video and audio.
3. Synchronized A/V is published to call participants via Stream Edge.
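The three steps above can be pictured as a simple streaming pipeline. The sketch below is a toy model only: `fake_avatar_service` and `publish` are hypothetical stand-ins, there is no WebSocket or Stream Edge involved, and it does not reflect how the plugin is implemented. It only shows the shape of the data flow, where each audio chunk comes back paired with matching video.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

AudioChunk = bytes
AVFrame = Tuple[str, AudioChunk]  # (placeholder video frame label, matching audio)


def fake_avatar_service(audio_stream: Iterable[AudioChunk]) -> Iterator[AVFrame]:
    """Stand-in for step 2: pair each audio chunk with a placeholder video frame."""
    for index, chunk in enumerate(audio_stream):
        yield (f"video-frame-{index}", chunk)


def publish(av_stream: Iterable[AVFrame], send: Callable[[AVFrame], None]) -> int:
    """Stand-in for step 3: hand each synchronized A/V pair to a transport callback."""
    count = 0
    for frame in av_stream:
        send(frame)
        count += 1
    return count


# Step 1: the agent's audio (from TTS or a Realtime LLM) is modelled as a list of chunks.
agent_audio = [b"hel", b"lo"]
published: List[AVFrame] = []
sent = publish(fake_avatar_service(agent_audio), published.append)
```

Because the avatar stage only consumes audio, the same pipeline applies whether the chunks originate from a TTS engine or from a Realtime LLM.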

With standard LLMs

LLM generates text → TTS converts to audio → Audio sent to LiveAvatar → LiveAvatar returns synchronized avatar video and audio

With Realtime LLMs

Realtime LLM generates audio → Audio sent to LiveAvatar → LiveAvatar returns synchronized avatar video and audio

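# With Gemini Realtime: the model produces audio itself, so no separate tts/stt is passed.
# Imports are the same as in Quick Start.
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.Realtime(),
    avatar=liveavatar.Avatar(is_sandbox=False),
)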

Set is_sandbox=False in production. Sandbox sessions are free but duration-capped.
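One way to avoid hard-coding the flag is to derive it from the deployment environment. This is a minimal sketch under stated assumptions: the `APP_ENV` variable and the `use_sandbox` helper are hypothetical conventions, not something the plugin reads or provides; only the `is_sandbox` parameter comes from this page.

```python
import os
from typing import Mapping, Optional


def use_sandbox(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: sandbox everywhere except when APP_ENV is 'production'."""
    env = os.environ if env is None else env
    # APP_ENV is an assumed convention; an unset value is treated as development.
    return env.get("APP_ENV", "development").strip().lower() != "production"
```

The result could be passed straight through, for example `liveavatar.Avatar(is_sandbox=use_sandbox())`, so that local runs stay in the free sandbox and only production sessions use credits.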

Next Steps

Build a Voice Agent: Get started with voice

Build a Video Agent: Add video processing

Build Your Own Avatar: Subclass the Avatar base class
