Avatar Class

Avatars consume the agent’s audio output and produce a synced video and audio feed of a virtual character. They run in passthrough mode: the avatar owns the agent’s outbound video and audio tracks, and its output never feeds back into the LLM or any video processors.

Class Hierarchy

The vision_agents.core.avatars module exports two classes:

Class	Purpose
`Avatar`	Abstract base class; consumes the agent’s audio output and publishes synced video and audio.
`AVSynchronizer`	Utility that owns paired audio/video tracks and delays video frames to match the audio buffer depth, keeping lip-sync accurate.

All three built-in implementations (LiveAvatar, Anam, LemonSlice) build on AVSynchronizer for output, so it’s the recommended building block for custom avatars too.

Lifecycle

The agent drives the avatar through a fixed lifecycle:

1. Agent.__init__ queries video_output() and calls attach_audio_input(stream), handing the avatar the inference flow’s audio output stream.
2. Agent.join() calls await avatar.start(), which opens the provider connection and begins consuming the input stream.
3. While running, the avatar drains input_audio_stream, forwards audio to the provider, and exposes lip-synced video via video_output() and audio via audio_output().
4. Agent.close() calls await avatar.close() for teardown.

When an avatar is set, the agent publishes avatar.audio_output() as outbound audio instead of publishing the TTS stream directly. TTS still synthesises the speech; the avatar lip-syncs to it and republishes the audio.
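The call order above can be sketched with stand-in classes. This is only an illustration of the sequence: RecordingAvatar and ToyAgent are hypothetical names that do not exist in vision_agents, and the real Agent does considerably more in each phase.

```python
import asyncio


class RecordingAvatar:
    """Hypothetical stand-in that records the order of lifecycle calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def video_output(self) -> str:
        self.calls.append("video_output")
        return "video-track"  # placeholder for a real video track

    def attach_audio_input(self, stream: object) -> None:
        self.calls.append("attach_audio_input")

    async def start(self) -> None:
        self.calls.append("start")

    async def close(self) -> None:
        self.calls.append("close")


class ToyAgent:
    """Hypothetical driver mirroring the agent phases described above."""

    def __init__(self, avatar: RecordingAvatar) -> None:
        self.avatar = avatar
        self.video_track = avatar.video_output()
        avatar.attach_audio_input(object())  # stands in for the audio output stream

    async def join(self) -> None:
        await self.avatar.start()

    async def close(self) -> None:
        await self.avatar.close()


async def run_lifecycle() -> list[str]:
    avatar = RecordingAvatar()
    agent = ToyAgent(avatar)
    await agent.join()
    await agent.close()
    return avatar.calls


calls = asyncio.run(run_lifecycle())
print(calls)
```

The point of the sketch is that the audio stream is attached during construction, before start() runs, so start() can rely on input_audio_stream being available.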

Abstract Methods

Subclasses must implement all four:

Method	Description
`video_output()`	Return the outbound `aiortc.VideoStreamTrack` published to the call.
`audio_output()`	Return the outbound `AudioOutputStream` published to the call.
`async start()`	Open the provider connection and begin consuming `input_audio_stream`.
`async close()`	Tear down the provider connection and cancel any consumer tasks.

Subclasses may also implement an interrupt() method to stop the in-flight utterance at the provider during barge-in.
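As a rough sketch of what an interrupt() implementation might do, the example below cancels the utterance at the provider and then drops locally queued output. Everything here is a stand-in: FakeProvider and its cancel_utterance() method are invented for illustration, FakeSynchronizer only mimics the async flush() that AVSynchronizer exposes, and treating interrupt() as a coroutine is an assumption.

```python
import asyncio


class FakeProvider:
    """Hypothetical provider client; cancel_utterance() is an invented name."""

    def __init__(self) -> None:
        self.cancelled = False

    async def cancel_utterance(self) -> None:
        self.cancelled = True


class FakeSynchronizer:
    """Stand-in exposing only an async flush(), as AVSynchronizer does."""

    def __init__(self) -> None:
        self.pending_frames = ["frame-1", "frame-2"]

    async def flush(self) -> None:
        self.pending_frames.clear()


class InterruptibleAvatar:
    def __init__(self) -> None:
        self.provider = FakeProvider()
        self.sync = FakeSynchronizer()

    async def interrupt(self) -> None:
        # Stop the utterance at the source first, then drop what is queued locally.
        await self.provider.cancel_utterance()
        await self.sync.flush()


avatar = InterruptibleAvatar()
asyncio.run(avatar.interrupt())
print(avatar.provider.cancelled, avatar.sync.pending_frames)
```

Cancelling at the provider before flushing avoids a window in which new frames for the interrupted utterance arrive after the local queue has been cleared.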

Properties & Helpers

Member	Description
`provider_name`	Class attribute identifying the provider (used in events and metrics).
`events`	`EventManager` for emitting avatar-specific events.
`metrics`	`MetricsCollector` for recording avatar metrics.
`input_audio_stream`	The agent’s audio output stream attached via `attach_audio_input`. Raises `ValueError` if accessed before attach.
`attach_audio_input(stream)`	Called by the agent to hand off its audio output stream. Override to customise how audio is consumed.
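The attach-then-access contract for input_audio_stream can be illustrated with a small guard property. This is a sketch of the pattern under the behaviour described in the table, not the library's actual code; AudioInputHolder is a hypothetical class and the error message is made up.

```python
class AudioInputHolder:
    """Hypothetical class showing the guard pattern described above."""

    def __init__(self) -> None:
        self._input_audio_stream = None

    def attach_audio_input(self, stream: object) -> None:
        # In the real base class this is called by the agent.
        self._input_audio_stream = stream

    @property
    def input_audio_stream(self) -> object:
        if self._input_audio_stream is None:
            raise ValueError("audio input accessed before attach_audio_input()")
        return self._input_audio_stream


holder = AudioInputHolder()
try:
    holder.input_audio_stream
    raised = False
except ValueError:
    raised = True

stream = object()  # stands in for the agent's audio output stream
holder.attach_audio_input(stream)
attached = holder.input_audio_stream is stream
print(raised, attached)
```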

AVSynchronizer

AVSynchronizer is a utility class that solves the lip-sync problem: provider video and audio arrive on separate streams, and pushing them straight onto the outbound WebRTC tracks usually drifts. It owns a paired audio_output and video_output, delays each video frame by the current audio buffer depth, and paces frames at the configured fps (overriding aiortc’s hardcoded 30 fps).

from vision_agents.core.avatars import AVSynchronizer

sync = AVSynchronizer(
    width=1920,
    height=1080,
    fps=30,
    max_queue_size=30,  # typically int(fps * buffer_seconds)
)
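The comment on max_queue_size suggests sizing the queue from the frame rate and the amount of video to buffer. A small helper makes the arithmetic explicit; queue_size_for is a hypothetical name and buffer_seconds is a value you would choose yourself, not a documented parameter.

```python
def queue_size_for(fps: int, buffer_seconds: float) -> int:
    """Number of frames needed to hold buffer_seconds of video at fps."""
    return int(fps * buffer_seconds)


one_second_at_30 = queue_size_for(30, 1.0)
half_second_at_25 = queue_size_for(25, 0.5)
print(one_second_at_30, half_second_at_25)  # 30 12
```

Note that int() truncates, so 25 fps with a 0.5 second buffer gives 12 frames rather than 13.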

Member	Description
`video_output`	The `QueuedVideoTrack` to expose from `Avatar.video_output()`.
`audio_output`	The `AudioOutputStream` to expose from `Avatar.audio_output()`.
`async write_video(frame)`	Queue an `av.VideoFrame` from the provider, delayed by the current audio buffer depth.
`async write_audio(pcm)`	Write a `PcmData` chunk from the provider to the audio track.
`async flush()`	Discard pending video frames and flush buffered audio (use on interrupt).
`close()`	Close the underlying audio stream.
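To make the delay idea concrete, here is a toy model of "hold each video frame back by the current audio buffer depth". It is not how AVSynchronizer is implemented (its internals are not shown here); ToySynchronizer, its explicit now argument, and the sample-count bookkeeping are all assumptions chosen to keep the example deterministic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToySynchronizer:
    """Toy model: video frames are released after the buffered audio would play."""

    sample_rate: int = 48_000
    buffered_samples: int = 0
    video_queue: deque = field(default_factory=deque)

    def write_audio(self, num_samples: int) -> None:
        self.buffered_samples += num_samples

    def audio_depth_seconds(self) -> float:
        return self.buffered_samples / self.sample_rate

    def write_video(self, frame: str, now: float) -> None:
        # Delay the frame by however much audio is still waiting to be played.
        release_at = now + self.audio_depth_seconds()
        self.video_queue.append((release_at, frame))

    def due_frames(self, now: float) -> list[str]:
        due = []
        while self.video_queue and self.video_queue[0][0] <= now:
            due.append(self.video_queue.popleft()[1])
        return due

    def flush(self) -> None:
        # On interrupt: drop pending video and forget buffered audio.
        self.video_queue.clear()
        self.buffered_samples = 0


sync = ToySynchronizer()
sync.write_audio(24_000)             # 0.5 s of audio at 48 kHz
sync.write_video("frame-a", now=10.0)
early = sync.due_frames(now=10.4)    # not yet due
on_time = sync.due_frames(now=10.5)  # due once the buffered audio has played
print(early, on_time)
```

In this model a frame that arrives while 0.5 s of audio is buffered is shown 0.5 s later, which is when the matching audio reaches the listener.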

Building a Custom Avatar

A minimal subclass wraps an AVSynchronizer, exposes its tracks, and pumps provider frames into it from a consumer task started in start():

import asyncio
from vision_agents.core.avatars import Avatar, AVSynchronizer
from vision_agents.core.agents.inference import AudioOutputStream
from getstream.video.rtc.track_util import PcmData
import av

class MyAvatar(Avatar):
    provider_name = "my_avatar"

    def __init__(self, width: int = 1280, height: int = 720, fps: int = 30) -> None:
        super().__init__()
        self._sync = AVSynchronizer(width=width, height=height, fps=fps)
        self._task: asyncio.Task | None = None

    def video_output(self):
        return self._sync.video_output

    def audio_output(self) -> AudioOutputStream:
        return self._sync.audio_output

    async def start(self) -> None:
        # open provider connection, then pump agent audio into it
        self._task = asyncio.create_task(self._consume(self.input_audio_stream))

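    async def close(self) -> None:
        if self._task:
            self._task.cancel()
            # Wait for the consumer to finish unwinding before closing the tracks.
            await asyncio.gather(self._task, return_exceptions=True)
            self._task = None
        self._sync.close()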

    async def _consume(self, stream: AudioOutputStream) -> None:
        async for chunk in stream:
            # send chunk.data to the provider; for each response frame:
            #   await self._sync.write_video(frame)   # av.VideoFrame
            #   await self._sync.write_audio(pcm)     # PcmData
            ...

Usage

Pass an avatar to the agent at initialisation:

from vision_agents.core import Agent, User
from vision_agents.plugins import deepgram, gemini, getstream, liveavatar

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    llm=gemini.LLM("gemini-3-flash-preview"),
    tts=deepgram.TTS(),
    stt=deepgram.STT(),
    avatar=liveavatar.Avatar(),
)

Available Implementations

LiveAvatar

Real-time interactive avatars by HeyGen with WebSocket lip-sync.

Anam

Anam’s avatar SDK with configurable dimensions and frame rate.

LemonSlice

LemonSlice avatars delivered over LiveKit.
