# Avatar Class

Avatars consume the agent's audio output and produce a synced video and audio feed of a virtual character. They run in passthrough mode: the avatar owns the agent's outbound video and audio tracks, and its output never feeds back into the LLM or any video processors.

## Class Hierarchy

The `vision_agents.core.avatars` module exports two classes:

| Class            | Purpose                                                                                                                         |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `Avatar`         | Abstract base class; consumes the agent's audio output and publishes synced video and audio.                                    |
| `AVSynchronizer` | Utility that owns paired audio/video tracks and delays video frames to match the audio buffer depth, keeping lip-sync accurate. |

All three built-in implementations (LiveAvatar, Anam, LemonSlice) build on `AVSynchronizer` for output, so it's the recommended building block for custom avatars too.

## Lifecycle

The agent drives the avatar through a fixed lifecycle:

1. **`Agent.__init__`** queries `video_output()` and calls `attach_audio_input(stream)`, handing the avatar the inference flow's audio output stream.
2. **`Agent.join()`** calls `await avatar.start()`, which opens the provider connection and begins consuming the input stream.
3. **While running**, the avatar drains `input_audio_stream`, forwards audio to the provider, and exposes lip-synced video via `video_output()` and audio via `audio_output()`.
4. **`Agent.close()`** calls `await avatar.close()` for teardown.

When an avatar is set, the agent publishes `avatar.audio_output()` as outbound audio instead of publishing the TTS stream directly. TTS still synthesises the speech; the avatar lip-syncs it and republishes the result.
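The call order in the lifecycle steps can be traced with stand-in classes. This is only a sketch: `FakeAvatar`, `FakeAgent`, and the `calls` log are hypothetical names invented for illustration, and the real `Agent` does considerably more than this.

```python
import asyncio


class FakeAvatar:
    """Stand-in for an Avatar subclass; records each lifecycle call."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def video_output(self) -> str:
        self.calls.append("video_output")
        return "video-track"  # placeholder for a real video track

    def attach_audio_input(self, stream: object) -> None:
        self.calls.append("attach_audio_input")

    async def start(self) -> None:
        self.calls.append("start")

    async def close(self) -> None:
        self.calls.append("close")


class FakeAgent:
    """Stand-in for Agent; mirrors only the order described above."""

    def __init__(self, avatar: FakeAvatar) -> None:
        self.avatar = avatar
        self.video_track = avatar.video_output()  # step 1: query the track
        avatar.attach_audio_input(object())       # step 1: hand over the audio stream

    async def join(self) -> None:
        await self.avatar.start()                 # step 2

    async def close(self) -> None:
        await self.avatar.close()                 # step 4


async def run_lifecycle() -> list[str]:
    avatar = FakeAvatar()
    agent = FakeAgent(avatar)
    await agent.join()
    await agent.close()
    return avatar.calls
```

Running `asyncio.run(run_lifecycle())` returns the calls in the order the agent made them, which is a quick way to check that a custom avatar does not depend on `start()` running before the audio stream is attached.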

## Abstract Methods

Subclasses must implement all four:

| Method           | Description                                                            |
| ---------------- | ---------------------------------------------------------------------- |
| `video_output()` | Return the outbound `aiortc.VideoStreamTrack` published to the call.   |
| `audio_output()` | Return the outbound `AudioOutputStream` published to the call.         |
| `async start()`  | Open the provider connection and begin consuming `input_audio_stream`. |
| `async close()`  | Tear down the provider connection and cancel any consumer tasks.       |

Subclasses may also implement an `interrupt()` method to stop the in-flight utterance at the provider during barge-in.
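One plausible shape for `interrupt()` is to stop the provider first and then drop whatever is already buffered locally. The sketch below uses stand-ins throughout: `StubSynchronizer` imitates only the `flush()` method of `AVSynchronizer`, `StubProvider.cancel_utterance` is an invented name, and whether the real base class expects `interrupt()` to be a coroutine is an assumption.

```python
import asyncio


class StubSynchronizer:
    """Stand-in for AVSynchronizer: holds queued frames and supports flush()."""

    def __init__(self) -> None:
        self.pending_frames: list[str] = []

    async def flush(self) -> None:
        self.pending_frames.clear()


class StubProvider:
    """Hypothetical provider client; `cancel_utterance` is an invented name."""

    def __init__(self) -> None:
        self.cancelled = False

    async def cancel_utterance(self) -> None:
        self.cancelled = True


class InterruptibleAvatar:
    def __init__(self) -> None:
        self._sync = StubSynchronizer()
        self._provider = StubProvider()

    async def interrupt(self) -> None:
        # 1. Ask the provider to stop generating the current utterance.
        await self._provider.cancel_utterance()
        # 2. Drop locally buffered output so playback stops promptly.
        await self._sync.flush()


async def demo_interrupt() -> tuple[bool, int]:
    avatar = InterruptibleAvatar()
    avatar._sync.pending_frames.extend(["frame-1", "frame-2"])
    await avatar.interrupt()
    return avatar._provider.cancelled, len(avatar._sync.pending_frames)
```

Cancelling at the provider before flushing avoids a window in which new frames for the interrupted utterance arrive just after the local buffers were emptied.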

## Properties & Helpers

| Member                       | Description                                                                                                       |
| ---------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| `provider_name`              | Class attribute identifying the provider (used in events and metrics).                                            |
| `events`                     | `EventManager` for emitting avatar-specific events.                                                               |
| `metrics`                    | `MetricsCollector` for recording avatar metrics.                                                                  |
| `input_audio_stream`         | The agent's audio output stream attached via `attach_audio_input`. Raises `ValueError` if accessed before attach. |
| `attach_audio_input(stream)` | Called by the agent to hand off its audio output stream. Override to customise how audio is consumed.             |
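The attach-then-access contract for `input_audio_stream` can be illustrated with a small guarded property. This is a sketch of the pattern, not the library's source: `AudioInputHolder` and the error message are hypothetical.

```python
class AudioInputHolder:
    """Stand-in for the attach/access pattern; not the real Avatar base class."""

    def __init__(self) -> None:
        self._input_audio_stream = None

    def attach_audio_input(self, stream) -> None:
        # A subclass could wrap or resample the stream here before storing it.
        self._input_audio_stream = stream

    @property
    def input_audio_stream(self):
        if self._input_audio_stream is None:
            raise ValueError("input_audio_stream accessed before attach_audio_input()")
        return self._input_audio_stream


def access_before_attach() -> str:
    holder = AudioInputHolder()
    try:
        holder.input_audio_stream
    except ValueError:
        return "ValueError"
    return "no error"
```

In practice this means a custom avatar should read `input_audio_stream` in `start()` or later, not in its own `__init__`.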

## AVSynchronizer

`AVSynchronizer` is a utility class that solves the lip-sync problem: provider video and audio arrive on separate streams, and pushing them straight onto the outbound WebRTC tracks usually lets them drift out of sync. It owns a paired `audio_output` and `video_output`, delays each video frame by the current audio buffer depth, and paces frames at the configured fps (overriding aiortc's hardcoded 30 fps).

```python
from vision_agents.core.avatars import AVSynchronizer

sync = AVSynchronizer(
    width=1920,
    height=1080,
    fps=30,
    max_queue_size=30,  # typically int(fps * buffer_seconds)
)
```

| Member                     | Description                                                                            |
| -------------------------- | -------------------------------------------------------------------------------------- |
| `video_output`             | The `QueuedVideoTrack` to expose from `Avatar.video_output()`.                         |
| `audio_output`             | The `AudioOutputStream` to expose from `Avatar.audio_output()`.                        |
| `async write_video(frame)` | Queue an `av.VideoFrame` from the provider, delayed by the current audio buffer depth. |
| `async write_audio(pcm)`   | Write a `PcmData` chunk from the provider to the audio track.                          |
| `async flush()`            | Discard pending video frames and flush buffered audio (use on interrupt).              |
| `close()`                  | Close the underlying audio stream.                                                     |
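The "delay video by the audio buffer depth" idea can be made concrete with a toy timing model. It is not how `AVSynchronizer` is implemented; `ToySynchronizer`, `play_audio`, `video_release_time`, and `queue_size` are hypothetical names, and the model works in seconds rather than with real frames.

```python
from dataclasses import dataclass


@dataclass
class ToySynchronizer:
    """Toy timing model only; the real AVSynchronizer's internals may differ."""

    fps: int = 30
    buffered_audio_s: float = 0.0  # audio written but not yet played

    def write_audio(self, duration_s: float) -> None:
        self.buffered_audio_s += duration_s

    def play_audio(self, duration_s: float) -> None:
        self.buffered_audio_s = max(0.0, self.buffered_audio_s - duration_s)

    def video_release_time(self, arrival_s: float) -> float:
        # A frame is held back by the amount of audio still waiting to play,
        # so it appears when its matching audio reaches the listener.
        return arrival_s + self.buffered_audio_s


def queue_size(fps: int, buffer_seconds: float) -> int:
    # Mirrors the `int(fps * buffer_seconds)` rule of thumb from the constructor example.
    return int(fps * buffer_seconds)
```

With 0.5 s of audio buffered, a frame arriving at t = 10.0 s is released at 10.5 s; once 0.25 s of that audio has played, a frame arriving at the same instant would wait only until 10.25 s. At 30 fps, a one-second buffer corresponds to a queue of 30 frames.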

## Building a Custom Avatar

A minimal subclass wraps an `AVSynchronizer`, exposes its tracks, and pumps provider frames into it from a consumer task started in `start()`:

```python
import asyncio
from vision_agents.core.avatars import Avatar, AVSynchronizer
from vision_agents.core.agents.inference import AudioOutputStream
from getstream.video.rtc.track_util import PcmData
import av

class MyAvatar(Avatar):
    provider_name = "my_avatar"

    def __init__(self, width: int = 1280, height: int = 720, fps: int = 30) -> None:
        super().__init__()
        self._sync = AVSynchronizer(width=width, height=height, fps=fps)
        self._task: asyncio.Task | None = None

    def video_output(self):
        return self._sync.video_output

    def audio_output(self) -> AudioOutputStream:
        return self._sync.audio_output

    async def start(self) -> None:
        # open provider connection, then pump agent audio into it
        self._task = asyncio.create_task(self._consume(self.input_audio_stream))

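    async def close(self) -> None:
        if self._task:
            self._task.cancel()
            # wait for the consumer to unwind before closing the tracks
            await asyncio.gather(self._task, return_exceptions=True)
        self._sync.close()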

    async def _consume(self, stream: AudioOutputStream) -> None:
        async for chunk in stream:
            # send chunk.data to the provider; for each response frame:
            #   await self._sync.write_video(frame)   # av.VideoFrame
            #   await self._sync.write_audio(pcm)     # PcmData
            ...
```
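The consumer-task pattern in `start()` and `close()` can be exercised without a provider or the library installed. The sketch below replaces the agent's audio stream with a hypothetical async generator (`fake_audio_stream`) and simply records chunks instead of sending them anywhere; `ConsumerSketch` is likewise an invented name.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_audio_stream(chunks: list[bytes]) -> AsyncIterator[bytes]:
    """Stand-in for the agent's audio stream: yields chunks, then idles."""
    for chunk in chunks:
        yield chunk
    await asyncio.Event().wait()  # stay open until the consumer is cancelled


class ConsumerSketch:
    def __init__(self, stream: AsyncIterator[bytes]) -> None:
        self._stream = stream
        self._task: asyncio.Task | None = None
        self.forwarded: list[bytes] = []

    async def start(self) -> None:
        self._task = asyncio.create_task(self._consume())

    async def _consume(self) -> None:
        async for chunk in self._stream:
            # A real avatar would forward the chunk to its provider here.
            self.forwarded.append(chunk)

    async def close(self) -> None:
        if self._task:
            self._task.cancel()
            # Wait for the task to finish unwinding; the cancellation is
            # returned as a result instead of being raised.
            await asyncio.gather(self._task, return_exceptions=True)


async def demo_consumer() -> tuple[list[bytes], bool]:
    avatar = ConsumerSketch(fake_audio_stream([b"a", b"b"]))
    await avatar.start()
    await asyncio.sleep(0.01)  # let the consumer task drain the two chunks
    await avatar.close()
    return avatar.forwarded, avatar._task.cancelled()
```

Because the stream never ends on its own, the consumer only stops when `close()` cancels it, which is the situation a long-lived call produces.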

## Usage

Pass an avatar to the agent at initialisation:

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import deepgram, gemini, getstream, liveavatar

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    llm=gemini.LLM("gemini-3-flash-preview"),
    tts=deepgram.TTS(),
    stt=deepgram.STT(),
    avatar=liveavatar.Avatar(),
)
```

## Available Implementations

<CardGroup cols={3}>
  <Card title="LiveAvatar" href="/integrations/avatars/liveavatar">
    Real-time interactive avatars by HeyGen with WebSocket lip-sync.
  </Card>

  <Card title="Anam" href="/integrations/avatars/anam">
    Anam's avatar SDK with configurable dimensions and frame rate.
  </Card>

  <Card title="LemonSlice" href="/integrations/avatars/lemonslice">
    LemonSlice avatars delivered over LiveKit.
  </Card>
</CardGroup>
