HeyGen is a service that provides realistic AI avatars with automatic lip-sync. The HeyGen plugin for the Stream Python AI SDK adds a video avatar to your AI agent that speaks with natural movements and expressions synchronized to your agent's voice, creating more engaging and human-like AI interactions.

Features

  • 🎤 Automatic Lip-Sync: Avatar automatically syncs with audio
  • 🚀 WebRTC Streaming: Low-latency real-time video streaming
  • 🎨 Customizable: Change avatar, quality, and resolution

Installation

Install the Stream HeyGen plugin with:
uv add vision-agents[heygen]
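
If you manage dependencies with pip instead of uv, the equivalent install is:
pip install "vision-agents[heygen]"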

Example

Check out our HeyGen examples to see working code samples using the plugin, or read on for some key details.

Initialisation

The HeyGen plugin for Stream is exposed as the AvatarPublisher class:
from vision_agents.plugins import heygen

avatar = heygen.AvatarPublisher(
    avatar_id="default",
    quality=heygen.VideoQuality.HIGH
)
To initialise without passing in the API key, make sure HEYGEN_API_KEY is set as an environment variable. You can do this either by defining it in a .env file or by exporting it directly in your terminal.
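
As a minimal sketch, assuming you use the python-dotenv package to load the .env file, the key only needs to be in the process environment before the publisher is constructed:
from dotenv import load_dotenv
from vision_agents.plugins import heygen

load_dotenv()  # loads HEYGEN_API_KEY from a local .env file into the environment

avatar = heygen.AvatarPublisher(avatar_id="default")  # falls back to HEYGEN_API_KEY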

Parameters

These are the parameters available in the HeyGen AvatarPublisher plugin for you to customise:
  • avatar_id (str, default "default"): HeyGen avatar ID to use for streaming. Get this from your HeyGen dashboard.
  • quality (VideoQuality, default VideoQuality.HIGH): Video quality setting. Options: VideoQuality.LOW, VideoQuality.MEDIUM, or VideoQuality.HIGH.
  • resolution (Tuple[int, int], default (1920, 1080)): Output video resolution as (width, height).
  • api_key (str or None, default None): Your HeyGen API key. If not provided, the plugin will look for the HEYGEN_API_KEY environment variable.
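
For example, here is a publisher with every parameter spelled out (the key string is a placeholder; in real deployments, prefer the environment variable):
from vision_agents.plugins import heygen

avatar = heygen.AvatarPublisher(
    avatar_id="default",
    quality=heygen.VideoQuality.MEDIUM,
    resolution=(1280, 720),          # (width, height)
    api_key="your-heygen-api-key",   # placeholder; omit to fall back to HEYGEN_API_KEY
)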

How It Works

The HeyGen avatar integration works differently depending on whether you're using a standard streaming LLM or a Realtime LLM.

With Standard LLMs

When using a standard streaming LLM (like Gemini LLM), the flow is:
  1. Text Generation: Your LLM generates text responses
  2. Lip-Sync: Text is sent directly to HeyGen for avatar lip-sync generation
  3. Audio Synthesis: HeyGen generates both the avatar video and audio with TTS
  4. Streaming: Avatar video and audio are streamed to call participants
This approach has lower latency because text goes directly to HeyGen without transcription delays.
from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, deepgram, heygen

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Avatar Assistant"),
    instructions="You're a friendly AI assistant.",
    
    llm=gemini.LLM("gemini-2.0-flash-exp"),
    stt=deepgram.STT(),
    
    processors=[
        heygen.AvatarPublisher(
            avatar_id="default",
            quality=heygen.VideoQuality.HIGH
        )
    ]
)

With Realtime LLMs

When using a Realtime LLM (like Gemini Realtime), the flow is:
  1. Audio Generation: Realtime LLM generates audio directly
  2. Transcription: Audio is transcribed to text
  3. Lip-Sync: Text transcription is sent to HeyGen for avatar lip-sync
  4. Video Only: HeyGen generates avatar video (audio comes from the Realtime LLM)
  5. Streaming: Avatar video and LLM audio are streamed together
from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, heygen

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Avatar Assistant"),
    instructions="You're a friendly AI assistant.",
    
    llm=gemini.Realtime(model="gemini-2.5-flash-native-audio-preview-09-2025"),
    
    processors=[
        heygen.AvatarPublisher(
            avatar_id="default",
            quality=heygen.VideoQuality.HIGH
        )
    ]
)

Usage in Agent

Add the AvatarPublisher to your agent’s processors list:
from uuid import uuid4
from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, deepgram, heygen

async def start_avatar_agent():
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="AI Assistant with Avatar", id="agent"),
        instructions="You're a friendly AI assistant.",
        
        llm=gemini.LLM("gemini-2.0-flash"),
        stt=deepgram.STT(),
        
        processors=[
            heygen.AvatarPublisher(
                avatar_id="default",
                quality=heygen.VideoQuality.HIGH,
                resolution=(1920, 1080)
            )
        ]
    )
    
    call = agent.edge.client.video.call("default", str(uuid4()))
    
    async with await agent.join(call):
        await agent.edge.open_demo(call)
        await agent.simple_response("Hello! I'm your AI assistant with an avatar.")
        await agent.finish()
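
Since start_avatar_agent is a coroutine, drive it with asyncio when running the script directly:
import asyncio

if __name__ == "__main__":
    asyncio.run(start_avatar_agent())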

Video Quality Options

Choose the appropriate quality based on your bandwidth and requirements (a selection sketch follows this list):
  • VideoQuality.LOW: Lower bandwidth usage, suitable for slower connections
  • VideoQuality.MEDIUM: Balanced quality and bandwidth
  • VideoQuality.HIGH: Best quality, requires stable high-bandwidth connection
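
As an illustrative sketch (the bandwidth thresholds below are assumptions, not part of the plugin), you might select a quality level from a measured uplink estimate:
from vision_agents.plugins import heygen

def pick_quality(uplink_kbps: int) -> heygen.VideoQuality:
    # Illustrative thresholds; tune them for your own network conditions.
    if uplink_kbps < 1500:
        return heygen.VideoQuality.LOW
    if uplink_kbps < 4000:
        return heygen.VideoQuality.MEDIUM
    return heygen.VideoQuality.HIGH

avatar = heygen.AvatarPublisher(avatar_id="default", quality=pick_quality(2500))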

Getting Your Avatar ID

  1. Sign up for a HeyGen account
  2. Navigate to your HeyGen dashboard
  3. Find your avatar ID in the avatar settings
  4. Use this ID in the avatar_id parameter, as shown below
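
For example (the ID below is a placeholder; substitute the one from your dashboard):
from vision_agents.plugins import heygen

avatar = heygen.AvatarPublisher(avatar_id="your-avatar-id-from-dashboard")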

Troubleshooting

Connection Issues

If you experience connection problems:
  • Verify your HeyGen API key is valid (a quick sanity check is sketched after this list)
  • Ensure network access to HeyGen’s servers
  • Check firewall settings for WebRTC traffic
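
As a quick sanity check (plain Python, nothing plugin-specific), confirm the key is actually visible to your process:
import os

key = os.environ.get("HEYGEN_API_KEY")
if not key:
    raise RuntimeError("HEYGEN_API_KEY is not set; export it or add it to your .env file")
print(f"HEYGEN_API_KEY is set ({len(key)} characters)")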

Video Quality Issues

To optimize video quality:
  • Use quality=VideoQuality.HIGH for best results
  • Ensure stable internet connection
  • Consider lowering resolution if bandwidth is limited

No Avatar Appearing

  • Check browser console for errors
  • Verify Stream credentials are correct
  • Ensure HeyGen API key has proper permissions