Vision Agents Documentation

Vision Agents is an open-source Video AI framework for building real-time voice and video applications. It ships with Stream Video as its default low-latency transport, powered by our global edge network. The framework is edge- and transport-agnostic, so developers can also bring any other edge layer they prefer.

What can you build?

Vision Agents makes it simple to prototype and scale a wide range of AI-powered video apps, including:

Coaching & Training — live sports coaching, guided workouts
Collaboration — meeting assistants, note-taking, transcription
Automation & Robotics — IoT control, surveillance, manufacturing workflows
Video AI — video avatars, character agents

Get Started

Installation

Install Vision Agents and set up your first project

Voice Agents

Build real-time voice agents with AI

Video Agents

Create AI-powered video applications

Integrations

Connect with popular AI providers

Built-in AI integrations

Out of the box, Vision Agents supports popular providers across the AI stack:

LLMs: OpenAI, Anthropic, Gemini, xAI
Realtime APIs: Gemini (websockets), OpenAI (WebRTC)
Speech-to-Text (STT): Deepgram, Moonshine, AssemblyAI
Text-to-Speech (TTS): ElevenLabs, AssemblyAI, Cartesia, Moonshine
Turn / Voice Detection: Fal, Silero, Krisp
Audio & Video Processing: YOLO
Memory & Context: In-memory, Stream Chat

Each integration is built on extensible base classes. For example, with BaseProcessor or VideoProcessorMixin, you can plug in custom computer-vision models like Ultralytics YOLO.
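To illustrate the base-class pattern described above, here is a minimal, self-contained sketch. It does not import Vision Agents, and every name in it (`SketchProcessor`, `BrightnessProcessor`, `run_pipeline`, the `Frame` type) is a hypothetical stand-in rather than the real `BaseProcessor` or `VideoProcessorMixin` API. The brightness calculation is a placeholder for where a real model such as Ultralytics YOLO would run.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Assumption: a frame is a grayscale image given as rows of 0-255 pixel values.
Frame = List[List[int]]


class SketchProcessor(ABC):
    """Hypothetical base class; NOT the real Vision Agents BaseProcessor."""

    @abstractmethod
    def process(self, frame: Frame) -> Dict[str, float]:
        """Return a dictionary of results for a single frame."""


class BrightnessProcessor(SketchProcessor):
    """Placeholder for a real vision model: reports mean pixel brightness."""

    def process(self, frame: Frame) -> Dict[str, float]:
        pixels = [p for row in frame for p in row]
        mean = sum(pixels) / len(pixels) if pixels else 0.0
        return {"mean_brightness": mean}


def run_pipeline(processors: List[SketchProcessor], frame: Frame) -> Dict[str, float]:
    """Run every processor on the frame and merge their results."""
    results: Dict[str, float] = {}
    for processor in processors:
        results.update(processor.process(frame))
    return results


if __name__ == "__main__":
    frame = [[0, 100], [100, 200]]
    print(run_pipeline([BrightnessProcessor()], frame))
```

The point of the pattern is that the pipeline depends only on the abstract `process` method, so swapping in a different model means writing one new subclass. Consult the Core Architecture docs for the actual class names and method signatures.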

Explore the Documentation

AI Technologies

Learn about TTS, STT, VAD, and more

Core Architecture

Understand the framework architecture

Guides

Step-by-step implementation guides

Cookbook

Ready-to-use examples and recipes
