Vision Agents

The open-source Python framework for real-time voice and video AI. Plug in any LLM, speech, or vision model from 35+ providers and ship agents for telehealth, voice support, live coaching, and anything else you can wire up, with sub-500ms latency on Stream’s global edge network.

New here? Read the Overview for what Vision Agents is and what you can build, then run the Quickstart.

Overview

What Vision Agents is and what you can build

Quickstart

Scaffold and run your first agent in under 5 minutes

GitHub

Star the project and explore examples

What You Can Build

AI Golf Coach

YOLO pose detection watches your swing via camera while Gemini gives real-time coaching feedback.

Phone Support Agent

Twilio-powered agent answers inbound calls with RAG-backed knowledge bases. Start with the phone setup guide.

Smart Security Camera

Face recognition and package detection with YOLO, sending automated alerts in real time.

Live Sports Commentator

Roboflow object detection tracks the players and the ball while an LLM delivers play-by-play.

Live Video Try-On

Real-time virtual try-on with Decart’s Lucy-2 model. Swap outfits using reference images and prompts.

Interactive Avatar

Anam avatars that see, hear, and respond with real-time voice and video.

Capabilities

35+ integrations: OpenAI, Gemini, Anthropic, Deepgram, ElevenLabs, YOLO, and more
Two modes: Realtime APIs (WebRTC/WebSocket) or custom STT → LLM → TTS pipelines
Video processing: Run YOLO, Roboflow, or custom models on every frame
Phone support: Twilio integration for voice calls with bi-directional audio
RAG: TurboPuffer vector search and Gemini FileSearch for knowledge retrieval
Production ready: HTTP server, Prometheus metrics, Docker and Kubernetes deployment
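
The custom pipeline mode listed above (STT → LLM → TTS) can be pictured as three swappable stages chained per conversational turn. The sketch below is illustrative only: it does not use the Vision Agents API, and every name in it (Pipeline, handle_turn, and the fake_* stages) is a hypothetical stand-in so the example runs without provider credentials.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these names come from Vision Agents.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class Pipeline:
    """Chains three independent stages: speech in, text reasoning, speech out."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)   # audio -> text
        reply = self.llm(transcript)      # text -> text
        return self.tts(reply)            # text -> audio


# Stand-in stages (assumptions for illustration, not real providers).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = Pipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.handle_turn(b"hello")
```

Because each stage is just a callable, swapping one provider for another changes a single argument and leaves the rest of the chain untouched. A realtime API, by contrast, would collapse all three stages into one streaming connection.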

Next Steps

Quickstart

Scaffold and run your first agent

Integrations

Browse 35+ supported AI providers

Guides

Deploy to production with Docker and metrics

Try Stream Video

Get 333,000 free participant minutes
