Overview
What Vision Agents is and what you can build
Quickstart
Scaffold and run your first agent in under 5 minutes
GitHub
Star the project and explore examples
What You Can Build
AI Golf Coach
YOLO pose detection watches your swing via camera while Gemini gives real-time coaching feedback.
Phone Support Agent
Twilio-powered agent answers inbound calls with RAG-backed knowledge bases. Start with the phone setup guide.
Smart Security Camera
Face recognition and package detection with YOLO, sending automated alerts in real time.
Live Sports Commentator
Roboflow object detection tracks players and the ball while an LLM delivers play-by-play.
Live Video Try-On
Real-time virtual try-on with Decart’s Lucy-2 model. Swap outfits using reference images and prompts.
Interactive Avatar
Anam avatars that see, hear, and respond with real-time voice and video.
Capabilities
- 25+ integrations: OpenAI, Gemini, Anthropic, Deepgram, ElevenLabs, YOLO, and more
- Two modes: Realtime APIs (WebRTC/WebSocket) or custom STT → LLM → TTS pipelines
- Video processing: Run YOLO, Roboflow, or custom models on every frame
- Phone support: Twilio integration for voice calls with bidirectional audio
- RAG: TurboPuffer vector search and Gemini FileSearch for knowledge retrieval
- Production-ready: HTTP server, Prometheus metrics, Docker and Kubernetes deployment
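To make the second mode in the list above more concrete, here is a minimal sketch of an STT → LLM → TTS turn as three chained stages. It uses plain Python callables and stand-in functions; `VoicePipeline`, `handle_turn`, and the `fake_*` stages are hypothetical names chosen for illustration and are not the Vision Agents API. In a real agent each stage would presumably be backed by one of the supported providers.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures for illustration only; not the Vision Agents API.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    """Chains three independent stages: audio -> text -> reply -> audio."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)  # 1. transcribe the incoming audio
        reply = self.llm(transcript)     # 2. generate a text reply
        return self.tts(reply)           # 3. synthesize the reply as audio


# Stand-in stages so the sketch runs without any provider account.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.handle_turn(b"hello")
```

Because each stage is an ordinary callable, any one of them can be swapped without touching the other two, which is the main appeal of a custom pipeline over a single realtime API.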
Next Steps
Quickstart
Scaffold and run your first agent
Integrations
Browse 25+ supported AI providers
Guides
Deploy to production with Docker and metrics
Try Stream Video
Get 333,000 free participant minutes