# Vision Agents

> Build low-latency voice and video AI agents with any model

The open-source Python framework for real-time voice and video AI. Plug in any LLM, speech, or vision model from 25+ providers and ship agents for telehealth, voice support, live coaching, and anything else you can wire up. Sub-500ms latency on [Stream's global edge network](https://getstream.io/video/).

<div style={{ margin: '1.5rem 0' }}>
  <iframe
    src="https://emotional-support.visionagents.ai/?embed=true"
    style={{
      width: '100%',
      aspectRatio: '1 / 1',
      border: '1px solid #e5e7eb',
      borderRadius: '12px',
      boxShadow: '0 4px 6px -1px rgba(0, 0, 0, 0.1)',
    }}
    allow="camera; microphone"
    title="Vision Agents Demo"
  />
</div>

New here? Read the [Overview](/introduction/overview) for what Vision Agents is and what you can build, then run the [Quickstart](/introduction/quickstart).

<CardGroup cols={2}>
  <Card title="Overview" icon="book-open" href="/introduction/overview">
    What Vision Agents is and what you can build
  </Card>

  <Card title="Quickstart" icon="rocket" href="/introduction/quickstart">
    Scaffold and run your first agent in under 5 minutes
  </Card>

  <Card title="GitHub" icon="github" href="https://github.com/GetStream/vision-agents">
    Star the project and explore examples
  </Card>
</CardGroup>

## What You Can Build

<CardGroup cols={3}>
  <Card title="AI Golf Coach" icon="golf-ball-tee" href="/examples/golf-coach">
    YOLO pose detection watches your swing via camera while Gemini gives real-time coaching feedback.
  </Card>

  <Card title="Phone Support Agent" icon="phone" href="/examples/phone-and-rag">
    Twilio-powered agent answers inbound calls with RAG-backed knowledge bases. Start with the phone setup guide.
  </Card>

  <Card title="Smart Security Camera" icon="camera-security" href="/examples/security-camera">
    Face recognition and package detection with YOLO, sending automated alerts in real time.
  </Card>

  <Card title="Live Sports Commentator" icon="futbol" href="/examples/football-commentator">
    Roboflow object detection tracks players and ball while an LLM delivers play-by-play.
  </Card>

  <Card title="Live Video Try-On" icon="wand-magic-sparkles" href="/examples/visual-storyteller">
    Real-time virtual try-on with Decart's Lucy-2 model. Swap outfits using reference images and prompts.
  </Card>

  <Card title="Interactive Avatar" icon="user" href="/integrations/avatars/anam">
    Anam avatars that see, hear, and respond with real-time voice and video.
  </Card>
</CardGroup>

## Capabilities

* **25+ integrations**: OpenAI, Gemini, Anthropic, Deepgram, ElevenLabs, YOLO, and more
* **Two modes**: Realtime APIs (WebRTC/WebSocket) or custom STT → LLM → TTS pipelines
* **Video processing**: Run YOLO, Roboflow, or custom models on every frame
* **Phone support**: Twilio integration for voice calls with bi-directional audio
* **RAG**: TurboPuffer vector search and Gemini FileSearch for knowledge retrieval
* **Production ready**: HTTP server, Prometheus metrics, Docker and Kubernetes deployment
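
To make the custom STT → LLM → TTS mode concrete, here is a minimal, framework-agnostic sketch of how three pluggable stages might chain together. It does not use the Vision Agents API: `VoicePipeline`, `respond`, and the `fake_*` stages are hypothetical names invented for illustration, and the stubs stand in for real speech and language providers.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures, for illustration only.
# These are NOT the Vision Agents API.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    """Chains three swappable stages: audio in, text, reply text, audio out."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)  # speech -> text
        reply = self.llm(transcript)     # text -> reply text
        return self.tts(reply)           # reply text -> speech


# Stub stages standing in for real providers (assumptions, not real models).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.respond(b"hello")
```

Because each stage is just a callable, swapping one provider for another only changes the argument passed to `VoicePipeline`. A Realtime API, by contrast, would typically replace all three stages with a single speech-to-speech connection.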

## Next Steps

<CardGroup cols={2}>
  <Card title="Quickstart" icon="rocket" href="/introduction/quickstart">
    Scaffold and run your first agent
  </Card>

  <Card title="Integrations" icon="plug" href="/integrations/introduction-to-integrations">
    Browse 25+ supported AI providers
  </Card>

  <Card title="Guides" icon="book" href="/guides/deployment">
    Deploy to production with Docker and metrics
  </Card>

  <Card title="Try Stream Video" icon="circle-play" href="https://getstream.io/try-for-free">
    Get 333,000 free participant minutes
  </Card>
</CardGroup>
