# Vision Agents

## Docs

- [Model Context Protocol (MCP)](https://visionagents.ai/ai-technologies/model-context-protocol.md)
- [Speech To Speech (STS)](https://visionagents.ai/ai-technologies/speech-to-speech.md)
- [Speech To Text (STT)](https://visionagents.ai/ai-technologies/speech-to-text.md)
- [Text To Speech (TTS)](https://visionagents.ai/ai-technologies/text-to-speech.md)
- [Turn Detection](https://visionagents.ai/ai-technologies/turn-detection.md)
- [Agent Class](https://visionagents.ai/core/agent-core.md)
- [LLM Class](https://visionagents.ai/core/llm-core.md)
- [Overview](https://visionagents.ai/core/overview.md)
- [Processors Class](https://visionagents.ai/core/processors-core.md)
- [Realtime Class](https://visionagents.ai/core/realtime-core.md)
- [Speech-to-Text and Text-to-Speech Class](https://visionagents.ai/core/stt-tts-core.md)
- [Telemetry & Metrics](https://visionagents.ai/core/telemetry.md)
- [Cartesia Narrator](https://visionagents.ai/examples/cartesia-narrator.md): Build a storytelling agent with expressive speech using Cartesia's Sonic 3 TTS
- [Football Commentator](https://visionagents.ai/examples/football-commentator.md): Build a real-time AI sports commentator using object detection and realtime models
- [Realtime Golf Coach](https://visionagents.ai/examples/golf-coach.md): Build a realtime golf coaching agent with Vision Agents
- [Phone & RAG](https://visionagents.ai/examples/phone-and-rag.md): Build voice agents that answer phone calls with RAG-powered knowledge
- [Security Camera](https://visionagents.ai/examples/security-camera.md): Build a security camera with face recognition, package detection, and theft alerts
- [Simple Agent](https://visionagents.ai/examples/simple-agent.md): Learn how to build a simple agent with Vision Agents
- [Realtime Visual Storyteller](https://visionagents.ai/examples/visual-storyteller.md): Build a realtime storytelling agent with dynamic video restyling using Vision Agents and Decart
- [Phone Calling](https://visionagents.ai/guides/calling.md)
- [Memory and Chat](https://visionagents.ai/guides/chat-and-memory.md)
- [Production Deployment](https://visionagents.ai/guides/deployment.md)
- [Event System](https://visionagents.ai/guides/event-system.md)
- [Horizontal Scaling](https://visionagents.ai/guides/horizontal-scaling.md): Scale Vision Agents across multiple servers with Redis-backed session management
- [Built-in HTTP Server](https://visionagents.ai/guides/http-server.md): Run agents as an HTTP server with session management, authentication, and real-time metrics
- [Interruption Handling](https://visionagents.ai/guides/interruption-handling.md)
- [Kubernetes Deployment](https://visionagents.ai/guides/kubernetes-deployment.md): Deploy Vision Agents to Kubernetes with Helm (step-by-step guide)
- [MCP and Function Calling](https://visionagents.ai/guides/mcp-tool-calling.md)
- [Multiple Speakers](https://visionagents.ai/guides/multiple-speakers.md)
- [Prometheus Metrics](https://visionagents.ai/guides/prometheus-metrics.md): Monitor your voice agents with real-time metrics
- [RAG for Agents](https://visionagents.ai/guides/rag.md)
- [Testing agents](https://visionagents.ai/guides/testing.md): Verify agent behavior with text-only tests using pytest
- [Building Video Processors](https://visionagents.ai/guides/video-processors.md)
- [AssemblyAI](https://visionagents.ai/integrations/assemblyai.md)
- [AWS Bedrock](https://visionagents.ai/integrations/aws-bedrock.md)
- [AWS Polly](https://visionagents.ai/integrations/aws-polly.md)
- [Cartesia](https://visionagents.ai/integrations/cartesia.md)
- [Create Your Own Plugin](https://visionagents.ai/integrations/create-your-own-plugin.md)
- [Decart](https://visionagents.ai/integrations/decart.md)
- [Deepgram](https://visionagents.ai/integrations/deepgram.md)
- [ElevenLabs](https://visionagents.ai/integrations/elevenlabs.md)
- [Fast-Whisper](https://visionagents.ai/integrations/fast-whisper.md)
- [Fish Audio](https://visionagents.ai/integrations/fish.md)
- [Gemini](https://visionagents.ai/integrations/gemini.md)
- [HeyGen Avatars](https://visionagents.ai/integrations/heygen.md)
- [HuggingFace](https://visionagents.ai/integrations/huggingface.md)
- [Introduction to Integrations](https://visionagents.ai/integrations/introduction-to-integrations.md)
- [Inworld](https://visionagents.ai/integrations/inworld.md)
- [Kimi AI](https://visionagents.ai/integrations/kimi.md)
- [Kokoro](https://visionagents.ai/integrations/kokoro.md)
- [LemonSlice Avatars](https://visionagents.ai/integrations/lemonslice.md)
- [Mistral Voxtral](https://visionagents.ai/integrations/mistral.md)
- [Moondream](https://visionagents.ai/integrations/moondream.md)
- [NVIDIA](https://visionagents.ai/integrations/nvidia.md)
- [OpenAI](https://visionagents.ai/integrations/openai.md)
- [OpenRouter](https://visionagents.ai/integrations/openrouter.md)
- [Pocket TTS](https://visionagents.ai/integrations/pocket.md)
- [Qwen](https://visionagents.ai/integrations/qwen.md)
- [Roboflow](https://visionagents.ai/integrations/roboflow.md)
- [Smart Turn](https://visionagents.ai/integrations/smart-turn.md)
- [Ultralytics YOLO](https://visionagents.ai/integrations/ultralytics.md)
- [Vogent](https://visionagents.ai/integrations/vogent.md)
- [Wizper](https://visionagents.ai/integrations/wizper.md)
- [xAI (Grok)](https://visionagents.ai/integrations/xai.md)
- [Installation](https://visionagents.ai/introduction/installation.md)
- [Overview](https://visionagents.ai/introduction/overview.md)
- [Build a Video Agent](https://visionagents.ai/introduction/video-agents.md)
- [Build a Voice Agent](https://visionagents.ai/introduction/voice-agents.md)
- [Events Reference](https://visionagents.ai/reference/events-reference.md)

## Optional

- [GitHub](https://github.com/GetStream/vision-agents)
- [X Account](https://x.com/visionagents_ai)
- [Discord](https://discord.gg/RkhX9PxMS6)

Built with [Mintlify](https://mintlify.com).