# Vision Agents

## Docs

- [Model Context Protocol (MCP)](https://visionagents.ai/ai-technologies/model-context-protocol.md)
- [Speech To Speech (STS)](https://visionagents.ai/ai-technologies/speech-to-speech.md)
- [Speech To Text (STT)](https://visionagents.ai/ai-technologies/speech-to-text.md)
- [Text To Speech (TTS)](https://visionagents.ai/ai-technologies/text-to-speech.md)
- [Turn Detection](https://visionagents.ai/ai-technologies/turn-detection.md)
- [Agent Class](https://visionagents.ai/core/agent-core.md)
- [LLM Class](https://visionagents.ai/core/llm-core.md)
- [Overview](https://visionagents.ai/core/overview.md)
- [Processors Class](https://visionagents.ai/core/processors-core.md)
- [Realtime Class](https://visionagents.ai/core/realtime-core.md)
- [Speech-to-Text and Text-to-Speech Class](https://visionagents.ai/core/stt-tts-core.md)
- [Telemetry & Metrics](https://visionagents.ai/core/telemetry.md)
- [Expressive Voice Narrator](https://visionagents.ai/examples/cartesia-narrator.md): Build a storytelling agent with expressive speech using Cartesia's Sonic 3 TTS
- [Live Sports Commentator](https://visionagents.ai/examples/football-commentator.md): Build a real-time AI sports commentator using object detection and realtime models
- [AI Golf Coach](https://visionagents.ai/examples/golf-coach.md): Build a real-time golf coaching agent with YOLO pose detection and voice feedback
- [Phone Support Agent](https://visionagents.ai/examples/phone-and-rag.md): Build voice agents that answer phone calls with RAG-powered knowledge retrieval
- [AI Meeting Copilot](https://visionagents.ai/examples/sales-assistant.md): Build a real-time sales assistant that listens to meetings and surfaces coaching suggestions
- [Smart Security Camera](https://visionagents.ai/examples/security-camera.md): Build a security camera with face recognition, package detection, and automated theft alerts
- [Voice Agent Starter](https://visionagents.ai/examples/simple-agent.md): Build a conversational voice AI agent that listens, thinks, and responds in real time
- [Video Call Moderator](https://visionagents.ai/examples/video-moderator.md): Build a real-time video moderator that detects, censors, and escalates with verbal warnings
- [Live Video Try-On](https://visionagents.ai/examples/visual-storyteller.md): Build a real-time virtual try-on agent with Decart's Lucy-2 model
- [Phone Calling](https://visionagents.ai/guides/calling.md)
- [Memory and Chat](https://visionagents.ai/guides/chat-and-memory.md)
- [Overview](https://visionagents.ai/guides/deploying-overview.md): From local development to a production Kubernetes cluster
- [Docker Deployment](https://visionagents.ai/guides/deployment.md)
- [Event System](https://visionagents.ai/guides/event-system.md)
- [Horizontal Scaling](https://visionagents.ai/guides/horizontal-scaling.md): Scale Vision Agents across multiple servers with Redis-backed session management
- [Built-in HTTP Server](https://visionagents.ai/guides/http-server.md): Run agents as an HTTP server with session management, authentication, and real-time metrics
- [Interruption Handling](https://visionagents.ai/guides/interruption-handling.md)
- [Kubernetes Deployment](https://visionagents.ai/guides/kubernetes-deployment.md): Step-by-step guide to deploying Vision Agents to Kubernetes with Helm
- [MCP and Function Calling](https://visionagents.ai/guides/mcp-tool-calling.md)
- [Multiple Speakers](https://visionagents.ai/guides/multiple-speakers.md)
- [RAG for Agents](https://visionagents.ai/guides/rag.md)
- [Testing agents](https://visionagents.ai/guides/testing.md): Verify agent behavior with text-only tests using pytest
- [Building Video Processors](https://visionagents.ai/guides/video-processors.md)
- [Anam Avatars](https://visionagents.ai/integrations/avatars/anam.md)
- [HeyGen Avatars](https://visionagents.ai/integrations/avatars/heygen.md)
- [LemonSlice Avatars](https://visionagents.ai/integrations/avatars/lemonslice.md)
- [Create Your Own Plugin](https://visionagents.ai/integrations/create-your-own-plugin.md)
- [Tencent RTC (Early Access)](https://visionagents.ai/integrations/edge-transport/tencent.md)
- [Baseten](https://visionagents.ai/integrations/infrastructure/baseten.md)
- [HuggingFace Inference](https://visionagents.ai/integrations/infrastructure/huggingface.md)
- [TurboPuffer](https://visionagents.ai/integrations/infrastructure/turbopuffer.md)
- [Introduction to Integrations](https://visionagents.ai/integrations/introduction-to-integrations.md)
- [Gemini LLM](https://visionagents.ai/integrations/llm/gemini.md)
- [HuggingFace Transformers](https://visionagents.ai/integrations/llm/huggingface-transformers.md)
- [Kimi AI](https://visionagents.ai/integrations/llm/kimi.md)
- [OpenAI LLM](https://visionagents.ai/integrations/llm/openai.md)
- [OpenRouter](https://visionagents.ai/integrations/llm/openrouter.md)
- [Qwen LLM](https://visionagents.ai/integrations/llm/qwen.md)
- [Sarvam LLM](https://visionagents.ai/integrations/llm/sarvam.md)
- [xAI (Grok)](https://visionagents.ai/integrations/llm/xai.md)
- [AWS Bedrock](https://visionagents.ai/integrations/realtime/aws-bedrock.md)
- [Gemini Realtime](https://visionagents.ai/integrations/realtime/gemini.md)
- [Inworld Realtime](https://visionagents.ai/integrations/realtime/inworld.md)
- [OpenAI Realtime](https://visionagents.ai/integrations/realtime/openai.md)
- [Qwen Realtime](https://visionagents.ai/integrations/realtime/qwen.md)
- [xAI Realtime](https://visionagents.ai/integrations/realtime/xai.md): Speech-to-speech using xAI's Grok models over WebSocket with built-in VAD
- [AssemblyAI](https://visionagents.ai/integrations/stt/assemblyai.md)
- [Deepgram STT](https://visionagents.ai/integrations/stt/deepgram.md)
- [ElevenLabs STT](https://visionagents.ai/integrations/stt/elevenlabs.md)
- [Fast-Whisper](https://visionagents.ai/integrations/stt/fast-whisper.md)
- [Fish Audio STT](https://visionagents.ai/integrations/stt/fish.md)
- [Mistral Voxtral](https://visionagents.ai/integrations/stt/mistral.md)
- [Sarvam STT](https://visionagents.ai/integrations/stt/sarvam.md)
- [Wizper](https://visionagents.ai/integrations/stt/wizper.md)
- [AWS Polly](https://visionagents.ai/integrations/tts/aws-polly.md)
- [Cartesia](https://visionagents.ai/integrations/tts/cartesia.md)
- [Deepgram TTS](https://visionagents.ai/integrations/tts/deepgram.md)
- [ElevenLabs TTS](https://visionagents.ai/integrations/tts/elevenlabs.md)
- [Fish Audio TTS](https://visionagents.ai/integrations/tts/fish.md)
- [Inworld](https://visionagents.ai/integrations/tts/inworld.md)
- [Kokoro](https://visionagents.ai/integrations/tts/kokoro.md)
- [OpenAI TTS](https://visionagents.ai/integrations/tts/openai.md)
- [Pocket TTS](https://visionagents.ai/integrations/tts/pocket.md)
- [Sarvam TTS](https://visionagents.ai/integrations/tts/sarvam.md)
- [xAI TTS](https://visionagents.ai/integrations/tts/xai.md): Text-to-speech using xAI's Grok voices with speech tag support
- [Smart Turn](https://visionagents.ai/integrations/turn-detection/smart-turn.md)
- [Vogent](https://visionagents.ai/integrations/turn-detection/vogent.md)
- [Decart](https://visionagents.ai/integrations/vision/decart.md)
- [Moondream](https://visionagents.ai/integrations/vision/moondream.md)
- [NVIDIA](https://visionagents.ai/integrations/vision/nvidia.md)
- [Roboflow](https://visionagents.ai/integrations/vision/roboflow.md)
- [Ultralytics YOLO](https://visionagents.ai/integrations/vision/ultralytics.md)
- [Quickstart](https://visionagents.ai/introduction/quickstart.md): Build and run your first AI voice agent in under 5 minutes
- [Video Agents](https://visionagents.ai/introduction/video-agents.md): Build video AI agents with realtime models, VLMs, and computer vision processors
- [Voice Agents](https://visionagents.ai/introduction/voice-agents.md): Build voice agents with realtime models or custom STT/LLM/TTS pipelines
- [Events Reference](https://visionagents.ai/reference/events-reference.md)

## Optional

- [GitHub](https://github.com/GetStream/vision-agents)
- [X Account](https://x.com/visionagents_ai)
- [Discord](https://discord.gg/RkhX9PxMS6)
- [Contact Us](mailto:nash@getstream.io)
- [Explore Demo](https://demo.visionagents.ai)