> ## Documentation Index
> Fetch the complete documentation index at: https://visionagents.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Voice Agent Starter

> Build a conversational voice AI agent that listens, thinks, and responds in real time

Check out the complete Simple Agent example in our GitHub repository.

In this example, we build a conversational voice AI agent using [OpenAI](https://openai.com/) for language understanding, [ElevenLabs](https://elevenlabs.io/) for natural-sounding speech, and [Deepgram](https://deepgram.com/) for speech recognition. The agent joins a video call, greets the user, handles voice conversation, and can observe the camera feed. This is the best starting point for developers new to Vision Agents.

Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account for real-time transport. Most providers offer free tiers to get started.

## What You Will Build

* Listen to user speech and convert it to text with [Deepgram](https://deepgram.com/) STT
* Process conversations using [OpenAI](https://openai.com/) GPT-4o-mini
* Respond with natural-sounding speech via [ElevenLabs](https://elevenlabs.io/) TTS
* Detect when the user has finished speaking with [Smart Turn](https://fal.ai/models/fal-ai/smart-turn) detection
* Run on [Stream's](https://getstream.io/) low-latency edge network

## Next Steps

* Add video processing with YOLO pose detection
* Swap in any of 25+ supported AI providers
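
Before moving on, it may help to picture how the stages listed under What You Will Build fit together in a single conversational turn: speech is transcribed, a turn detector decides when the user has finished, the language model produces a reply, and the reply is synthesized to audio. The sketch below is a conceptual illustration only. It does not use the Vision Agents API, and every name in it (`VoicePipeline`, `fake_stt`, `fake_turn_detector`, `fake_llm`, `fake_tts`) is a hypothetical stand-in for the real Deepgram, Smart Turn, OpenAI, and ElevenLabs components.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# NOTE: hypothetical stand-ins for illustration; not the Vision Agents API.
@dataclass
class VoicePipeline:
    transcribe: Callable[[bytes], str]    # STT stage
    turn_ended: Callable[[str], bool]     # turn detection stage
    generate: Callable[[List[str]], str]  # LLM stage, given all user turns so far
    synthesize: Callable[[str], bytes]    # TTS stage
    _partial: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def on_audio(self, chunk: bytes) -> Optional[bytes]:
        """Feed one audio chunk; return reply audio once the user's turn ends."""
        text = self.transcribe(chunk)
        if text:
            self._partial.append(text)
        utterance = " ".join(self._partial)
        if not utterance or not self.turn_ended(utterance):
            return None  # user is still speaking, so keep listening
        self._partial.clear()
        self.history.append(utterance)
        reply = self.generate(self.history)
        return self.synthesize(reply)


def fake_stt(chunk: bytes) -> str:
    # Pretend the "audio" bytes are already UTF-8 text.
    return chunk.decode("utf-8")


def fake_turn_detector(utterance: str) -> bool:
    # Toy heuristic: a turn ends at sentence-final punctuation.
    return utterance.endswith((".", "?", "!"))


def fake_llm(history: List[str]) -> str:
    # Echo the latest user turn instead of calling a language model.
    return f"You said: {history[-1]}"


def fake_tts(text: str) -> bytes:
    # Pretend encoded text is synthesized audio.
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_turn_detector, fake_llm, fake_tts)
first = pipeline.on_audio(b"Hello")    # turn not finished yet
second = pipeline.on_audio(b"there.")  # turn finished, reply produced
```

Keeping each stage behind a plain callable mirrors the idea that providers are swappable: replacing one stage does not require changes to the others.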