> ## Documentation Index
> Fetch the complete documentation index at: https://visionagents.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Voice Agent Starter

> Build a conversational voice AI agent that listens, thinks, and responds in real time

<Card title="View Simple Agent Example on GitHub" icon="github" href="https://github.com/GetStream/Vision-Agents/tree/main/examples/01_simple_agent_example">
  Check out the complete Simple Agent example in our GitHub repository
</Card>

In this example, we build a conversational voice AI agent using [OpenAI](https://openai.com/) for language understanding, [ElevenLabs](https://elevenlabs.io/) for natural-sounding speech, and [Deepgram](https://deepgram.com/) for speech recognition. The agent joins a video call, greets the user, handles voice conversation, and can observe the camera feed. This is the best starting point for developers new to Vision Agents.

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account
  for real-time transport. Most providers offer free tiers to get started.
</Info>

## What You Will Build

* Listen to user speech and convert it to text with [Deepgram](https://deepgram.com/) STT
* Process conversations using [OpenAI](https://openai.com/) GPT-4o-mini
* Respond with natural-sounding speech via [ElevenLabs](https://elevenlabs.io/) TTS
* Detect when the user has finished speaking with [Smart Turn](https://fal.ai/models/fal-ai/smart-turn) detection
* Run on [Stream's](https://getstream.io/) low-latency edge network

## Next Steps

<CardGroup cols={2}>
  <Card title="AI Golf Coach" icon="golf-ball-tee" href="/examples/golf-coach">
    Add video processing with YOLO pose detection
  </Card>

  <Card title="Integrations" icon="plug" href="/integrations/introduction-to-integrations">
    Swap in any of 25+ supported AI providers
  </Card>
</CardGroup>
