View Simple Agent Example on GitHub
Check out the complete Simple Agent example in our GitHub repository
Vision Agents requires a Stream account
for real-time transport. Most providers offer free tiers to get started.
What You Will Build
- Listen to user speech and convert it to text with Deepgram STT
- Process conversations using OpenAI GPT-4o-mini
- Respond with natural-sounding speech via ElevenLabs TTS
- Detect when the user has finished speaking with Smart Turn detection
- Run on Stream’s low-latency edge network
Next Steps
AI Golf Coach
Add video processing with YOLO pose detection
Integrations
Swap in any of 25+ supported AI providers

