How does speech-to-speech work?
- Listen - Convert speech to text
- Understand - Process the meaning
- Think - Generate a response
- Speak - Convert text back to speech
- Respond - Talk back to you
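
The steps above can be sketched as a simple chain of functions. This is only an illustrative sketch, not the Vision Agents SDK API: the `SpeechPipeline` class and the stub stages below are hypothetical names, and the "audio" is just UTF-8 bytes standing in for real sound data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechPipeline:
    """Hypothetical container for the stages of a speech-to-speech turn."""
    transcribe: Callable[[bytes], str]   # Listen: speech -> text
    interpret: Callable[[str], dict]     # Understand: text -> meaning
    generate: Callable[[dict], str]      # Think: meaning -> reply text
    synthesize: Callable[[str], bytes]   # Speak: reply text -> speech

    def respond(self, audio_in: bytes) -> bytes:
        """Respond: run one full turn and return the audio to play back."""
        text = self.transcribe(audio_in)
        meaning = self.interpret(text)
        reply = self.generate(meaning)
        return self.synthesize(reply)


# Stub stages for illustration only. A real system would call
# speech-recognition, language-model, and text-to-speech services here.
pipeline = SpeechPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    interpret=lambda text: {"text": text, "is_question": text.endswith("?")},
    generate=lambda meaning: f"You said: {meaning['text']}",
    synthesize=lambda reply: reply.encode("utf-8"),
)

audio_out = pipeline.respond(b"hello")
```

Keeping each stage behind its own function makes it clear where latency accumulates: every hop between separate services adds delay before the user hears a reply.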

How does it work with Stream?
The Vision Agents SDK simplifies this entire process by providing a unified system that handles the conversation flow seamlessly within your calls. Instead of building complex pipelines that connect multiple services, you get everything you need in one integrated solution. Here’s how it works in your Stream calls:
- Choose Your AI: Pick an AI model for intelligent, context-aware conversations.
- Configure Personality: Set up how your AI should behave.
- Start Conversations: Users can simply start talking, and your AI will listen, process, and respond naturally through the call.
- Real-time Interaction: The entire conversation happens in real-time, with minimal delay between what users say and how the AI responds.
- Seamless Integration: Everything works within your existing Stream call, with no separate audio channels or complex routing needed.

Worked example
Let’s walk through a real-world scenario to see how Realtime creates magical conversational experiences. Imagine you’re building a virtual meeting assistant that helps teams stay organized and productive. Here’s how Realtime makes this possible:
The Scenario: A team meeting where the AI assistant helps manage the agenda and take notes.
What Happens:
- The meeting starts, and someone says “Hey assistant, can you help us stay on track today?”
- The AI responds naturally: “Of course! I’m here to help. I can take notes, track action items, and keep us on schedule. What’s on the agenda today?”
- A team member says “We need to discuss the Q3 budget and plan the product launch.”
- The AI processes this and responds: “Great! I’ll create agenda items for budget discussion and product launch planning. I’ll also track any decisions and action items we make. Should we start with the budget?”
- Throughout the meeting, the AI can interject with helpful reminders: “We have 10 minutes left for the budget discussion. Should we move to the product launch planning?”
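
The time reminder in the last step could be driven by simple agenda bookkeeping. The sketch below is a hypothetical illustration and not part of the Vision Agents SDK: `AgendaItem`, `reminder`, and the `warn_at` threshold are assumed names, and in practice the returned text would be passed to the AI to speak in the call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgendaItem:
    """Hypothetical agenda entry with a time budget in minutes."""
    title: str
    minutes_allotted: int


def reminder(
    current: AgendaItem,
    next_item: Optional[AgendaItem],
    minutes_elapsed: int,
    warn_at: int = 10,
) -> Optional[str]:
    """Return a reminder once the current item is within `warn_at` minutes
    of its time budget, or None if no reminder is due."""
    remaining = current.minutes_allotted - minutes_elapsed
    if not 0 < remaining <= warn_at:
        return None
    unit = "minute" if remaining == 1 else "minutes"
    message = f"We have {remaining} {unit} left for the {current.title}."
    if next_item is not None:
        message += f" Should we move to the {next_item.title}?"
    return message


budget = AgendaItem("budget discussion", minutes_allotted=30)
launch = AgendaItem("product launch planning", minutes_allotted=20)

early = reminder(budget, launch, minutes_elapsed=5)   # plenty of time left
late = reminder(budget, launch, minutes_elapsed=20)   # 10 minutes remain
```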