Interruption handling ensures your voice agent responds naturally when users speak over the agent mid-response. Vision Agents handles interruptions automatically.
Vision Agents requires a Stream account for real-time transport.
Automatic Handling
When you configure turn detection, the Agent class automatically:
- Detects when the user starts speaking (via TurnStartedEvent)
- Stops the TTS audio output immediately
- Flushes the audio track to clear buffered audio
- Listens to the user’s new input
- Responds appropriately
No custom event handlers required for basic interruption handling.
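For reference, the sketch below shows roughly what that built-in handling amounts to. You never need to write this yourself, and details such as reaching the TTS plugin through agent.tts are illustrative assumptions rather than documented API:

```python
from vision_agents.core.turn_detection.events import TurnStartedEvent

# Illustrative only -- the Agent class already does the equivalent of this internally.
@agent.events.subscribe
async def _builtin_interruption_handling(event: TurnStartedEvent):
    # Ignore turn events produced by the agent's own speech
    if event.participant and event.participant.user_id == "agent":
        return
    # Stop speaking immediately (assumes the TTS plugin is reachable as agent.tts)
    agent.tts.stop_audio()
    # The Agent also flushes buffered audio from the output track here,
    # then waits for the user's new input and generates a fresh response.
```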
Realtime APIs
If you’re using OpenAI Realtime, Gemini Live, AWS Bedrock, or Qwen, interruption handling is built-in at the model level. No turn detection plugin needed.
```python
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.Realtime(),  # Built-in interruption handling
)
```
Realtime APIs are recommended for the most natural conversation flow with minimal latency.
Traditional Pipeline Setup
For the STT → LLM → TTS pipeline, add a turn detection plugin:
```python
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream, deepgram, elevenlabs, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=smart_turn.TurnDetection(),
)
```
Custom Behavior
Add custom logic for logging, analytics, or special responses:
```python
import logging

from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent

logger = logging.getLogger(__name__)

@agent.events.subscribe
async def on_interruption(event: TurnStartedEvent):
    if event.participant and event.participant.user_id != "agent":
        logger.info(f"User {event.participant.user_id} interrupted the agent")
        metrics.increment("user_interruptions")  # `metrics`: your own analytics client

@agent.events.subscribe
async def on_turn_complete(event: TurnEndedEvent):
    if event.participant and event.participant.user_id != "agent":
        logger.info(f"User finished speaking (duration: {event.duration_ms}ms)")
```
You don’t need to call tts.stop_audio() in your handlers — the Agent class does this automatically.
Tuning Sensitivity
Adjust turn detection parameters to control interruption response:
More Sensitive (Faster Response)
```python
turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=0.5,
    confidence_threshold=0.3
)
```
Use when: You want immediate response to any sound.
Trade-off: May trigger on background noise.
Less Sensitive (More Deliberate)
```python
turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=2.0,
    confidence_threshold=0.7
)
```
Use when: You want to avoid false positives.
Trade-off: Slower to respond to genuine interruptions.
Recommended Defaults
```python
turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=1.5,
    confidence_threshold=0.5
)
```
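Whichever values you choose, pass the configured detector to the Agent exactly as in the pipeline setup above. For example:

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream, deepgram, elevenlabs, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=smart_turn.TurnDetection(
        buffer_in_seconds=1.5,
        confidence_threshold=0.5,
    ),
)
```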
Best Practices
Keep responses concise — Shorter responses mean fewer interruptions:
instructions="Keep responses under 2-3 sentences. Be concise."
Acknowledge interruptions — Add context in your instructions:
instructions="If interrupted, briefly acknowledge and address the user's new question."
Filter agent events — In custom handlers, always check the source:
```python
if event.participant and event.participant.user_id == "agent":
    return
```
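In a complete handler, that guard is simply an early return before your own logic (this mirrors the logging example above):

```python
from vision_agents.core.turn_detection.events import TurnStartedEvent

@agent.events.subscribe
async def on_turn_started(event: TurnStartedEvent):
    # Skip events generated by the agent's own speech
    if event.participant and event.participant.user_id == "agent":
        return
    # ...handle a genuine user turn here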
Troubleshooting
| Issue | Solution |
|---|---|
| Agent doesn't stop when interrupted | Verify turn_detection is configured; lower confidence_threshold |
| Agent stops too easily (false positives) | Increase confidence_threshold to 0.7; increase buffer_in_seconds to 2.0 |
| Delay before responding to interruption | Decrease buffer_in_seconds to 0.5; consider a Realtime API |
| Interruption handling not working at all | Don't add a turn detection plugin when using a Realtime LLM (it handles interruptions internally) |
Next Steps