Interruption handling ensures your voice agent responds naturally when users speak over the agent mid-response. Vision Agents handles interruptions automatically.
Vision Agents requires a Stream account for real-time transport.

Automatic Handling

When you configure turn detection, the Agent class automatically:
  1. Detects when the user starts speaking (via TurnStartedEvent)
  2. Stops the TTS audio output immediately
  3. Flushes the audio track to clear buffered audio
  4. Listens to the user’s new input
  5. Responds appropriately
No custom event handlers are required for basic interruption handling.
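The sequence above can be illustrated with a small, self-contained simulation. This is only a sketch of the general barge-in pattern, not the Vision Agents implementation: `ToyAgent`, `say`, and `on_user_turn_started` are hypothetical names, and the "audio" is a list of strings played with `asyncio.sleep`.

```python
import asyncio


class ToyAgent:
    """Toy stand-in for an agent; NOT the Vision Agents Agent class."""

    def __init__(self) -> None:
        self.audio_buffer: list[str] = []  # chunks queued for playback
        self.played: list[str] = []        # chunks that reached the "speaker"
        self._speaking = None              # asyncio.Task while a response plays

    async def _play(self) -> None:
        # Drain the buffer one chunk at a time, pausing between chunks.
        while self.audio_buffer:
            self.played.append(self.audio_buffer.pop(0))
            await asyncio.sleep(0.05)

    def say(self, chunks: list[str]) -> None:
        # Queue a response and start playing it in the background.
        self.audio_buffer.extend(chunks)
        self._speaking = asyncio.create_task(self._play())

    async def on_user_turn_started(self) -> None:
        # Step 2: stop the audio output immediately.
        if self._speaking is not None and not self._speaking.done():
            self._speaking.cancel()
            try:
                await self._speaking
            except asyncio.CancelledError:
                pass
        # Step 3: flush audio that was buffered but not yet played.
        self.audio_buffer.clear()


async def demo() -> ToyAgent:
    agent = ToyAgent()
    agent.say([f"chunk-{i}" for i in range(1, 11)])
    await asyncio.sleep(0.12)           # the user starts talking mid-response
    await agent.on_user_turn_started()  # step 1 triggers steps 2 and 3
    return agent


agent = asyncio.run(demo())
print(f"played {len(agent.played)} chunks, {len(agent.audio_buffer)} left in buffer")
```

After the simulated interruption, only the first few chunks have been played and the buffer is empty, so no stale audio leaks into the next turn.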

Realtime APIs

If you’re using OpenAI Realtime, Gemini Live, AWS Bedrock, or Qwen, interruption handling is built in at the model level. No turn detection plugin is needed.
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.Realtime(),  # Built-in interruption handling
)
Realtime APIs are recommended for the most natural conversation flow with minimal latency.

Traditional Pipeline Setup

For the STT → LLM → TTS pipeline, add a turn detection plugin:
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream, deepgram, elevenlabs, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=smart_turn.TurnDetection(),
)

Custom Behavior

Add custom logic for logging, analytics, or special responses:
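import logging

from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent

logger = logging.getLogger(__name__)

@agent.events.subscribe
async def on_interruption(event: TurnStartedEvent):
    # Skip turn events emitted for the agent itself.
    if event.participant and event.participant.user_id != "agent":
        logger.info(f"User {event.participant.user_id} interrupted the agent")
        # `metrics` is a placeholder for your own metrics client; it is not defined here.
        metrics.increment("user_interruptions")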

@agent.events.subscribe
async def on_turn_complete(event: TurnEndedEvent):
    if event.participant and event.participant.user_id != "agent":
        logger.info(f"User finished speaking (duration: {event.duration_ms}ms)")
You don’t need to call tts.stop_audio() in your handlers — the Agent class does this automatically.
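The handlers above call `metrics.increment(...)` without defining `metrics`. As a minimal sketch, assuming you have no metrics backend yet, an in-memory counter with the same `increment` method can stand in for it. `InMemoryMetrics` is a hypothetical helper, not part of Vision Agents.

```python
from collections import Counter


class InMemoryMetrics:
    """Hypothetical stand-in for a real metrics client."""

    def __init__(self) -> None:
        self.counts = Counter()

    def increment(self, name: str, value: int = 1) -> None:
        # Add `value` to the named counter, creating it on first use.
        self.counts[name] += value


metrics = InMemoryMetrics()

# Simulate two interruptions being recorded by a handler.
metrics.increment("user_interruptions")
metrics.increment("user_interruptions")
print(metrics.counts["user_interruptions"])
```

In production you would swap this object for your real client, as long as it exposes a compatible `increment` call.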

Tuning Sensitivity

Adjust turn detection parameters to control interruption response:

More Sensitive (Faster Response)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=0.5,
    confidence_threshold=0.3
)
Use when: You want immediate response to any sound.
Trade-off: May trigger on background noise.

Less Sensitive (More Deliberate)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=2.0,
    confidence_threshold=0.7
)
Use when: You want to avoid false positives.
Trade-off: Slower to respond to genuine interruptions.
A middle ground between the two settings above:
turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=1.5,
    confidence_threshold=0.5
)
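If you switch between these settings often, one option is to keep them in a small lookup table. The sketch below is an assumption about how you might organize this, not part of the library: the preset names (`sensitive`, `balanced`, `deliberate`) and the helper `turn_detection_kwargs` are hypothetical, and only the numeric values come from the configurations shown in this section.

```python
# Preset names are illustrative; the values are the ones shown in this section.
SENSITIVITY_PRESETS = {
    "sensitive": {"buffer_in_seconds": 0.5, "confidence_threshold": 0.3},
    "balanced": {"buffer_in_seconds": 1.5, "confidence_threshold": 0.5},
    "deliberate": {"buffer_in_seconds": 2.0, "confidence_threshold": 0.7},
}


def turn_detection_kwargs(preset: str) -> dict:
    """Return a copy of the keyword arguments for the named preset."""
    try:
        return dict(SENSITIVITY_PRESETS[preset])
    except KeyError:
        known = ", ".join(sorted(SENSITIVITY_PRESETS))
        raise ValueError(f"Unknown preset {preset!r}; expected one of: {known}") from None


kwargs = turn_detection_kwargs("balanced")
print(kwargs)
# In an agent you would then pass them through, for example:
#   turn_detection = smart_turn.TurnDetection(**kwargs)
```

Returning a copy keeps callers from accidentally mutating the shared preset table.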

Best Practices

Keep responses concise — Shorter responses mean fewer interruptions:
instructions="Keep responses under 2-3 sentences. Be concise."
Acknowledge interruptions — Add context in your instructions:
instructions="If interrupted, briefly acknowledge and address the user's new question."
Filter agent events — In custom handlers, always check the source:
if event.participant and event.participant.user_id == "agent":
    return
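To avoid repeating the source check in every handler, the filter can be pulled into a helper. This is a sketch under assumptions: `is_user_event`, `FakeParticipant`, and `FakeTurnEvent` are hypothetical names used only for illustration, and the helper relies solely on the `participant.user_id` attribute used in the handlers on this page. It treats events without a participant as non-user events, matching the `event.participant and ...` condition in the Custom Behavior handlers.

```python
from dataclasses import dataclass
from typing import Optional

AGENT_USER_ID = "agent"  # the id given to agent_user in the examples on this page


@dataclass
class FakeParticipant:
    """Illustrative stand-in exposing only `user_id`."""
    user_id: str


@dataclass
class FakeTurnEvent:
    """Illustrative stand-in for a turn event with an optional participant."""
    participant: Optional[FakeParticipant] = None


def is_user_event(event, agent_user_id: str = AGENT_USER_ID) -> bool:
    """True only when the event has a participant other than the agent."""
    participant = getattr(event, "participant", None)
    return participant is not None and participant.user_id != agent_user_id


print(is_user_event(FakeTurnEvent(FakeParticipant("alice"))))
print(is_user_event(FakeTurnEvent(FakeParticipant("agent"))))
```

A handler can then start with `if not is_user_event(event): return` and keep its body free of filtering logic.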

Troubleshooting

| Issue | Solution |
| --- | --- |
| Agent doesn’t stop when interrupted | Verify turn_detection is configured; lower confidence_threshold |
| Agent stops too easily (false positives) | Increase confidence_threshold to 0.7; increase buffer_in_seconds to 2.0 |
| Delay before responding to interruption | Decrease buffer_in_seconds to 0.5; consider Realtime API |
| Not working at all | Don’t use turn detection with Realtime LLMs (they handle it internally) |
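Two of the issues above come down to a mismatch between the LLM type and the turn detection setting. As a rough sketch, a startup check could flag both cases before the agent runs. `check_interruption_config` is a hypothetical helper that takes two plain booleans; it does not inspect a real `Agent` object.

```python
def check_interruption_config(uses_realtime_llm: bool, has_turn_detection: bool) -> list:
    """Return human-readable warnings for the two mismatches described above."""
    warnings = []
    if uses_realtime_llm and has_turn_detection:
        warnings.append(
            "Realtime LLMs handle interruptions internally; remove turn_detection."
        )
    if not uses_realtime_llm and not has_turn_detection:
        warnings.append(
            "STT -> LLM -> TTS pipelines need a turn detection plugin for interruptions."
        )
    return warnings


for message in check_interruption_config(uses_realtime_llm=True, has_turn_detection=True):
    print(message)
```

An empty list means neither mismatch was found; it does not guarantee that the sensitivity parameters suit your environment.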
