Interruption handling ensures your voice agent responds naturally when users speak over the agent mid-response. Vision Agents handles interruptions automatically.
Vision Agents requires a Stream account for real-time transport.

Automatic Handling

When you configure turn detection, the Agent class automatically:
  1. Detects when the user starts speaking (via TurnStartedEvent)
  2. Interrupts the active TTS or Realtime LLM, incrementing its epoch counter
  3. Discards stale audio — any audio events from before the interruption are dropped based on epoch matching
  4. Flushes the audio track to clear buffered audio
  5. Listens to the user’s new input
  6. Responds appropriately
No custom event handlers required for basic interruption handling.

Epoch-based stale event tracking

Both TTS and Realtime LLMs maintain a monotonic epoch counter. When interrupt() is called, the epoch increments. Each audio event (TTSAudioEvent, RealtimeAudioOutputEvent) carries the epoch at which it was produced. The Agent automatically compares the event’s epoch against the current component epoch and drops any events that don’t match, preventing stale audio from playing after an interruption.
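The mechanism can be sketched in a few lines. SpeechOutput, AudioEvent, and should_play below are illustrative stand-ins for this explanation, not the actual Vision Agents classes:

```python
from dataclasses import dataclass

@dataclass
class AudioEvent:
    data: bytes
    epoch: int  # epoch at which this audio was produced

class SpeechOutput:
    """Stand-in for a TTS or Realtime LLM component."""

    def __init__(self):
        self._epoch = 0  # monotonic epoch counter

    @property
    def epoch(self) -> int:
        return self._epoch

    def interrupt(self):
        # Incrementing the epoch invalidates all in-flight audio events.
        self._epoch += 1

    def emit(self, data: bytes) -> AudioEvent:
        # Every event carries the epoch it was produced under.
        return AudioEvent(data=data, epoch=self._epoch)

def should_play(component: SpeechOutput, event: AudioEvent) -> bool:
    # Drop any event whose epoch predates the latest interruption.
    return event.epoch == component.epoch

tts = SpeechOutput()
event = tts.emit(b"hello")
assert should_play(tts, event)       # current epoch: plays normally
tts.interrupt()                      # user started speaking
assert not should_play(tts, event)   # stale audio is silently dropped
```

Because the counter only ever increases, an event produced before an interruption can never match again, so stale audio cannot leak into the new turn.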

Realtime APIs

If you’re using OpenAI Realtime, Gemini Live, AWS Bedrock, or Qwen, interruption handling is built-in at the model level. No turn detection plugin needed.
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.Realtime(),  # Built-in interruption handling
)
Realtime APIs are recommended for the most natural conversation flow with minimal latency.

Traditional Pipeline Setup

For the STT → LLM → TTS pipeline, you need turn detection. Some STT plugins include it automatically:
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream, elevenlabs

# ElevenLabs STT has built-in VAD turn detection — no extra plugin needed
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=elevenlabs.STT(),
    tts=elevenlabs.TTS(),
)
If your STT plugin does not include turn detection, add a separate plugin:
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream, deepgram, elevenlabs, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=smart_turn.TurnDetection(),
)
If you provide both an STT with built-in turn detection and a separate turn_detection plugin, the Agent automatically ignores the external plugin to prevent conflicts.
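The precedence rule can be sketched as follows. The has_turn_detection attribute and the class names are assumptions for illustration, not the actual Vision Agents plugin API:

```python
class STTWithVAD:
    has_turn_detection = True    # e.g. an STT with built-in VAD

class PlainSTT:
    has_turn_detection = False   # e.g. an STT without turn detection

class ExternalTurnDetection:
    pass                         # e.g. a separate turn detection plugin

def resolve_turn_detection(stt, external):
    # Built-in detection wins; the external plugin is ignored to avoid
    # two detectors firing conflicting turn events.
    if getattr(stt, "has_turn_detection", False):
        return stt
    return external

assert isinstance(
    resolve_turn_detection(STTWithVAD(), ExternalTurnDetection()), STTWithVAD
)
assert isinstance(
    resolve_turn_detection(PlainSTT(), ExternalTurnDetection()), ExternalTurnDetection
)
```

In other words, only one component ever emits turn events for a given agent, so you can safely leave the turn_detection argument in place when swapping STT providers.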

Custom Behavior

Add custom logic for logging, analytics, or special responses:
from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent

@agent.events.subscribe
async def on_interruption(event: TurnStartedEvent):
    if event.participant and event.participant.user_id != "agent":
        logger.info(f"User {event.participant.user_id} interrupted the agent")
        metrics.increment("user_interruptions")

@agent.events.subscribe
async def on_turn_complete(event: TurnEndedEvent):
    if event.participant and event.participant.user_id != "agent":
        logger.info(f"User finished speaking (duration: {event.duration_ms}ms)")
You don’t need to call tts.interrupt() or llm.interrupt() in your handlers — the Agent class does this automatically for both TTS and Realtime LLM pipelines.

Tuning Sensitivity

Adjust turn detection parameters to control interruption response:

More Sensitive (Faster Response)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=0.5,
    confidence_threshold=0.3
)
Use when: You want immediate response to any sound. Trade-off: May trigger on background noise.

Less Sensitive (More Deliberate)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=2.0,
    confidence_threshold=0.7
)
Use when: You want to avoid false positives. Trade-off: Slower to respond to genuine interruptions.
Balanced

A middle ground between the two extremes works well as a starting point:
turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=1.5,
    confidence_threshold=0.5
)

Best Practices

Keep responses concise — Shorter responses mean fewer interruptions:
instructions="Keep responses under 2-3 sentences. Be concise."
Acknowledge interruptions — Add context in your instructions:
instructions="If interrupted, briefly acknowledge and address the user's new question."
Filter agent events — In custom handlers, always check the source:
if event.participant and event.participant.user_id == "agent":
    return

Troubleshooting

| Issue | Solution |
| --- | --- |
| Agent doesn’t stop when interrupted | Verify turn_detection is configured; lower confidence_threshold |
| Agent stops too easily (false positives) | Increase confidence_threshold to 0.7; increase buffer_in_seconds to 2.0 |
| Delay before responding to interruption | Decrease buffer_in_seconds to 0.5; consider a Realtime API |
| Not working at all | Don’t use a turn detection plugin with Realtime LLMs (they handle it internally) |

Next Steps

Turn Detection: VAD vs Turn Detection concepts

Smart Turn: full parameter reference