Interruption handling ensures your voice agent responds naturally when users speak over it mid-response. This guide explains how Vision Agents handles interruptions automatically and how to customize the behavior if needed.

Automatic Interruption Handling

Good news: Vision Agents handles interruptions automatically. When you configure turn detection, the Agent class will:
  1. Detect when the user starts speaking (via TurnStartedEvent)
  2. Stop the TTS audio output immediately
  3. Flush the audio track to clear any buffered audio
  4. Listen to the user’s new input
  5. Respond appropriately
You don’t need to write any custom event handlers for basic interruption handling.

How It Works Under the Hood

The Agent class has a built-in _on_turn_event handler that manages interruptions:
# This happens automatically inside the Agent class
async def _on_turn_event(self, event: TurnStartedEvent | TurnEndedEvent):
    if isinstance(event, TurnStartedEvent):
        # When user starts speaking, interrupt the agent
        if event.participant.user_id != self.agent_user.id:
            if self.tts:
                await self.tts.stop_audio()  # Stop TTS
            if self._audio_track:
                await self._audio_track.flush()  # Clear audio buffer
This means when a user speaks over the agent, the audio stops immediately without any additional code.

Realtime APIs

If you’re using OpenAI Realtime, Gemini Live, AWS Bedrock, or Qwen, interruption handling is built-in at the model level. These APIs handle the full audio pipeline including automatic interruption detection—no turn detection plugin needed.
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.Realtime(),  # Built-in interruption handling
)
Realtime APIs are recommended for the most natural conversation flow. They handle interruptions at the model level with minimal latency.

Traditional Pipeline Setup

For the traditional STT → LLM → TTS pipeline, simply add a turn detection plugin and interruption handling works automatically:
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream, deepgram, elevenlabs, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=smart_turn.TurnDetection(),  # Enables automatic interruption handling
)
That’s it! The Agent class handles the rest.

Adding Custom Behavior

While interruptions are handled automatically, you may want to add custom logic—like logging, analytics, or special responses. Subscribe to turn events:
import logging

from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent

logger = logging.getLogger(__name__)

@agent.events.subscribe
async def on_interruption(event: TurnStartedEvent):
    if event.participant and event.participant.user_id != "agent":
        # Custom logic when user interrupts
        logger.info(f"🎤 User {event.participant.user_id} interrupted the agent")
        
        # Example: track interruption metrics ("metrics" is your own analytics client)
        metrics.increment("user_interruptions")

@agent.events.subscribe
async def on_turn_complete(event: TurnEndedEvent):
    if event.participant and event.participant.user_id != "agent":
        logger.info(f"✅ User finished speaking (duration: {event.duration_ms}ms)")
You don’t need to call tts.stop_audio() in your event handlers—the Agent class already does this automatically.
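The metrics object in the handler above stands in for whatever analytics client your project already uses (statsd, Prometheus, etc.); it is not part of Vision Agents. For local testing, a minimal in-process stand-in might look like this:

```python
from collections import Counter


class SimpleMetrics:
    """Minimal in-process stand-in for an analytics client."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def increment(self, name: str, amount: int = 1) -> None:
        # Accumulate a named counter, mirroring a typical statsd-style API.
        self._counts[name] += amount

    def get(self, name: str) -> int:
        return self._counts[name]


metrics = SimpleMetrics()
metrics.increment("user_interruptions")
metrics.increment("user_interruptions")
```

Swap this out for your real client in production; the handler code stays the same as long as it exposes an increment method.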

Complete Example

Here’s a complete example with turn detection enabled:
import logging
from dotenv import load_dotenv

from vision_agents.core import Agent, User, cli
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import openai, getstream, deepgram, elevenlabs, smart_turn

logger = logging.getLogger(__name__)
load_dotenv()


async def create_agent(**kwargs) -> Agent:
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Keep responses concise.",
        llm=openai.LLM(model="gpt-4o-mini"),
        stt=deepgram.STT(),
        tts=elevenlabs.TTS(),
        turn_detection=smart_turn.TurnDetection(
            buffer_in_seconds=1.5,
            confidence_threshold=0.5
        ),
    )
    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    await agent.create_user()
    call = await agent.create_call(call_type, call_id)

    with await agent.join(call):
        await agent.edge.open_demo(call)
        await agent.llm.simple_response("Hello! Feel free to interrupt me at any time.")
        await agent.finish()


if __name__ == "__main__":
    cli(AgentLauncher(create_agent=create_agent, join_call=join_call))

Tuning Interruption Sensitivity

Adjust turn detection parameters to control how quickly the agent responds to interruptions:

More Sensitive (Faster Response)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=0.5,   # Less audio needed before detection
    confidence_threshold=0.3  # Lower threshold = more sensitive
)
Use when: You want the agent to stop immediately when users make any sound. Trade-off: May trigger on background noise or false positives.

Less Sensitive (More Deliberate)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=2.0,   # More audio needed before detection
    confidence_threshold=0.7  # Higher threshold = requires clearer speech
)
Use when: You want to avoid false positives from background noise. Trade-off: Slower to respond to genuine interruptions.

Balanced (Recommended Default)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=1.5,
    confidence_threshold=0.5
)
This provides a good balance for most conversational agents.
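If you switch between these settings often, it can help to keep them in one place. A small hypothetical helper (the preset names and values below are just the examples from this page, not part of Smart Turn):

```python
# Hypothetical presets mirroring the tuning examples above; adjust to taste.
PRESETS = {
    "sensitive":  {"buffer_in_seconds": 0.5, "confidence_threshold": 0.3},
    "balanced":   {"buffer_in_seconds": 1.5, "confidence_threshold": 0.5},
    "deliberate": {"buffer_in_seconds": 2.0, "confidence_threshold": 0.7},
}


def turn_detection_params(preset: str = "balanced") -> dict:
    """Return keyword arguments for smart_turn.TurnDetection."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    # Copy so callers can tweak the dict without mutating the preset.
    return dict(PRESETS[preset])
```

You would then construct the plugin with, for example, smart_turn.TurnDetection(**turn_detection_params("deliberate")), making it easy to A/B test sensitivity in different deployments.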

Best Practices

1. Keep agent responses concise. Shorter responses mean fewer interruptions needed. Instruct your agent:
instructions="Keep responses under 2-3 sentences. Be concise."
2. Acknowledge interruptions gracefully. Consider adding context in your instructions:
instructions="If interrupted, briefly acknowledge and address the user's new question."
3. Test with real conversations. Tune buffer_in_seconds and confidence_threshold based on actual usage patterns.
4. Filter agent events in custom handlers. If you add custom event handlers, always filter out the agent's own events:
if event.participant and event.participant.user_id == "agent":
    return  # Don't react to agent's own audio
5. Consider Realtime APIs for lowest latency. If natural interruption handling is crucial, Realtime APIs (OpenAI, Gemini, AWS Bedrock, Qwen) provide the best experience with built-in handling at the model level.
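The filter in practice 4 is easy to get subtly wrong if the agent's user id and the string in the handler drift apart. One way to avoid that is a small helper (a sketch; AGENT_USER_ID is whatever id you passed to User(...)):

```python
AGENT_USER_ID = "agent"  # must match the id passed to User(..., id="agent")


def is_user_event(event) -> bool:
    """True if the event came from a human participant rather than the agent itself.

    Uses getattr so events without a participant attribute are safely ignored.
    """
    participant = getattr(event, "participant", None)
    return participant is not None and participant.user_id != AGENT_USER_ID
```

Each custom handler can then start with a single guard, if not is_user_event(event): return, instead of repeating the comparison everywhere.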

Troubleshooting

Agent doesn’t stop when I interrupt

  • Verify turn_detection is configured in your Agent
  • Lower confidence_threshold for more sensitivity
  • Check that your microphone is working and audio is being received

Agent stops too easily (false positives)

  • Increase confidence_threshold (e.g., 0.7)
  • Increase buffer_in_seconds (e.g., 2.0)
  • Check for background noise in your audio setup

Delay before agent responds to interruption

  • Decrease buffer_in_seconds (e.g., 0.5)
  • Consider switching to a Realtime API for lower latency

Interruption handling not working at all

  • Ensure you’re not using a Realtime LLM with a separate turn detection plugin (Realtime APIs handle this internally)
  • Check logs for "👉 Turn started" messages to verify events are firing

Next Steps

  • Turn Detection — Understand the concepts behind turn detection (VAD vs Turn Detection)
  • Smart Turn — Full parameter reference for Smart Turn
  • Vogent — Full parameter reference for Vogent
  • Event System — Learn more about subscribing to events