Interruption handling ensures your voice agent responds naturally when users speak over it mid-response. This guide explains how Vision Agents handles interruptions automatically and how to customize the behavior if needed.

Automatic Interruption Handling

Good news: Vision Agents handles interruptions automatically. When you configure turn detection, the Agent class will:
  1. Detect when the user starts speaking (via TurnStartedEvent)
  2. Stop the TTS audio output immediately
  3. Flush the audio track to clear any buffered audio
  4. Listen to the user’s new input
  5. Respond appropriately
You don’t need to write any custom event handlers for basic interruption handling.

How It Works Under the Hood

The Agent class has a built-in _on_turn_event handler that manages interruptions:
# This happens automatically inside the Agent class
async def _on_turn_event(self, event: TurnStartedEvent | TurnEndedEvent):
    if isinstance(event, TurnStartedEvent):
        # When user starts speaking, interrupt the agent
        if event.participant.user_id != self.agent_user.id:
            if self.tts:
                await self.tts.stop_audio()  # Stop TTS
            if self._audio_track:
                await self._audio_track.flush()  # Clear audio buffer
This means when a user speaks over the agent, the audio stops immediately without any additional code.

Realtime APIs

If you’re using OpenAI Realtime, Gemini Live, AWS Bedrock, or Qwen, interruption handling is built-in at the model level. These APIs handle the full audio pipeline including automatic interruption detection—no turn detection plugin needed.
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.Realtime(),  # Built-in interruption handling
)
Realtime APIs are recommended for the most natural conversation flow. They handle interruptions at the model level with minimal latency.

Traditional Pipeline Setup

For the traditional STT → LLM → TTS pipeline, simply add a turn detection plugin and interruption handling works automatically:
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream, deepgram, elevenlabs, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a helpful voice assistant.",
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=smart_turn.TurnDetection(),  # Enables automatic interruption handling
)
That’s it! The Agent class handles the rest.

Adding Custom Behavior

While interruptions are handled automatically, you may want to add custom logic—like logging, analytics, or special responses. Subscribe to turn events:
import logging

from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent

logger = logging.getLogger(__name__)

@agent.events.subscribe
async def on_interruption(event: TurnStartedEvent):
    if event.participant and event.participant.user_id != "agent":
        # Custom logic when user interrupts
        logger.info(f"🎤 User {event.participant.user_id} interrupted the agent")
        
        # Example: track interruption metrics ("metrics" is your own analytics client)
        metrics.increment("user_interruptions")

@agent.events.subscribe
async def on_turn_complete(event: TurnEndedEvent):
    if event.participant and event.participant.user_id != "agent":
        logger.info(f"✅ User finished speaking (duration: {event.duration_ms}ms)")
You don’t need to call tts.stop_audio() in your event handlers—the Agent class already does this automatically.
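The metrics object in the handler above stands in for whatever analytics client your project already uses (statsd, Prometheus, etc.); it is not part of Vision Agents. For local testing, a minimal in-process stand-in might look like this:

```python
from collections import Counter


class SimpleMetrics:
    """Minimal in-process stand-in for an analytics client."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def increment(self, name: str, amount: int = 1) -> None:
        # Accumulate a named counter, mirroring a typical statsd-style API.
        self._counts[name] += amount

    def get(self, name: str) -> int:
        return self._counts[name]


metrics = SimpleMetrics()
metrics.increment("user_interruptions")
metrics.increment("user_interruptions")
```

Swap this out for your real client in production; the handler code stays the same as long as it exposes an increment method.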

Complete Example

Here’s a complete example with turn detection enabled:
import logging
from dotenv import load_dotenv

from vision_agents.core import Agent, User, cli
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import openai, getstream, deepgram, elevenlabs, smart_turn

logger = logging.getLogger(__name__)
load_dotenv()


async def create_agent(**kwargs) -> Agent:
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Keep responses concise.",
        llm=openai.LLM(model="gpt-4o-mini"),
        stt=deepgram.STT(),
        tts=elevenlabs.TTS(),
        turn_detection=smart_turn.TurnDetection(
            buffer_in_seconds=1.5,
            confidence_threshold=0.5
        ),
    )
    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    await agent.create_user()
    call = await agent.create_call(call_type, call_id)

    with await agent.join(call):
        await agent.edge.open_demo(call)
        await agent.llm.simple_response("Hello! Feel free to interrupt me at any time.")
        await agent.finish()


if __name__ == "__main__":
    cli(AgentLauncher(create_agent=create_agent, join_call=join_call))

Tuning Interruption Sensitivity

Adjust turn detection parameters to control how quickly the agent responds to interruptions:

More Sensitive (Faster Response)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=0.5,   # Less audio needed before detection
    confidence_threshold=0.3  # Lower threshold = more sensitive
)
Use when: You want the agent to stop immediately when users make any sound. Trade-off: May trigger on background noise or false positives.

Less Sensitive (More Deliberate)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=2.0,   # More audio needed before detection
    confidence_threshold=0.7  # Higher threshold = requires clearer speech
)
Use when: You want to avoid false positives from background noise. Trade-off: Slower to respond to genuine interruptions.

Balanced (Recommended Default)

turn_detection = smart_turn.TurnDetection(
    buffer_in_seconds=1.5,
    confidence_threshold=0.5
)
This provides a good balance for most conversational agents.
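If you switch between these settings often, it can help to keep them in one place. A small hypothetical helper (the preset names and values below are just the examples from this page, not part of Smart Turn):

```python
# Hypothetical presets mirroring the tuning examples above; adjust to taste.
PRESETS = {
    "sensitive":  {"buffer_in_seconds": 0.5, "confidence_threshold": 0.3},
    "balanced":   {"buffer_in_seconds": 1.5, "confidence_threshold": 0.5},
    "deliberate": {"buffer_in_seconds": 2.0, "confidence_threshold": 0.7},
}


def turn_detection_params(preset: str = "balanced") -> dict:
    """Return keyword arguments for smart_turn.TurnDetection."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    # Copy so callers can tweak the dict without mutating the preset.
    return dict(PRESETS[preset])
```

You would then construct the plugin with, for example, smart_turn.TurnDetection(**turn_detection_params("deliberate")), making it easy to A/B test sensitivity in different deployments.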

Best Practices

1. Keep agent responses concise. Shorter responses mean fewer interruptions needed. Instruct your agent:
instructions="Keep responses under 2-3 sentences. Be concise."
2. Acknowledge interruptions gracefully. Consider adding context in your instructions:
instructions="If interrupted, briefly acknowledge and address the user's new question."
3. Test with real conversations. Tune buffer_in_seconds and confidence_threshold based on actual usage patterns.
4. Filter agent events in custom handlers. If you add custom event handlers, always filter out the agent's own events:
if event.participant and event.participant.user_id == "agent":
    return  # Don't react to agent's own audio
5. Consider Realtime APIs for lowest latency. If natural interruption handling is crucial, Realtime APIs (OpenAI, Gemini, AWS Bedrock, Qwen) provide the best experience with built-in handling at the model level.
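The filter in practice 4 is easy to get subtly wrong if the agent's user id and the string in the handler drift apart. One way to avoid that is a small helper (a sketch; AGENT_USER_ID is whatever id you passed to User(...)):

```python
AGENT_USER_ID = "agent"  # must match the id passed to User(..., id="agent")


def is_user_event(event) -> bool:
    """True if the event came from a human participant rather than the agent itself.

    Uses getattr so events without a participant attribute are safely ignored.
    """
    participant = getattr(event, "participant", None)
    return participant is not None and participant.user_id != AGENT_USER_ID
```

Each custom handler can then start with a single guard, if not is_user_event(event): return, instead of repeating the comparison everywhere.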

Troubleshooting

Agent doesn’t stop when I interrupt

  • Verify turn_detection is configured in your Agent
  • Lower confidence_threshold for more sensitivity
  • Check that your microphone is working and audio is being received

Agent stops too easily (false positives)

  • Increase confidence_threshold (e.g., 0.7)
  • Increase buffer_in_seconds (e.g., 2.0)
  • Check for background noise in your audio setup

Delay before agent responds to interruption

  • Decrease buffer_in_seconds (e.g., 0.5)
  • Consider switching to a Realtime API for lower latency

Interruption handling not working at all

  • Ensure you’re not using a Realtime LLM with a separate turn detection plugin (Realtime APIs handle this internally)
  • Check logs for "👉 Turn started" messages to verify events are firing

Next Steps

  • Turn Detection — Understand the concepts behind turn detection (VAD vs Turn Detection)
  • Smart Turn — Full parameter reference for Smart Turn
  • Vogent — Full parameter reference for Vogent
  • Event System — Learn more about subscribing to events