Smart Turn uses neural models (Silero VAD plus Whisper features) to detect when a speaker has completed their turn. This gives a more natural conversation flow than relying on silence detection alone.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
uv add "vision-agents[smart_turn]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import smart_turn, gemini, deepgram, elevenlabs, getstream
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=smart_turn.TurnDetection(),
)
Models download automatically on first use.
Parameters
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| buffer_in_seconds | float | 2.0 | Audio buffer duration |
| confidence_threshold | float | 0.5 | Turn completion threshold (0-1) |
| sample_rate | int | 16000 | Audio sample rate |
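To make the three parameters concrete, here is a minimal sketch. It is not the plugin's actual implementation: it assumes the detector keeps a rolling window of the most recent `buffer_in_seconds` of audio at `sample_rate`, and that a turn counts as complete once the model's score reaches `confidence_threshold`. The names `TurnConfig`, `RollingAudioBuffer`, and `is_turn_complete` are hypothetical and exist only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnConfig:
    """Hypothetical container mirroring the documented parameters and defaults."""
    buffer_in_seconds: float = 2.0
    confidence_threshold: float = 0.5
    sample_rate: int = 16000

    @property
    def buffer_samples(self) -> int:
        # Number of samples held in the rolling window.
        return int(self.buffer_in_seconds * self.sample_rate)


class RollingAudioBuffer:
    """Keeps only the most recent `buffer_samples` samples (assumed behavior)."""

    def __init__(self, config: TurnConfig) -> None:
        # A bounded deque silently discards the oldest samples when full.
        self._samples = deque(maxlen=config.buffer_samples)

    def extend(self, chunk) -> None:
        self._samples.extend(chunk)

    def __len__(self) -> int:
        return len(self._samples)


def is_turn_complete(confidence: float, config: TurnConfig) -> bool:
    # Assumption: a score at or above the threshold counts as a finished turn.
    return confidence >= config.confidence_threshold


config = TurnConfig()
buffer = RollingAudioBuffer(config)
buffer.extend([0.0] * 48000)  # 3 s of samples at 16 kHz; the oldest second is dropped
print(config.buffer_samples, len(buffer))
print(is_turn_complete(0.42, config), is_turn_complete(0.87, config))
```

Under these assumptions, raising `confidence_threshold` makes the agent wait for stronger evidence before responding, while a larger `buffer_in_seconds` gives the model more context at the cost of more memory per stream.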
Events
from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent
@turn_detection.events.subscribe
async def on_turn_ended(event: TurnEndedEvent):
    print(f"User finished: confidence={event.confidence}")
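A handler can also act on the reported confidence, for example by ignoring endings the model was unsure about. The sketch below is self-contained and uses a hypothetical `FakeTurnEndedEvent` stand-in with a single `confidence` field; the real `TurnEndedEvent` may carry different or additional fields, and the 0.7 cutoff is an arbitrary value chosen for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeTurnEndedEvent:
    """Hypothetical stand-in for TurnEndedEvent; only `confidence` is assumed."""
    confidence: float


accepted = []  # confidences of the turn endings the handler acted on


async def on_turn_ended(event: FakeTurnEndedEvent, min_confidence: float = 0.7) -> bool:
    # Skip endings below the cutoff; return whether the handler acted.
    if event.confidence < min_confidence:
        return False
    accepted.append(event.confidence)
    return True


async def main() -> list:
    # Feed two sample events through the handler in order.
    events = [FakeTurnEndedEvent(0.55), FakeTurnEndedEvent(0.91)]
    return [await on_turn_ended(e) for e in events]


results = asyncio.run(main())
print(results, accepted)
```

In a real agent the same filtering logic would sit inside the subscribed handler rather than being driven by a hand-built event list.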
Next Steps
Build a Voice Agent: Get started with voice
Build a Video Agent: Add video processing