Vogent uses neural models to predict when a speaker has completed their conversational turn, providing intelligent turn-taking for natural conversation flow.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
uv add "vision-agents[vogent]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import vogent, gemini, deepgram, elevenlabs, getstream
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=vogent.TurnDetection(),
)
Models download automatically on first use.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| buffer_in_seconds | float | 2.0 | Audio buffer duration in seconds |
| confidence_threshold | float | 0.5 | Turn completion threshold (0-1) |
| sample_rate | int | 16000 | Audio sample rate in Hz |
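
To make the defaults concrete, the sketch below works through what they imply: how many samples a 2.0-second buffer holds at 16000 Hz, and how a threshold of 0.5 could split confidence scores into "turn complete" and "still speaking". It is plain Python and does not call the Vogent plugin. The helper names (buffer_size_in_samples, is_turn_complete), the rolling deque, and the `>=` comparison are assumptions made for illustration, not the plugin's actual internals.

```python
from collections import deque

# Defaults taken from the parameter table.
BUFFER_IN_SECONDS = 2.0
CONFIDENCE_THRESHOLD = 0.5
SAMPLE_RATE = 16000


def buffer_size_in_samples(buffer_in_seconds: float, sample_rate: int) -> int:
    """Number of audio samples that fit in a buffer of the given duration."""
    return int(buffer_in_seconds * sample_rate)


def is_turn_complete(confidence: float, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Hypothetical decision rule: the turn counts as complete once the
    model's confidence reaches the threshold. Whether the plugin uses
    >= or > is an assumption here."""
    return confidence >= threshold


# Illustrative rolling buffer that keeps only the most recent samples.
rolling_buffer = deque(maxlen=buffer_size_in_samples(BUFFER_IN_SECONDS, SAMPLE_RATE))
rolling_buffer.extend([0.0] * 40000)  # push 2.5 s of silence; the oldest 0.5 s is dropped

print(len(rolling_buffer))     # 32000
print(is_turn_complete(0.7))   # True
print(is_turn_complete(0.3))   # False
```

Under this reading, raising confidence_threshold makes the agent wait for stronger evidence before responding, while lowering it makes the agent respond sooner at the risk of interrupting the speaker.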
Events
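from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent

# turn_detection is assumed to be the vogent.TurnDetection instance passed to the Agent.
@turn_detection.events.subscribe
async def on_turn_ended(event: TurnEndedEvent):
    # Runs each time the detector decides the user has finished speaking.
    print(f"User finished: confidence={event.confidence}")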
Next Steps