Vogent

Vogent uses neural models to predict when a speaker has completed their conversational turn, giving the agent turn-taking cues for natural conversation flow.

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[vogent]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import vogent, gemini, deepgram, elevenlabs, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=vogent.TurnDetection(),
)

Models download automatically on first use.

Parameters

Name	Type	Default	Description
`buffer_in_seconds`	`float`	`2.0`	Audio buffer duration
`confidence_threshold`	`float`	`0.5`	Turn completion threshold (0-1)
`sample_rate`	`int`	`16000`	Audio sample rate
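
To make the relationship between the three parameters concrete, here is a minimal sketch. `TurnDetectionSettings` is a hypothetical stand-in written for this illustration, not part of `vision_agents`, and the `>=` comparison is an assumption about how a completion threshold is typically applied. The plugin's internals may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnDetectionSettings:
    """Hypothetical stand-in mirroring the three documented parameters."""

    buffer_in_seconds: float = 2.0
    confidence_threshold: float = 0.5
    sample_rate: int = 16000

    def buffer_samples(self) -> int:
        # Number of audio samples a buffer of this duration holds.
        return int(self.buffer_in_seconds * self.sample_rate)

    def is_turn_complete(self, confidence: float) -> bool:
        # Assumption: the turn counts as complete once the model's
        # confidence reaches the threshold.
        return confidence >= self.confidence_threshold


defaults = TurnDetectionSettings()
strict = TurnDetectionSettings(confidence_threshold=0.8)

print(defaults.buffer_samples())        # samples in the default buffer
print(defaults.is_turn_complete(0.6))   # default threshold of 0.5
print(strict.is_turn_complete(0.6))     # stricter threshold of 0.8
```

Under these assumptions, raising `confidence_threshold` makes the agent wait for stronger evidence before it treats the turn as finished, while lowering it makes the agent respond sooner at the risk of interrupting.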

Events

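from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent
from vision_agents.plugins import vogent

# Use the same instance that is passed to Agent(turn_detection=...)
turn_detection = vogent.TurnDetection()

@turn_detection.events.subscribe
async def on_turn_ended(event: TurnEndedEvent):
    print(f"User finished: confidence={event.confidence}")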
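
A handler may want to act only on confident turn endings. The sketch below shows that pattern in isolation. `FakeTurnEndedEvent` and `MIN_CONFIDENCE` are hypothetical names introduced for this example. Only the `confidence` attribute is taken from the snippet above, and the real `TurnEndedEvent` may carry other fields.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical application-level cutoff, separate from the plugin's
# own confidence_threshold.
MIN_CONFIDENCE = 0.7


@dataclass
class FakeTurnEndedEvent:
    """Stand-in for TurnEndedEvent; only `confidence` is modeled."""

    confidence: float


accepted: list[float] = []


async def on_turn_ended(event: FakeTurnEndedEvent) -> bool:
    # Skip turn endings the model was not confident about.
    if event.confidence < MIN_CONFIDENCE:
        return False
    accepted.append(event.confidence)
    return True


async def main() -> list[bool]:
    # Feed two simulated events through the handler in order.
    events = [FakeTurnEndedEvent(0.55), FakeTurnEndedEvent(0.92)]
    return [await on_turn_ended(e) for e in events]


results = asyncio.run(main())
print(results, accepted)
```

In a real agent the same filtering logic would sit inside the subscribed handler. The simulated events here only stand in for what the detector would emit.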

Next Steps

Build a Voice Agent

Build a Video Agent
