Installation
Install the Vogent plugin with:
Example
Initialisation
The Vogent plugin is exposed via the TurnDetection class:
Parameters
You can customise the behaviour of Vogent through the following parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| buffer_in_seconds | float | 2.0 | Duration in seconds to buffer audio before processing. |
| confidence_threshold | float | 0.5 | Probability threshold (0.0–1.0) for determining turn completion. |
| sample_rate | int | 16000 | Audio sample rate in Hz for processing (audio is resampled automatically). |
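
To make the relationship between these parameters concrete, here is a minimal sketch. `TurnDetectionConfig` and `buffer_samples` are hypothetical names used only for illustration and are not part of the plugin's API. The sketch mirrors the documented defaults, checks that the threshold is a valid probability, and shows how many samples one buffer holds at the processing sample rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnDetectionConfig:
    """Hypothetical stand-in that mirrors the documented parameters and defaults."""

    buffer_in_seconds: float = 2.0
    confidence_threshold: float = 0.5
    sample_rate: int = 16000

    def __post_init__(self) -> None:
        # The threshold is documented as a probability, so it must lie in [0.0, 1.0].
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0.0 and 1.0")
        if self.buffer_in_seconds <= 0:
            raise ValueError("buffer_in_seconds must be positive")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")

    @property
    def buffer_samples(self) -> int:
        # Samples held in one buffer: duration in seconds times samples per second.
        return int(self.buffer_in_seconds * self.sample_rate)


defaults = TurnDetectionConfig()
print(defaults.buffer_samples)  # 32000 with the documented defaults
```

With the defaults, each buffer covers 2.0 s of 16 kHz audio, i.e. 32000 samples. A shorter buffer gives the model less context per prediction, and a higher threshold requires more confidence before a turn is considered complete.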
Functionality
Start and Stop
Control turn detection with the start() and stop() methods:
Events
The plugin emits turn detection events through the Vision Agents event system:
Turn Started Event
Fired when a user begins speaking:
Turn Ended Event
Fired when a user completes their turn (based on the model’s prediction and confidence threshold):
Event Properties
Both TurnStartedEvent and TurnEndedEvent include the following properties:
| Property | Type | Description |
|---|---|---|
| participant | Participant | Participant object with user_id and metadata. |
| confidence | float \| None | Confidence level of the turn detection (0.0–1.0). |
| trailing_silence_ms | float \| None | Milliseconds of silence after speech (TurnEndedEvent only). |
| duration_ms | float \| None | Duration of the turn in milliseconds (TurnEndedEvent only). |
| custom | dict \| None | Additional model-specific data. |
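
As an illustration of how a handler might consume these properties, the sketch below models the documented fields with plain dataclasses. The `Participant` and `TurnEndedEvent` classes here are local stand-ins written for this example, not the plugin's real classes, and `describe_turn_end` is a hypothetical helper. The point it shows is that every property except `participant` may be None, so a handler should check before formatting.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Participant:
    # Stand-in for the documented Participant object (user_id and metadata).
    user_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class TurnEndedEvent:
    # Stand-in carrying the documented properties; optional ones default to None.
    participant: Participant
    confidence: Optional[float] = None
    trailing_silence_ms: Optional[float] = None
    duration_ms: Optional[float] = None
    custom: Optional[dict[str, Any]] = None


def describe_turn_end(event: TurnEndedEvent) -> str:
    """Build a log line, skipping any optional property that is None."""
    parts = [f"{event.participant.user_id} finished speaking"]
    if event.confidence is not None:
        parts.append(f"confidence={event.confidence:.2f}")
    if event.duration_ms is not None:
        parts.append(f"duration={event.duration_ms:.0f}ms")
    if event.trailing_silence_ms is not None:
        parts.append(f"silence={event.trailing_silence_ms:.0f}ms")
    return ", ".join(parts)


event = TurnEndedEvent(Participant("alice"), confidence=0.87, duration_ms=1520.0)
print(describe_turn_end(event))
```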
How It Works
Vogent uses a neural model to analyze audio and predict turn completion. The system:
- Buffers incoming audio based on buffer_in_seconds
- Processes audio through the Vogent neural model
- Predicts turn completion probability
- Emits TurnStartedEvent when speech begins
- Emits TurnEndedEvent when turn completion probability exceeds confidence_threshold
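
The steps above can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the plugin's implementation: `detect_turns`, `is_speech`, and `predict_completion` are hypothetical names, the two callables stand in for the voice activity detector and the neural model, and the real plugin's internal scheduling may differ.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Chunk = List[float]
Event = Tuple[str, Optional[float]]


def detect_turns(
    chunks: Iterable[Chunk],
    is_speech: Callable[[Chunk], bool],                   # stand-in for voice activity detection
    predict_completion: Callable[[List[float]], float],   # stand-in for the neural model
    buffer_samples: int = 32000,                           # 2.0 s at 16000 Hz
    confidence_threshold: float = 0.5,
) -> Iterator[Event]:
    """Yield ("turn_started", None) and ("turn_ended", probability) events."""
    buffer: List[float] = []
    in_turn = False
    for chunk in chunks:
        if not in_turn:
            if not is_speech(chunk):
                continue  # ignore silence before a turn begins
            in_turn = True
            yield ("turn_started", None)
        # Keep only the most recent buffer_samples samples.
        buffer.extend(chunk)
        if len(buffer) > buffer_samples:
            buffer = buffer[-buffer_samples:]
        # Ask the model how likely it is that the speaker has finished.
        probability = predict_completion(buffer)
        if probability > confidence_threshold:
            yield ("turn_ended", probability)
            in_turn = False
            buffer = []


# Toy inputs: silence, two chunks of speech, then silence again.
audio = [[0.0] * 4, [0.5] * 4, [0.5] * 4, [0.0] * 4]
loud = lambda chunk: any(abs(s) > 0.1 for s in chunk)
# Fake model: confident the turn is over once the latest samples are quiet.
fake_model = lambda buf: 0.9 if all(abs(s) < 0.1 for s in buf[-4:]) else 0.1

for event in detect_turns(audio, loud, fake_model):
    print(event)
```

In this toy run the first silent chunk is skipped, the first loud chunk opens a turn, and the trailing silent chunk pushes the fake model's probability above the 0.5 threshold, which closes the turn.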
Model Downloads
On first run, the following are downloaded:
- Silero VAD: Voice activity detection model
- Whisper Feature Extractor: Semantic feature extraction
The first start() call may take a few seconds while models are downloaded.
