Mistral Voxtral

Mistral Voxtral provides real-time speech-to-text via WebSocket streaming with automatic language detection and low-latency transcription.

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[mistral]"

Quick start

from vision_agents.core import Agent, User
from vision_agents.plugins import mistral, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),  # Stream provides the real-time transport
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=mistral.STT(),  # Mistral Voxtral speech-to-text; uses MISTRAL_API_KEY by default
    tts=deepgram.TTS(),
)

Set `MISTRAL_API_KEY` in your environment, or pass `api_key` directly to `mistral.STT()`.
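The plugin's internal key lookup isn't shown on this page. As a minimal sketch of the precedence described above (an explicit `api_key` wins, otherwise `MISTRAL_API_KEY` is used), here is a hypothetical helper, `resolve_mistral_api_key`, which is not part of the plugin:

```python
import os
from typing import Optional


def resolve_mistral_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: explicit key first, then the MISTRAL_API_KEY env var."""
    if api_key:
        return api_key
    env_key = os.environ.get("MISTRAL_API_KEY")
    if env_key:
        return env_key
    raise RuntimeError("No Mistral API key: pass api_key or set MISTRAL_API_KEY")
```

Failing early with a clear error like this makes a missing key obvious before any audio is streamed. The resolved value can then be passed as `mistral.STT(api_key=...)`.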

Parameters

Name	Type	Default	Description
`api_key`	`str`	`None`	API key (defaults to `MISTRAL_API_KEY` env var)
`model`	`str`	`"voxtral-mini-transcribe-realtime-2602"`	Model identifier
`sample_rate`	`int`	`16000`	Audio sample rate in Hz (8000, 16000, 22050, 44100, 48000)
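To see what `sample_rate` means in practice, the sketch below converts a rate into the number of bytes in a fixed-duration audio chunk and splits a buffer into such chunks for streaming. It assumes 16-bit mono PCM and 20 ms chunks; this page specifies neither the wire format nor the chunk size, and the helper names are hypothetical.

```python
from typing import Iterator

# Sample rates listed as supported in the parameters table above.
SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 44100, 48000}

# Assumption: 16-bit (2-byte) mono PCM; not stated by the plugin docs.
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def chunk_size_bytes(sample_rate: int, chunk_ms: int = 20) -> int:
    """Bytes of audio in one chunk lasting `chunk_ms` milliseconds."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def iter_chunks(pcm: bytes, sample_rate: int, chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-size chunks of `pcm`; the final chunk may be shorter."""
    size = chunk_size_bytes(sample_rate, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

At the default 16000 Hz, a 20 ms chunk is 320 samples, or 640 bytes under these assumptions.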

Turn detection

Mistral Voxtral STT does not include built-in turn detection. Pair it with an external turn detection plugin like Smart Turn or Vogent.

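from vision_agents.core import Agent
from vision_agents.plugins import mistral, smart_turn

agent = Agent(
    stt=mistral.STT(),
    # Voxtral has no built-in turn detection, so supply one explicitly
    turn_detection=smart_turn.TurnDetection(),
    # ... other config (edge, agent_user, llm, tts, etc.)
)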
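To make the division of labor concrete: Voxtral turns audio into text, but something else must decide when the speaker has finished. The sketch below is a deliberately naive silence-timeout endpointer written only to illustrate that decision. It does not reflect how Smart Turn or Vogent work, and the `Frame` type, energy threshold, and 600 ms timeout are arbitrary, hypothetical choices.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Frame:
    """Hypothetical audio frame summarized by its energy level."""
    energy: float
    duration_ms: int = 20


def detect_turn_ends(
    frames: Iterable[Frame],
    energy_threshold: float = 0.01,
    silence_ms: int = 600,
) -> List[int]:
    """Return frame indices at which this naive detector ends a turn.

    A turn ends once speech has been heard and is then followed by
    `silence_ms` of consecutive low-energy frames.
    """
    ends: List[int] = []
    in_speech = False
    silent_for = 0
    for i, frame in enumerate(frames):
        if frame.energy >= energy_threshold:
            # Speech (or noise above threshold) resets the silence timer.
            in_speech = True
            silent_for = 0
        elif in_speech:
            silent_for += frame.duration_ms
            if silent_for >= silence_ms:
                ends.append(i)
                in_speech = False
                silent_for = 0
    return ends
```

A fixed timeout like this tends to cut off speakers who pause mid-sentence, which is the kind of error a dedicated turn detection plugin is meant to reduce.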

Next steps

Build a Voice Agent

Build a Video Agent