When to Use Realtime
Use a Realtime LLM when you want the lowest-latency voice interactions. The model handles speech recognition, response generation, and speech synthesis natively, so no separate STT or TTS services are required. Use the traditional STT → LLM → TTS pipeline when you need custom voices (e.g., Cartesia, ElevenLabs), specific transcription providers, or models that don't support realtime audio.

Supported Providers
- OpenAI Realtime — WebRTC-based, supports video
- Gemini Live — WebSocket-based, supports video
- AWS Nova — WebSocket-based
- Qwen Omni — Native audio support
Basic Usage
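A minimal sketch of a text-driven realtime session. The import path and constructor options below are illustrative assumptions, not the SDK's actual names; see each provider's integration docs for the real module and parameters.

```python
import asyncio

# Hypothetical import path for illustration only; check the OpenAI
# integration docs for the actual module and constructor options.
from plugins.openai import Realtime


async def main():
    # Provider-specific settings (model, voice, etc.) would be passed here.
    llm = Realtime()

    # Send a text prompt; the model answers with synthesized audio and
    # surfaces the response text through RealtimeResponseEvent.
    await llm.simple_response("Greet the caller and ask how you can help.")


asyncio.run(main())
```

Connection setup and teardown are omitted above; in practice the session lifecycle is typically managed by the surrounding agent or call.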
Methods
simple_response(text, processors=None, participant=None)
Sends a text prompt to the realtime model. The model responds with audio.
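For example, assuming the method is awaitable and `llm` is an active Realtime session:

```python
# Prompt the model; the spoken reply is streamed back as audio output.
await llm.simple_response("Summarize what the user just described.")
```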
simple_audio_response(pcm, participant=None)
Sends raw PCM audio data directly to the model for processing.
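For example, forwarding a chunk of captured microphone audio. Here `pcm` stands for raw PCM data in whatever sample rate and encoding the provider expects, and `participant` is assumed to come from the call:

```python
# Forward raw PCM from the user's audio track to the realtime model.
await llm.simple_audio_response(pcm, participant=participant)
```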
Properties
| Property | Type | Description |
|---|---|---|
| connected | bool | True if the realtime session is active |
| fps | int | Video frames per second sent to the model (default: 1) |
| session_id | str | UUID identifying the current session |
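A small sketch of inspecting these properties at runtime, assuming `llm` is a Realtime instance:

```python
# Log basic session state; useful when debugging connection issues.
if llm.connected:
    print(f"session {llm.session_id} active, sending video at {llm.fps} fps")
else:
    print("realtime session not connected")
```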
Events
The Realtime class emits events for monitoring conversations:

| Event | Description |
|---|---|
| RealtimeConnectedEvent | Connection established |
| RealtimeDisconnectedEvent | Connection closed |
| RealtimeUserSpeechTranscriptionEvent | Transcript of user speech |
| RealtimeAgentSpeechTranscriptionEvent | Transcript of agent speech |
| RealtimeResponseEvent | AI response text |
| RealtimeAudioInputEvent | Audio received from user |
| RealtimeAudioOutputEvent | Audio sent to user |
| RealtimeErrorEvent | Error during processing |
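A sketch of listening for some of these events. The decorator-style subscription and the event attribute names (`text`, `error`) are assumptions for illustration; the actual registration API and event payloads are documented per provider.

```python
# Hypothetical subscription API; the real registration mechanism and the
# event field names may differ from what is shown here.
@llm.events.subscribe
async def on_user_transcript(event):  # RealtimeUserSpeechTranscriptionEvent
    # React to the user's transcribed speech, e.g. for logging or moderation.
    print(f"user said: {event.text}")


@llm.events.subscribe
async def on_error(event):  # RealtimeErrorEvent
    # Surface provider errors so they are not silently dropped.
    print(f"realtime error: {event.error}")
```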
For provider-specific parameters and configuration, see the integration docs for OpenAI, Gemini, AWS Bedrock, or Qwen.

