Realtime Class
The Realtime component provides end-to-end speech-to-speech communication, combining STT, LLM, and TTS functionality in a single, optimized interface. It delivers ultra-low latency speech processing, direct audio streaming without intermediate text conversion, provider-specific optimizations, and support for multiple modalities (audio, video, text).
Overview
The Realtime class is an abstract base class that enables real-time AI communication through various providers. It eliminates the need for separate STT and TTS services by handling speech-to-speech communication directly, resulting in lower latency and more natural conversations.
Supported Providers
- OpenAI Realtime API: WebRTC-based real-time communication with GPT models
- Google Gemini Live: Native audio processing with multimodal capabilities
Basic Usage
Abstract Base Class
Core Methods
async connect()
Establishes connection to the realtime provider. Must be implemented by each provider.
async simple_audio_response(pcm: PcmData)
Sends audio data to the realtime provider for processing.
async simple_response(text: str, processors=None, participant=None)
Sends a text message to the realtime provider.
async close()
Closes the realtime connection and cleans up resources.
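
The four core methods above can be pictured as a small abstract interface. The following is a minimal sketch only, not the library's actual source: `RealtimeSketch`, `EchoRealtime`, and the `PcmData` dataclass (including its `samples` and `sample_rate` fields) are hypothetical stand-ins, and only the method names and parameters are taken from the signatures listed above.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class PcmData:
    """Hypothetical stand-in for the library's PCM audio container."""
    samples: bytes
    sample_rate: int = 16000


class RealtimeSketch(ABC):
    """Illustrative base class; not the library's actual Realtime class."""

    def __init__(self) -> None:
        self._connected = False

    @property
    def is_connected(self) -> bool:
        return self._connected

    @abstractmethod
    async def connect(self) -> None:
        """Each provider supplies its own connection logic."""

    @abstractmethod
    async def simple_audio_response(self, pcm: PcmData) -> None:
        """Send audio to the provider."""

    @abstractmethod
    async def simple_response(self, text: str, processors=None, participant=None) -> None:
        """Send a text message to the provider."""

    async def close(self) -> None:
        # Mark the session as inactive; a real provider would also free resources.
        self._connected = False


class EchoRealtime(RealtimeSketch):
    """Toy provider that records what it is sent instead of calling a real API."""

    def __init__(self) -> None:
        super().__init__()
        self.sent: List[Tuple[str, Any]] = []

    def _require_connection(self) -> None:
        if not self.is_connected:
            raise RuntimeError("call connect() before sending data")

    async def connect(self) -> None:
        self._connected = True

    async def simple_audio_response(self, pcm: PcmData) -> None:
        self._require_connection()
        self.sent.append(("audio", len(pcm.samples)))

    async def simple_response(self, text: str, processors=None, participant=None) -> None:
        self._require_connection()
        self.sent.append(("text", text))


async def demo() -> EchoRealtime:
    rt = EchoRealtime()
    await rt.connect()
    try:
        await rt.simple_response("Hello")
        await rt.simple_audio_response(PcmData(samples=b"\x00\x00" * 160))
    finally:
        await rt.close()  # always release the session, even on errors
    return rt


if __name__ == "__main__":
    print(asyncio.run(demo()).sent)
```

Wrapping the calls in `try`/`finally` keeps `close()` on every exit path, which matters for long-lived streaming sessions.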
Properties
is_connected: bool
Returns True if the realtime session is currently active.
output_track: AudioStreamTrack
WebRTC audio track for outputting synthesized speech.
fps: int
Frames per second for video processing (default: 1).
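
One plausible reading of `fps` is a cap on how many video frames are forwarded to the provider each second. The sketch below illustrates that idea under that assumption; `FrameThrottle` and `should_send` are hypothetical names, and the library may apply the setting differently.

```python
from typing import Optional


class FrameThrottle:
    """Hypothetical helper: lets at most `fps` frames through per second."""

    def __init__(self, fps: int = 1) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        self.min_interval = 1.0 / fps
        self._last_sent: Optional[float] = None

    def should_send(self, timestamp: float) -> bool:
        """Return True if a frame captured at `timestamp` (seconds) should be forwarded."""
        if self._last_sent is None or timestamp - self._last_sent >= self.min_interval:
            self._last_sent = timestamp
            return True
        return False


def select_frames(timestamps, fps: int = 1):
    """Apply the throttle to a sequence of capture timestamps."""
    throttle = FrameThrottle(fps)
    return [throttle.should_send(t) for t in timestamps]
```

With the default of 1, a camera producing many frames per second would contribute roughly one frame per second to the model, which keeps bandwidth and token usage low.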
Provider Implementations
OpenAI Realtime
Gemini Live
Event System
The Realtime class emits various events for monitoring and integration:
Connection Events
- RealtimeConnectedEvent: Emitted when connection is established
- RealtimeDisconnectedEvent: Emitted when connection is lost
Audio Events
- RealtimeAudioInputEvent: Audio data received from user
- RealtimeAudioOutputEvent: Audio data sent to user
Transcript Events
- RealtimeTranscriptEvent: Final transcript of user speech
- RealtimePartialTranscriptEvent: Partial transcript during speech
Response Events
- RealtimeResponseEvent: Complete response from AI
- StandardizedTextDeltaEvent: Streaming text deltas
Error Events
- RealtimeErrorEvent: Errors during processing
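
To show how typed events like these are commonly consumed, here is a minimal sketch of type-based dispatch. It does not use the library's event system: `EventBus`, its `on` and `emit` methods, and the event fields (`provider`, `text`, `message`) are all hypothetical, and the dataclasses only borrow the event names listed in this section.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List


# Stand-ins named after the events above; the real event fields may differ.
@dataclass
class RealtimeConnectedEvent:
    provider: str  # hypothetical field


@dataclass
class RealtimeTranscriptEvent:
    text: str  # hypothetical field


@dataclass
class RealtimeErrorEvent:
    message: str  # hypothetical field


class EventBus:
    """Hypothetical dispatcher that routes each event to handlers registered for its type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable[[Any], None]]] = defaultdict(list)

    def on(self, event_type: type, handler: Callable[[Any], None]) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event: Any) -> None:
        # Events with no registered handler are silently ignored.
        for handler in self._handlers.get(type(event), []):
            handler(event)


def run_demo() -> List[str]:
    log: List[str] = []
    bus = EventBus()
    bus.on(RealtimeTranscriptEvent, lambda e: log.append(f"user said: {e.text}"))
    bus.on(RealtimeErrorEvent, lambda e: log.append(f"error: {e.message}"))

    bus.emit(RealtimeConnectedEvent(provider="example"))  # no handler registered
    bus.emit(RealtimeTranscriptEvent(text="hello"))
    bus.emit(RealtimeErrorEvent(message="timeout"))
    return log


if __name__ == "__main__":
    print(run_demo())
```

Keying handlers by event class lets monitoring code subscribe only to the categories it cares about, for example transcripts and errors, without inspecting every event.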