| AWS Bedrock | Realtime speech-to-speech plugin using Amazon Nova models with automatic reconnection | AWS Bedrock |
| AWS Polly | TTS plugin using Amazon’s cloud-based service with natural-sounding voices and neural engine support | AWS Polly |
| Cartesia | TTS plugin for realistic voice synthesis in real-time voice applications | Cartesia |
| Decart | Real-time AI video transformation service for applying artistic styles and effects to video streams | Decart |
| Deepgram | STT plugin for fast, accurate real-time transcription with speaker diarization | Deepgram |
| ElevenLabs | TTS plugin with highly realistic and expressive voices for conversational agents | ElevenLabs |
| Fast-Whisper | High-performance STT plugin using OpenAI’s Whisper model with CTranslate2 for fast inference | Fast-Whisper |
| Fish Audio | STT and TTS plugin with automatic language detection and voice cloning capabilities | Fish Audio |
| Gemini | Realtime API for building conversational agents with support for both voice and video | Gemini |
| HeyGen | Realtime interactive avatars powered by HeyGen | HeyGen |
| Inworld | TTS plugin with high-quality streaming voices for real-time conversational AI agents | Inworld |
| Kokoro | Local TTS engine for offline voice synthesis with low latency | Kokoro |
| Moondream | Real-time detection and VLM capabilities, available through the hosted API or run locally on CUDA devices. Vision Agents supports Moondream’s Detect, Caption, and VQA skills out of the box. | Moondream |
| OpenAI | Realtime API for building conversational agents with out-of-the-box support for real-time video directly over WebRTC, LLMs, and OpenAI TTS | OpenAI |
| OpenRouter | LLM plugin providing access to multiple providers (Anthropic, Google, OpenAI) through a unified API | OpenRouter |
| Qwen | Realtime audio plugin using Alibaba’s Qwen3 with native audio output and built-in speech recognition | Qwen |
| Roboflow | Object detection processor using Roboflow’s hosted API or local RF-DETR models | Roboflow |
| Smart Turn | Advanced turn detection system combining Silero VAD, Whisper, and neural models for natural conversation flow | Smart Turn |
| Ultralytics | Real-time pose detection processor using YOLO models with skeleton overlays | Ultralytics |
| Vogent | Neural turn detection system for intelligent turn-taking in voice conversations | Vogent |
| Wizper | STT plugin with real-time translation capabilities powered by Whisper v3 | Wizper |
| xAI | LLM plugin using xAI’s Grok models with advanced reasoning and real-time knowledge | xAI |