Why Use a Plugin
- Quality & speed: Leverage specialised models (e.g. Deepgram’s high-accuracy STT, ElevenLabs’ realistic TTS) without hosting anything yourself.
- Drop-in architecture: Plugins of the same type inherit from the same base class and expose a consistent interface (e.g. STT plugins implement process_audio() and close(), and emit transcript events), so you can swap one implementation for another in minutes.
- Runs inside your call: The SDK streams PCM frames directly to the provider in real time, then emits SDK events that your application can subscribe to and act on.
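
The drop-in idea above can be illustrated with a minimal sketch. Everything here except the method names process_audio() and close() and the notion of a transcript event is an assumption made for illustration: `STTBase`, `on()`, `_emit()`, `EchoSTT`, `CountingSTT`, and `run_call()` are hypothetical names, not the SDK's actual API, and the real methods may well be asynchronous.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List


class STTBase(ABC):
    """Hypothetical base class: every STT plugin exposes the same interface."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)
        self.closed = False

    def on(self, event: str, handler: Callable[[str], None]) -> None:
        # Register a listener for an event name such as "transcript".
        self._handlers[event].append(handler)

    def _emit(self, event: str, text: str) -> None:
        # Deliver the payload to every listener registered for this event.
        for handler in self._handlers[event]:
            handler(text)

    @abstractmethod
    def process_audio(self, pcm: bytes) -> None:
        """Consume one frame of PCM audio."""

    def close(self) -> None:
        self.closed = True


class EchoSTT(STTBase):
    """Stand-in provider: reports the size of each frame instead of transcribing."""

    def process_audio(self, pcm: bytes) -> None:
        self._emit("transcript", f"echo: {len(pcm)} bytes")


class CountingSTT(STTBase):
    """A second stand-in provider with different behaviour, same interface."""

    def __init__(self) -> None:
        super().__init__()
        self._count = 0

    def process_audio(self, pcm: bytes) -> None:
        self._count += 1
        self._emit("transcript", f"frame {self._count}")


def run_call(stt: STTBase, frames: Iterable[bytes]) -> List[str]:
    """Application code depends only on the base interface, not the provider."""
    transcripts: List[str] = []
    stt.on("transcript", transcripts.append)
    for frame in frames:
        stt.process_audio(frame)
    stt.close()
    return transcripts


# Two frames of 16-bit silence; swapping providers is a one-line change.
frames = [b"\x00\x00" * 160, b"\x00\x00" * 80]
echo_result = run_call(EchoSTT(), frames)
counting_result = run_call(CountingSTT(), frames)
```

Because `run_call()` only touches the shared interface, replacing `EchoSTT()` with `CountingSTT()` requires no other change. That is the property the plugin base classes are meant to provide for real providers.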
What can you build?
| Capability | Example providers |
|---|---|
| Speech-to-Text (STT) | Deepgram, Moonshine (local) |
| Text-to-Speech (TTS) | ElevenLabs, Kokoro |
| Voice Activity Detection (VAD) | Silero |