Stream’s Python AI SDK ships with a growing catalogue of plugins that connect third-party AI services to your live video calls. Each plugin wraps a specific AI provider and exposes a unified API so you can swap vendors without rewriting business logic.

Why Use a Plugin

  • Quality & speed: Leverage specialised models (e.g. Deepgram’s high-accuracy STT, ElevenLabs’ realistic TTS) without hosting anything yourself.
  • Drop-in architecture: Plugins of the same type inherit from the same base class and share a consistent interface (e.g. STT plugins implement process_audio() and close(), and emit transcript events). You can swap implementations in minutes.
  • Runs inside your call: The SDK streams PCM frames directly to the provider in real-time, then emits SDK events that can be listened for and acted upon.
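To illustrate the drop-in pattern described above, here is a minimal, self-contained sketch. The class names (BaseSTT, the stub providers) and the event-registration helper are assumptions for illustration only; the real SDK's base classes differ, but the idea is the same: every STT plugin implements process_audio() and close() and emits transcript events, so swapping vendors is a one-line change.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Callable

class BaseSTT(ABC):
    """Hypothetical stand-in for an STT plugin base class."""
    def __init__(self) -> None:
        self._handlers: list[Callable[[str], None]] = []

    def on_transcript(self, handler: Callable[[str], None]) -> None:
        # Register a listener for transcript events.
        self._handlers.append(handler)

    def _emit(self, text: str) -> None:
        for handler in self._handlers:
            handler(text)

    @abstractmethod
    async def process_audio(self, pcm: bytes) -> None: ...

    @abstractmethod
    async def close(self) -> None: ...

class StubDeepgramSTT(BaseSTT):
    """Pretend provider A: a real plugin would stream PCM to Deepgram."""
    async def process_audio(self, pcm: bytes) -> None:
        self._emit(f"[deepgram] heard {len(pcm)} bytes")
    async def close(self) -> None:
        pass

class StubMoonshineSTT(BaseSTT):
    """Pretend provider B: a real plugin would run a local Moonshine model."""
    async def process_audio(self, pcm: bytes) -> None:
        self._emit(f"[moonshine] heard {len(pcm)} bytes")
    async def close(self) -> None:
        pass

async def run(stt: BaseSTT) -> list[str]:
    transcripts: list[str] = []
    stt.on_transcript(transcripts.append)
    await stt.process_audio(b"\x00" * 320)  # one fake PCM frame
    await stt.close()
    return transcripts

# Swapping vendors is a one-line change at the call site:
print(asyncio.run(run(StubDeepgramSTT())))
print(asyncio.run(run(StubMoonshineSTT())))
```

Because both stubs satisfy the same interface, the run() helper never needs to know which vendor is behind it; that is what lets business logic survive a provider swap.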

What can you build?

Capability                       Example providers
-------------------------------  ---------------------------
Speech-to-Text (STT)             Deepgram, Moonshine (local)
Text-to-Speech (TTS)             ElevenLabs, Kokoro
Voice Activity Detection (VAD)   Silero
Combine them to create richer pipelines; e.g. VAD → STT → Moderation → TTS for a real-time, policy-aware voice agent. See the other pages in this section for our individual third-party integrations. We’ll add more over time, and you can even write your own!
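The VAD → STT → Moderation → TTS pipeline above can be sketched in a few lines. Every function here (fake_vad, fake_stt, moderate, fake_tts) is a toy placeholder, not an SDK API; the point is the data flow: VAD gates silence, STT produces text, moderation filters it, TTS turns the reply back into audio.

```python
def fake_vad(frame: bytes) -> bool:
    # Pretend VAD: treat any non-zero sample as speech.
    return any(frame)

def fake_stt(frame: bytes) -> str:
    # Pretend STT: a real plugin would transcribe the PCM frame.
    return "hello agent"

def moderate(text: str) -> bool:
    # Pretend policy check: block disallowed content.
    return "badword" not in text

def fake_tts(text: str) -> bytes:
    # Pretend TTS: a real plugin would synthesize speech audio.
    return text.encode()

def pipeline(frames: list[bytes]) -> list[bytes]:
    spoken: list[bytes] = []
    for frame in frames:
        if not fake_vad(frame):        # VAD: drop silence early
            continue
        text = fake_stt(frame)         # STT: audio -> text
        if not moderate(text):         # Moderation: policy gate
            continue
        spoken.append(fake_tts(text))  # TTS: text -> reply audio
    return spoken

frames = [b"\x00\x00", b"\x01\x02"]   # one silent frame, one speech frame
print(pipeline(frames))               # only the speech frame produces output
```

In the real SDK the stages would be wired together with events rather than direct calls, but the ordering and the early-exit behavior (silence and policy violations short-circuit the pipeline) carry over directly.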

Creating Your Own Plugins

You can absolutely write your own plugins to connect other AI providers to the AI Python SDK! Follow this guide to learn how.