To get started with the Vision Agents framework, install the package from PyPI. We recommend uv, a free and open-source package manager. Run:
uv add vision-agents 
By default, the SDK does not install any plugin packages. To add them, list the ones you need as extras:
uv add "vision-agents[getstream, openai, elevenlabs, deepgram]"
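If the set of plugins differs between environments, the extras string can be assembled programmatically. The following is a minimal sketch using only the standard library; the helper name `requirement_with_extras` is hypothetical and not part of the SDK.

```python
def requirement_with_extras(package: str, extras: list[str]) -> str:
    """Return a requirement string such as 'pkg[a,b]' for the given extras."""
    # Drop blank entries and surrounding whitespace before joining.
    cleaned = [name.strip() for name in extras if name.strip()]
    if not cleaned:
        return package
    return f"{package}[{','.join(cleaned)}]"


# Extras taken from the install command above.
requirement = requirement_with_extras(
    "vision-agents", ["getstream", "openai", "elevenlabs", "deepgram"]
)
print(f'uv add "{requirement}"')
```

The printed line can be pasted into a shell or a setup script.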
Before running, you will also need a free API key from Stream. Developers building with Stream receive 333,000 free participant minutes each month, and indie developers and small businesses can apply to our Maker Program, which includes an additional $500 in credits each month. Each provider also offers free development keys on its website.
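Since several keys are involved, it can help to check that they are all set before starting an agent. The sketch below assumes the keys are supplied through environment variables; the variable names shown are assumptions for illustration, so confirm the exact names in each provider's documentation.

```python
import os

# Hypothetical variable names, one per provider installed above.
# Check each provider's docs for the names the plugins actually read.
REQUIRED_KEYS = [
    "STREAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "DEEPGRAM_API_KEY",
]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_keys(REQUIRED_KEYS)
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.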
| Plugin Name | Description | Docs Link |
| --- | --- | --- |
| Cartesia | TTS plugin for realistic voice synthesis in real-time voice applications | Cartesia |
| Deepgram | STT plugin for fast, accurate real-time transcription with speaker diarization | Deepgram |
| ElevenLabs | TTS plugin with highly realistic and expressive voices for conversational agents | ElevenLabs |
| Kokoro | Local TTS engine for offline voice synthesis with low latency | Kokoro |
| Moonshine | Local STT engine optimized for edge deployments and low-resource environments | Moonshine |
| OpenAI | Realtime API for building conversational agents with out of the box support for real-time video directly over WebRTC | OpenAI |
| Gemini | Realtime API for building conversational agents with support for both voice and video | Gemini |
| Silero | VAD plugin for detecting human speech activity in real-time audio streams | Silero |
| Wizper | STT plugin with real-time translation capabilities powered by Whisper v3 | Wizper |
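When choosing plugins, it can be convenient to group the entries in the table above by capability. The sketch below encodes the table as a plain dictionary; the `PLUGINS` mapping and `plugins_for` helper are illustrative only and are not part of the SDK.

```python
# Capability of each plugin, as described in the table above.
PLUGINS = {
    "Cartesia": "TTS",
    "Deepgram": "STT",
    "ElevenLabs": "TTS",
    "Kokoro": "TTS",
    "Moonshine": "STT",
    "OpenAI": "Realtime",
    "Gemini": "Realtime",
    "Silero": "VAD",
    "Wizper": "STT",
}


def plugins_for(capability: str) -> list[str]:
    """Return plugin names offering the given capability, in table order."""
    wanted = capability.strip().lower()
    return [name for name, kind in PLUGINS.items() if kind.lower() == wanted]


for capability in ("TTS", "STT", "VAD", "Realtime"):
    print(capability, "->", ", ".join(plugins_for(capability)))
```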
I