ElevenLabs provides real-time speech-to-text via Scribe v2 with ~150ms latency, support for 99 languages, and built-in turn detection based on voice activity detection (VAD). No separate turn detection plugin is needed.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
ElevenLabs also provides highly realistic text-to-speech. You can use both in the same agent.
ElevenLabs STT includes built-in turn detection via VAD. When you use elevenlabs.STT, the Agent automatically ignores any external TurnDetector plugin to prevent conflicts, so you do not need to add one.
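The precedence rule described above can be illustrated with a small, self-contained sketch. This is not the Vision Agents implementation: `SketchSTT`, `SketchTurnDetector`, `SketchAgent`, and the `turn_detection` flag are hypothetical stand-ins, used only to show the idea that an STT with built-in turn detection takes priority over an external detector.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchSTT:
    """Hypothetical stand-in for a speech-to-text plugin."""
    name: str
    turn_detection: bool = False  # True when the STT emits its own turn events


@dataclass
class SketchTurnDetector:
    """Hypothetical stand-in for an external turn detection plugin."""
    name: str


class SketchAgent:
    """Hypothetical agent that keeps a single source of turn events."""

    def __init__(
        self,
        stt: SketchSTT,
        turn_detection: Optional[SketchTurnDetector] = None,
    ) -> None:
        self.stt = stt
        # Assumption: when the STT already detects turns, the external
        # detector is dropped so two components never emit competing events.
        if stt.turn_detection:
            self.turn_detection = None
        else:
            self.turn_detection = turn_detection

    def turn_source(self) -> str:
        """Return the name of the component that decides when a turn ends."""
        if self.stt.turn_detection:
            return self.stt.name
        if self.turn_detection is not None:
            return self.turn_detection.name
        return "none"


# An STT with built-in VAD wins over an external detector.
with_vad = SketchAgent(
    SketchSTT("elevenlabs", turn_detection=True),
    SketchTurnDetector("external"),
)
print(with_vad.turn_source())  # prints "elevenlabs"

# An STT without turn detection falls back to the external detector.
without_vad = SketchAgent(SketchSTT("plain-stt"), SketchTurnDetector("external"))
print(without_vad.turn_source())  # prints "external"
```

Resolving the conflict once, at construction time, means the rest of the agent only ever listens to one source of turn events. In the real library, consult the Agent documentation for the actual parameter names and behavior.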