AWS Bedrock provides real-time speech-to-speech using Amazon Nova Sonic models, with automatic session management. The plugin handles Nova’s 8-minute connection limit transparently.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
The AWS plugin requires Python 3.12+. Nova Sonic is audio-only, so video parameters such as fps have no effect. For video agents, use Gemini Realtime or a custom pipeline.
STREAM_API_KEY=...
STREAM_API_SECRET=...
AWS_ACCESS_KEY_ID=... # or IAM role / ~/.aws profile
AWS_SECRET_ACCESS_KEY=...
You also need Bedrock model access enabled for Nova Sonic in your chosen region, and IAM permission for bidirectional streaming (bedrock:InvokeModelWithBidirectionalStream).
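As a rough illustration of the IAM requirement, the sketch below builds a minimal policy document granting the bidirectional streaming action named above and prints it as JSON. The `"Resource": "*"` value is a placeholder assumption, not a recommendation from the plugin. In practice you would likely scope it to the Nova Sonic model in your region.

```python
import json

# Minimal IAM policy sketch. The action name comes from the requirement above;
# "Resource": "*" is a placeholder assumption and should be narrowed for real use.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModelWithBidirectionalStream"],
            "Resource": "*",
        }
    ],
}

# Print the document so it can be pasted into the IAM console or a policy file.
print(json.dumps(policy, indent=2))
```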
AWS credentials are resolved via the standard AWS SDK chain (environment
variables, AWS profiles via aws_profile, or IAM roles). The aws.Realtime
constructor does not accept explicit access key parameters.
AWS Bedrock has an 8-minute connection limit. The plugin handles this automatically:
After 5 minutes of connection age (configurable via reconnect_after_minutes), the plugin reconnects during a quiet moment, meaning more than 3 seconds have passed since the last audio activity.
After 7 minutes of connection age, the plugin forces a reconnect regardless of audio activity.
Last audio activity includes both incoming user speech and outgoing agent audio. The plugin uses Silero VAD to detect incoming user speech for reconnection timing, while agent audio output updates activity separately. Silero warmup is handled automatically by the Agent lifecycle.
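The reconnection rules described above can be sketched as a small decision function. This is an illustrative model only, not the plugin's actual implementation: `ReconnectPolicy`, `should_reconnect`, `force_after_minutes`, and `quiet_seconds` are hypothetical names, and only `reconnect_after_minutes` corresponds to a documented setting. Treating each threshold as inclusive is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconnectPolicy:
    """Hypothetical container for the thresholds described in the text."""
    reconnect_after_minutes: float = 5.0  # soft threshold (documented as configurable)
    force_after_minutes: float = 7.0      # hard threshold (assumed name)
    quiet_seconds: float = 3.0            # silence needed for a soft reconnect (assumed name)


def should_reconnect(
    connection_age_s: float,
    idle_s: float,
    policy: ReconnectPolicy = ReconnectPolicy(),
) -> bool:
    """Decide whether to re-establish the session.

    connection_age_s: seconds since the current connection was opened.
    idle_s: seconds since the last audio activity (user speech or agent audio).
    """
    # Hard limit: reconnect regardless of audio activity.
    if connection_age_s >= policy.force_after_minutes * 60:
        return True
    # Soft limit: reconnect only if the conversation has been quiet long enough.
    if connection_age_s >= policy.reconnect_after_minutes * 60:
        return idle_s > policy.quiet_seconds
    # Connection is still young; keep it.
    return False


print(should_reconnect(4 * 60, 10.0))  # False: below the soft threshold
print(should_reconnect(6 * 60, 1.0))   # False: past 5 minutes but audio is active
print(should_reconnect(6 * 60, 4.0))   # True: past 5 minutes and quiet for over 3 seconds
print(should_reconnect(7 * 60, 0.0))   # True: forced at 7 minutes
```

The gap between the soft and hard thresholds gives the session a window in which to find a natural pause, so that a forced reconnect during speech is the exception.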