AWS Bedrock provides real-time speech-to-speech through Amazon Nova Sonic models. The plugin manages sessions automatically and handles Nova's 8-minute connection limit transparently.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
The AWS plugin requires Python 3.12+. Nova Sonic is audio-only — video parameters such as fps have no effect. For video agents, use Gemini Realtime or a custom pipeline.

Installation

uv add "vision-agents[aws,getstream]"
The quick start uses getstream.Edge(), so both extras are required.

Environment Variables

STREAM_API_KEY=...
STREAM_API_SECRET=...
AWS_ACCESS_KEY_ID=...    # or IAM role / ~/.aws profile
AWS_SECRET_ACCESS_KEY=...
You also need Bedrock model access enabled for Nova Sonic in your chosen region, and IAM permission for bidirectional streaming (bedrock:InvokeModelWithBidirectionalStream).
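A missing variable usually surfaces only when the agent tries to connect, so a small preflight check can save a round trip. The following is a minimal sketch, assuming the variable names listed above; `check_environment` is a hypothetical helper and not part of the plugin. Missing AWS keys are reported as a warning rather than an error, because an IAM role or a `~/.aws` profile can supply credentials instead.

```python
import os
from typing import Mapping

# Variable names taken from the environment list above.
REQUIRED = ("STREAM_API_KEY", "STREAM_API_SECRET")
AWS_STATIC_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def check_environment(env: Mapping[str, str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the given environment mapping.

    Hypothetical helper for illustration; it only inspects variable names
    and never contacts Stream or AWS.
    """
    errors = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    warnings: list[str] = []
    missing_aws = [name for name in AWS_STATIC_KEYS if not env.get(name)]
    if missing_aws:
        # Not fatal: credentials may come from an IAM role or an AWS profile.
        warnings.append(
            "Static AWS keys incomplete (missing: "
            + ", ".join(missing_aws)
            + "); an IAM role or ~/.aws profile must supply credentials."
        )
    return errors, warnings


# Example: inspect the current process environment.
errors, warnings = check_environment(os.environ)
for message in warnings:
    print("warning:", message)
for message in errors:
    print("error:", message)
```

The check cannot verify Bedrock model access or the `bedrock:InvokeModelWithBidirectionalStream` permission; those still need to be confirmed in the AWS console or by a test connection.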

Quick Start

from dotenv import load_dotenv

from vision_agents.core import Agent, AgentLauncher, Runner, User
from vision_agents.plugins import aws, getstream

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You are a helpful voice assistant.",
        llm=aws.Realtime(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
AWS credentials are resolved via the standard AWS SDK chain (environment variables, AWS profiles via aws_profile, or IAM roles). The aws.Realtime constructor does not accept explicit access key parameters.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "amazon.nova-2-sonic-v1:0" | Nova model ID |
| region_name | str | "us-east-1" | AWS region |
| voice_id | str | "matthew" | Voice (available voices) |
| reconnect_after_minutes | float | 5.0 | Reconnect after this many minutes of connection age, when audio is idle |
| aws_profile | str | None | AWS profile name from ~/.aws/credentials or ~/.aws/config |
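To show how these parameters fit together, here is a minimal sketch that collects them into keyword arguments for `aws.Realtime`. The names and defaults come from the table above; `realtime_kwargs`, the profile name `voice-agents`, and the range check on `reconnect_after_minutes` are illustrative assumptions, not plugin behavior.

```python
from typing import Any, Optional


def realtime_kwargs(
    model: str = "amazon.nova-2-sonic-v1:0",
    region_name: str = "us-east-1",
    voice_id: str = "matthew",
    reconnect_after_minutes: float = 5.0,
    aws_profile: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments for aws.Realtime (hypothetical helper)."""
    # Assumed sanity check, not enforced by the plugin as far as this page
    # states: an idle-reconnect threshold at or beyond the 7-minute forced
    # reconnect would never take effect.
    if not 0 < reconnect_after_minutes < 7:
        raise ValueError("reconnect_after_minutes should be between 0 and 7")

    kwargs: dict[str, Any] = {
        "model": model,
        "region_name": region_name,
        "voice_id": voice_id,
        "reconnect_after_minutes": reconnect_after_minutes,
    }
    # Leave aws_profile out unless set, so the default credential chain applies.
    if aws_profile is not None:
        kwargs["aws_profile"] = aws_profile
    return kwargs


# Usage, with the plugin installed ("voice-agents" is a made-up profile name):
#   from vision_agents.plugins import aws
#   llm = aws.Realtime(**realtime_kwargs(aws_profile="voice-agents"))
```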

Nova Behavior

  • instructions are required before the model accepts user input.
  • Text-only prompts may not produce audio — use audio input for reliable responses.
  • Function calling may require at least one audio content block before text tool calls work reliably.

Automatic Reconnection

AWS Bedrock has an 8-minute connection limit. The plugin handles this automatically:
  • Once the connection is 5 minutes old (configurable via reconnect_after_minutes) and more than 3 seconds have passed since the last audio activity, the plugin reconnects during the quiet moment
  • Once the connection is 7 minutes old, the plugin forces a reconnect regardless of audio activity
Last audio activity includes incoming user speech (detected by Silero VAD) and outgoing agent audio.
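The two rules above can be read as a single decision function. This is a sketch of the described policy, not the plugin's actual implementation; `should_reconnect` and the constant names are hypothetical, and treating the age thresholds as inclusive (`>=`) is an assumption.

```python
IDLE_THRESHOLD_SECONDS = 3.0    # quiet period required for a soft reconnect
FORCE_RECONNECT_MINUTES = 7.0   # hard limit, below Bedrock's 8-minute cap


def should_reconnect(
    connection_age_seconds: float,
    seconds_since_audio: float,
    reconnect_after_minutes: float = 5.0,
) -> bool:
    """Decide whether to reconnect, following the two rules described above."""
    age_minutes = connection_age_seconds / 60.0

    # Rule 2: past the hard limit, reconnect even if audio is flowing.
    if age_minutes >= FORCE_RECONNECT_MINUTES:
        return True

    # Rule 1: past the soft limit, reconnect only during a quiet moment.
    if (
        age_minutes >= reconnect_after_minutes
        and seconds_since_audio > IDLE_THRESHOLD_SECONDS
    ):
        return True

    return False


print(should_reconnect(4 * 60, 10.0))   # young connection: False
print(should_reconnect(330, 3.5))       # 5.5 min old and idle: True
print(should_reconnect(330, 1.0))       # 5.5 min old, audio active: False
print(should_reconnect(7 * 60, 0.0))    # hard limit reached: True
```

Lowering `reconnect_after_minutes` widens the window in which a quiet moment can be found before the forced reconnect, at the cost of reconnecting more often.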

Voice Activity Detection

The plugin uses Silero VAD to track incoming user speech for reconnection timing. Agent audio output updates activity separately. Silero warmup is handled automatically by the Agent lifecycle.

Function Calling

Register tools on the LLM before the agent connects:
llm = aws.Realtime()

@llm.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    return {"city": location, "temperature": 72, "condition": "Sunny"}

agent = Agent(..., llm=llm)
See the Function Calling guide for details.
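Because a tool is an ordinary async function, it can be exercised locally before it is registered. The sketch below calls the `get_weather` stub from the example above without the decorator and confirms that the result serializes to JSON. `run_tool` is a hypothetical helper, and the requirement that tool results be JSON-serializable is an assumption rather than something this page states.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable


async def get_weather(location: str) -> dict:
    """Undecorated copy of the stub tool shown above."""
    return {"city": location, "temperature": 72, "condition": "Sunny"}


def run_tool(tool: Callable[..., Awaitable[Any]], **arguments: Any) -> str:
    """Await a tool once and return its result as a JSON string.

    Hypothetical test helper; json.dumps raises TypeError if the tool
    returns something that cannot be serialized.
    """
    result = asyncio.run(tool(**arguments))
    return json.dumps(result)


print(run_tool(get_weather, location="Paris"))
```

Running tools this way keeps failures in tool logic separate from failures in the realtime session.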

Next Steps

  • Build a Voice Agent: Get started with voice
  • Function Calling: Tools and MCP integration