AWS Bedrock - Vision Agents

The AWS Bedrock plugin provides realtime speech-to-speech using Amazon Nova Sonic models, with automatic session management. It handles Nova Sonic's 8-minute connection limit transparently.

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

The AWS plugin requires Python 3.12+. Nova Sonic is audio-only, so video parameters such as fps have no effect. For video agents, use Gemini Realtime or a custom pipeline.

Installation

uv add "vision-agents[aws,getstream]"

The quick start uses getstream.Edge(), so both extras are required.

Environment Variables

STREAM_API_KEY=...
STREAM_API_SECRET=...
AWS_ACCESS_KEY_ID=...    # or IAM role / ~/.aws profile
AWS_SECRET_ACCESS_KEY=...

You also need Bedrock model access enabled for Nova Sonic in your chosen region, and IAM permission for bidirectional streaming (bedrock:InvokeModelWithBidirectionalStream).
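The streaming permission is granted through an IAM policy. The sketch below builds a minimal policy document in Python. Scoping Resource to the foundation-model ARN of the default model and region is an assumption; widen it if you use other regions, models, or inference profiles. The function name nova_sonic_policy is hypothetical.

```python
import json


def nova_sonic_policy(region: str = "us-east-1",
                      model_id: str = "amazon.nova-2-sonic-v1:0") -> dict:
    """Hypothetical helper: minimal IAM policy allowing bidirectional streaming to one model."""
    # Foundation-model ARNs have no account ID, hence the empty field ("::").
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "bedrock:InvokeModelWithBidirectionalStream",
                "Resource": model_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON suitable for pasting into the IAM console or an IaC template.
    print(json.dumps(nova_sonic_policy(), indent=2))
```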

Quick Start

from dotenv import load_dotenv

from vision_agents.core import Agent, AgentLauncher, Runner, User
from vision_agents.plugins import aws, getstream

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You are a helpful voice assistant.",
        llm=aws.Realtime(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()

AWS credentials are resolved via the standard AWS SDK chain (environment variables, AWS profiles via aws_profile, or IAM roles). The aws.Realtime constructor does not accept explicit access key parameters.

Parameters

Name	Type	Default	Description
`model`	`str`	`"amazon.nova-2-sonic-v1:0"`	Nova model ID
`region_name`	`str`	`"us-east-1"`	AWS region
`voice_id`	`str`	`"matthew"`	Voice ID (see the Nova Sonic list of available voices)
`reconnect_after_minutes`	`float`	`5.0`	Connection age, in minutes, after which the plugin reconnects once audio is idle
`aws_profile`	`str | None`	`None`	AWS profile name from `~/.aws/credentials` or `~/.aws/config`

Nova Behavior

The agent's instructions must be set before the model accepts any user input.
Text-only prompts may not produce audio; use audio input for reliable responses.
Function calling may require at least one audio content block in the session before tool calls triggered from text work reliably.

Automatic Reconnection

AWS Bedrock limits each Nova Sonic streaming connection to 8 minutes. The plugin reconnects automatically before that limit is reached:

After 5 minutes of connection age (configurable via reconnect_after_minutes), it reconnects during a quiet moment, once more than 3 seconds have passed since the last audio activity.
After 7 minutes of connection age, it forces a reconnect regardless of audio activity.

Last audio activity includes incoming user speech (detected by Silero VAD) and outgoing agent audio.
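The reconnection policy above can be expressed as a small decision function. This is a simplified model of the documented thresholds, not the plugin's source; the function and constant names are hypothetical, and how often the plugin evaluates the condition is not specified here.

```python
FORCE_RECONNECT_AFTER_S = 7 * 60  # hard cap, safely below Bedrock's 8-minute limit
IDLE_THRESHOLD_S = 3.0            # quiet period required for a graceful reconnect


def should_reconnect(connection_age_s: float, idle_s: float,
                     reconnect_after_minutes: float = 5.0) -> bool:
    """Hypothetical sketch of the documented reconnect policy."""
    if connection_age_s >= FORCE_RECONNECT_AFTER_S:
        # Forced reconnect, regardless of audio activity.
        return True
    # Graceful reconnect: old enough, and no audio for more than 3 seconds.
    return (connection_age_s >= reconnect_after_minutes * 60
            and idle_s > IDLE_THRESHOLD_S)
```

Splitting the policy into a soft threshold and a hard cap means a long, uninterrupted stretch of conversation can delay the reconnect, but never past the 7-minute mark.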

Voice Activity Detection

The plugin uses Silero VAD to track incoming user speech for reconnection timing; outgoing agent audio updates the activity timestamp separately. Silero warmup is handled automatically by the Agent lifecycle.
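One way to combine the two activity sources is a tracker that keeps the most recent timestamp from either side. This is a hypothetical sketch, not the plugin's implementation: AudioActivityTracker and its method names are illustrative, and it assumes the VAD reports speech per incoming frame. The injectable clock makes it easy to test.

```python
import time
from collections.abc import Callable


class AudioActivityTracker:
    """Hypothetical tracker: the latest of user speech or agent audio counts as activity."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Assumption: a fresh connection counts as active at creation time.
        self._last_activity = clock()

    def on_user_speech(self) -> None:
        # Call when the VAD detects speech in an incoming audio frame.
        self._last_activity = self._clock()

    def on_agent_audio(self) -> None:
        # Call whenever the agent emits an outgoing audio chunk.
        self._last_activity = self._clock()

    def idle_seconds(self) -> float:
        # Seconds since the most recent activity from either source.
        return self._clock() - self._last_activity
```

The value from idle_seconds() is what a reconnect check would compare against the 3-second quiet threshold.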

Function Calling

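from vision_agents.core import Agent
from vision_agents.plugins import aws

llm = aws.Realtime()

@llm.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    # Stub data for illustration; call a real weather API here
    return {"city": location, "temperature": 72, "condition": "Sunny"}

# Pass the same llm instance to the Agent; other arguments as in the Quick Start
agent = Agent(..., llm=llm)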

See the Function Calling guide for details.
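To make the registration pattern concrete, here is a self-contained sketch of how a decorator like llm.register_function can record a tool's name, description, and parameters, then dispatch a model-issued call by name. FunctionRegistry and call are hypothetical names; this illustrates the general decorator pattern and is not the plugin's actual implementation.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable


class FunctionRegistry:
    """Hypothetical registry illustrating decorator-based tool registration."""

    def __init__(self) -> None:
        self._tools: dict[str, dict[str, Any]] = {}

    def register_function(self, description: str):
        def decorator(fn: Callable[..., Awaitable[Any]]):
            # Parameter names stand in for the tool's argument schema (simplified).
            params = list(inspect.signature(fn).parameters)
            self._tools[fn.__name__] = {"fn": fn, "description": description, "params": params}
            return fn  # return the original function so it stays directly callable
        return decorator

    async def call(self, name: str, arguments: dict[str, Any]) -> Any:
        # Dispatch a tool call by name, passing the model's arguments as keywords.
        return await self._tools[name]["fn"](**arguments)


registry = FunctionRegistry()


@registry.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    return {"city": location, "temperature": 72, "condition": "Sunny"}


if __name__ == "__main__":
    print(asyncio.run(registry.call("get_weather", {"location": "Boulder"})))
```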

Next Steps

Build a Voice Agent

Get started with voice

Function Calling

Tools and MCP integration
