AWS Polly provides cloud-based text-to-speech (TTS) with natural-sounding voices across multiple languages. It supports both standard and neural engines.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[aws]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import aws, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    tts=aws.TTS(),
)
AWS credentials are resolved via the standard AWS SDK chain (environment variables, AWS profiles, or IAM roles).
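
Because credentials come from the environment rather than from constructor arguments, a missing variable tends to surface only when the first synthesis request fails. The following is a minimal pre-flight sketch using only the standard library. It checks the well-known AWS SDK environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE`). The function name `credential_hint` is hypothetical and not part of Vision Agents, and the check cannot see IAM roles or the contents of `~/.aws/`, so "unknown" only means the SDK will have to look further down the chain.

```python
import os
from typing import Mapping, Optional


def credential_hint(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess which credential source the AWS SDK chain would likely use.

    Hypothetical helper for illustration only. It inspects environment
    variables and cannot detect IAM roles or shared credential files.
    """
    env = os.environ if env is None else env
    # Static keys in the environment are checked first by the SDK chain.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    # A named profile points the SDK at the shared config/credentials files.
    if env.get("AWS_PROFILE"):
        return "profile"
    # Nothing visible here; an IAM role or default profile may still apply.
    return "unknown"


if __name__ == "__main__":
    print(f"Credential source hint: {credential_hint()}")
```

Running this before constructing the agent gives an early signal in local development, where forgotten exports are the most common cause of authentication errors.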

Parameters

Name           Type  Default   Description
voice_id       str   "Joanna"  Voice ID
engine         str   None      Engine ("standard" or "neural")
region_name    str   None      AWS region
language_code  str   None      Language (e.g., "en-US", "es-ES")
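
The four parameters above can be gathered in one place and validated before the TTS object is created. The sketch below is an illustration under assumptions: `polly_tts_kwargs` is a hypothetical helper, not part of the plugin, and the region value is only an example. It rejects engine names other than the two listed in the table and drops unset options so that the plugin defaults apply.

```python
from typing import Any, Dict, Optional

# The two engine values listed in the parameters table.
VALID_ENGINES = {"standard", "neural"}


def polly_tts_kwargs(
    voice_id: str = "Joanna",
    engine: Optional[str] = None,
    region_name: Optional[str] = None,
    language_code: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for aws.TTS, omitting options left as None.

    Hypothetical helper for illustration; it only mirrors the documented
    parameter names and defaults.
    """
    if engine is not None and engine not in VALID_ENGINES:
        raise ValueError(f"engine must be one of {sorted(VALID_ENGINES)}, got {engine!r}")
    candidates = {
        "voice_id": voice_id,
        "engine": engine,
        "region_name": region_name,
        "language_code": language_code,
    }
    # Keep only the options that were explicitly set.
    return {name: value for name, value in candidates.items() if value is not None}


kwargs = polly_tts_kwargs(engine="neural", region_name="us-east-1", language_code="en-US")
# With the plugin installed, these would be passed as: tts = aws.TTS(**kwargs)
```

Validating the engine name locally turns a typo such as "nueral" into an immediate error instead of a failed request at synthesis time.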

Neural Engine

For more natural-sounding voices, select the neural engine:
tts = aws.TTS(engine="neural", voice_id="Joanna")

SSML Support

tts = aws.TTS(text_type="ssml")
tts.send('<speak>Hello <break time="500ms"/> world!</speak>')
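
SSML is XML, so characters such as `&` and `<` in the spoken text must be escaped or the markup becomes invalid. The sketch below shows one way to assemble a string like the one above from plain phrases. `build_ssml` is a hypothetical helper, not part of the plugin, and it only covers the `<speak>` and `<break>` tags used in this example.

```python
from typing import Sequence
from xml.sax.saxutils import escape


def build_ssml(phrases: Sequence[str], pause_ms: int = 500) -> str:
    """Join phrases into one SSML document with a pause between them.

    Hypothetical helper for illustration. Each phrase is XML-escaped so
    that user-supplied text cannot break the markup.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    pause = f' <break time="{pause_ms}ms"/> '
    return "<speak>" + pause.join(escape(phrase) for phrase in phrases) + "</speak>"


ssml = build_ssml(["Hello", "world!"])
# With the plugin installed and text_type="ssml", this string would be passed to tts.send(ssml)
```

Building the markup from escaped fragments keeps text that comes from an LLM or a user from being interpreted as tags.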

Next Steps