AWS Polly provides cloud-based text-to-speech (TTS) with natural-sounding voices across multiple languages. It supports both the standard and neural engines.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
```bash
uv add "vision-agents[aws]"
```
Quick Start
```python
from vision_agents.core import Agent, User
from vision_agents.plugins import aws, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    tts=aws.TTS(),
)
```
AWS credentials are resolved via the standard AWS SDK chain (environment variables, AWS profiles, or IAM roles).
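To check which credentials are likely to be picked up before starting an agent, the sketch below mimics the first two steps of that chain using only the standard library: environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`), then the shared credentials file for `AWS_PROFILE` (or `default`). `describe_credential_source` is a hypothetical helper, not part of Vision Agents or the AWS SDK, and the real chain consults more providers (config file, SSO, container credentials, instance metadata), so treat its answer as a hint only.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def describe_credential_source(
    environ: Optional[Mapping[str, str]] = None,
    credentials_file: Optional[Path] = None,
) -> str:
    """Name the first credential source a *simplified* AWS chain would use.

    Hypothetical helper: it only checks environment variables and the shared
    credentials file. The real SDK chain has more providers (config file, SSO,
    container credentials, instance metadata for IAM roles).
    """
    env = os.environ if environ is None else environ

    # Step 1: static keys in the environment take precedence.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    # Step 2: the shared credentials file, for AWS_PROFILE or "default".
    profile = env.get("AWS_PROFILE", "default")
    if credentials_file is None:
        default_path = Path.home() / ".aws" / "credentials"
        credentials_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", default_path))
    parser = configparser.ConfigParser()
    parser.read(credentials_file)  # a missing file is silently skipped
    if parser.has_option(profile, "aws_access_key_id"):
        return f"profile:{profile}"

    # Step 3: anything else (e.g. an IAM role) is left to the SDK at runtime.
    return "other (e.g. IAM role)"
```

Calling `describe_credential_source()` with no arguments inspects the current process environment and `~/.aws/credentials`.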
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `voice_id` | `str` | `"Joanna"` | Polly voice ID |
| `engine` | `str` | `None` | Engine (`"standard"` or `"neural"`) |
| `region_name` | `str` | `None` | AWS region |
| `language_code` | `str` | `None` | Language code (e.g., `"en-US"`, `"es-ES"`) |
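As a rough illustration of how these options relate to Polly itself, the hypothetical helper `polly_request` builds keyword arguments for Polly's `SynthesizeSpeech` operation. It assumes the plugin forwards `voice_id`, `engine`, and `language_code` as `VoiceId`, `Engine`, and `LanguageCode`; `text_type` comes from the SSML example on this page, while `output_format` and its `"pcm"` default are assumptions. `region_name` is not a request field: it belongs on the client. The engine check only accepts the two values listed in the table.

```python
from typing import Optional


def polly_request(
    text: str,
    voice_id: str = "Joanna",
    engine: Optional[str] = None,
    language_code: Optional[str] = None,
    text_type: str = "text",
    output_format: str = "pcm",  # assumption: raw PCM suits real-time audio
) -> dict:
    """Build SynthesizeSpeech keyword arguments (hypothetical helper)."""
    if engine is not None and engine not in ("standard", "neural"):
        raise ValueError(f"unsupported engine: {engine!r}")
    request = {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "TextType": text_type,
        "Engine": engine,
        "LanguageCode": language_code,
    }
    # Drop unset options so Polly applies its own defaults.
    return {key: value for key, value in request.items() if value is not None}
```

With boto3 installed, the result could be passed as `boto3.client("polly", region_name="us-east-1").synthesize_speech(**polly_request("Hello"))`.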
Neural Engine
Use the neural engine for more natural-sounding voices:
```python
tts = aws.TTS(engine="neural", voice_id="Joanna")
```
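Engine support varies by voice, so before switching to `engine="neural"` it can help to confirm that the chosen `voice_id` supports it. The hypothetical helper below inspects a response from Polly's `DescribeVoices` operation, whose voice entries list their `SupportedEngines`. It looks at a single response page only and ignores `NextToken` pagination.

```python
def voice_supports_engine(voices_response: dict, voice_id: str, engine: str) -> bool:
    """Return True if `voice_id` lists `engine` in a DescribeVoices response.

    Hypothetical helper; only the single response page passed in is checked.
    """
    for voice in voices_response.get("Voices", []):
        if voice.get("Id") == voice_id:
            return engine in voice.get("SupportedEngines", [])
    # Unknown voice: treat as unsupported rather than guessing.
    return False
```

With boto3, the input could come from `boto3.client("polly").describe_voices(LanguageCode="en-US")`.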
SSML Support
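Set `text_type="ssml"` to send SSML markup (for example, explicit pauses) instead of plain text:
```python
tts = aws.TTS(text_type="ssml")
tts.send('<speak>Hello <break time="500ms"/> world!</speak>')
```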
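When SSML is assembled from user or LLM output, characters such as `&` and `<` must be escaped, otherwise the markup is no longer valid XML and Polly is likely to reject it. A minimal sketch, using a hypothetical `build_ssml` helper and the standard library's `xml.sax.saxutils.escape`, joins sentences with `<break>` pauses:

```python
from xml.sax.saxutils import escape


def build_ssml(sentences: list[str], pause_ms: int = 500) -> str:
    """Join plain-text sentences into one SSML document with pauses between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so the input text cannot break the SSML markup.
    body = f" {pause} ".join(escape(sentence) for sentence in sentences)
    return f"<speak>{body}</speak>"
```

With the defaults, `build_ssml(["Hello", "world!"])` reproduces the string from the SSML example, and its output can be passed to `tts.send(...)` on a TTS created with `text_type="ssml"`.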
Next Steps