AWS Polly is Amazon’s cloud-based Text-to-Speech (TTS) service that converts text into lifelike speech.
It has a wide selection of natural-sounding voices across multiple languages and supports both standard and neural engine options.
The AWS Polly plugin for the Stream Python AI SDK allows you to add TTS functionality to your project.

Installation

Install the Stream AWS plugin with uv (the quotes stop shells such as zsh from treating the brackets as a glob pattern):
uv add "vision-agents[aws]"

Example

Check out our AWS Polly example to see a working code sample using the plugin, or read on for extra details.

Initialisation

The AWS Polly plugin for Stream is exposed as the TTS class:
from vision_agents.plugins import aws

tts = aws.TTS()
AWS credentials are resolved via the standard AWS SDK chain (environment variables, AWS profiles, or IAM roles). Make sure your AWS credentials are properly configured with access to Amazon Polly.
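
As a quick pre-flight check before constructing the TTS object, the following minimal sketch reports which environment-based credential sources are visible to the process. The helper name `credential_hints` is hypothetical and not part of the plugin. It only inspects the well-known `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_PROFILE` variables, so an empty result does not mean credentials are missing: the SDK may still find a default profile or an IAM role.

```python
import os


def credential_hints(env=None):
    """Return the environment-based credential sources visible in `env`.

    Hypothetical helper for illustration only; the AWS SDK performs the
    real credential resolution and checks more sources than this does.
    """
    env = os.environ if env is None else env
    hints = []
    # Static keys supplied directly through environment variables.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        hints.append("environment variables")
    # A named profile selected through AWS_PROFILE.
    if env.get("AWS_PROFILE"):
        hints.append(f"profile '{env['AWS_PROFILE']}'")
    return hints


found = credential_hints()
if found:
    print("Credential sources visible in the environment:", ", ".join(found))
else:
    print("No environment-based credentials found; "
          "the SDK may still use a default profile or an IAM role.")
```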

Parameters

The AWS Polly TTS plugin accepts the following parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| region_name | str or None | None | AWS region name. If not provided, uses the AWS_REGION or AWS_DEFAULT_REGION environment variable, or defaults to us-east-1. |
| voice_id | str | "Joanna" | The ID of the voice to use for TTS. AWS Polly offers a variety of voices across different languages. |
| text_type | str or None | "text" | Type of input text: "text" for plain text or "ssml" for Speech Synthesis Markup Language. |
| engine | str or None | None | The synthesis engine to use: "standard" or "neural". Neural voices provide more natural-sounding speech. |
| language_code | str or None | None | Language code for the voice (e.g., "en-US", "es-ES"). Optional parameter for specifying the language variant. |
| lexicon_names | List[str] or None | None | List of pronunciation lexicon names to apply. Lexicons allow you to customise pronunciation of specific words. |
| client | Any or None | None | Optional pre-configured boto3 Polly client. If not provided, a client will be created automatically. |
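
The region fallback described for region_name can be sketched in a few lines. This is an illustration of the documented behaviour, not the plugin's actual code: the function name `resolve_region` is hypothetical, and checking AWS_REGION before AWS_DEFAULT_REGION is an assumption based on the order in which the two variables are listed.

```python
import os

DEFAULT_REGION = "us-east-1"  # documented final fallback


def resolve_region(region_name=None, env=None):
    """Pick a region: explicit argument, then environment, then default.

    Hypothetical helper; the precedence between AWS_REGION and
    AWS_DEFAULT_REGION is an assumption, not confirmed plugin behaviour.
    """
    env = os.environ if env is None else env
    return (
        region_name
        or env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
        or DEFAULT_REGION
    )


# An explicit argument takes priority over anything in the environment.
print(resolve_region("eu-west-1", env={"AWS_REGION": "us-west-2"}))
```

Passing region_name explicitly keeps the choice of region independent of whatever happens to be set in the deployment environment.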

Functionality

Send text to convert to speech

The send() method submits the given text to Polly for synthesis. The resulting audio is then played through the configured output track.
tts.send("Demo text you want AI voice to say")

SSML support

AWS Polly supports Speech Synthesis Markup Language (SSML) for advanced control over speech output:
tts = aws.TTS(text_type="ssml")
tts.send('<speak>Hello <break time="500ms"/> world!</speak>')
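
When SSML is assembled from dynamic text, characters such as & and < must be escaped or the markup becomes invalid XML. The sketch below builds the same kind of document as the example above using only the Python standard library. The helper name `ssml_with_pause` is hypothetical and not part of the plugin.

```python
from xml.sax.saxutils import escape


def ssml_with_pause(before, after, pause_ms=500):
    """Build an SSML string with a pause between two pieces of text.

    Hypothetical helper for illustration. escape() replaces &, < and >
    so that user-supplied text cannot break the surrounding markup.
    """
    return (
        f"<speak>{escape(before)} "
        f'<break time="{int(pause_ms)}ms"/> '
        f"{escape(after)}</speak>"
    )


print(ssml_with_pause("Hello", "world!"))
# The result can then be passed to a TTS instance created with text_type="ssml".
```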

Neural engine

For more natural-sounding voices, use the neural engine:
tts = aws.TTS(engine="neural", voice_id="Joanna")