AWS Polly

AWS Polly is Amazon’s cloud-based Text-to-Speech (TTS) service that converts text into lifelike speech.
It has a wide selection of natural-sounding voices across multiple languages and supports both standard and neural engine options. The AWS Polly plugin for the Stream Python AI SDK allows you to add TTS functionality to your project.

Installation

Install the Stream AWS plugin with:

uv add "vision-agents[aws]"

Example

Check out our AWS Polly example to see a working code sample using the plugin, or read on for extra details.

Initialisation

The AWS Polly plugin for Stream is exposed as the TTS class:

from vision_agents.plugins import aws

tts = aws.TTS()

AWS credentials are resolved via the standard AWS SDK credential chain (environment variables, AWS profiles, or IAM roles). Make sure those credentials are configured and grant access to Amazon Polly, for example permission for the `polly:SynthesizeSpeech` action.
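If synthesis fails with an authentication error, it can help to see which part of the credential chain is likely supplying credentials. The helper below, `describe_credential_source`, is hypothetical (it is not part of the plugin or boto3) and deliberately simplified: it only inspects environment variables and lumps every other source (shared config files, SSO, container and instance roles) into a single fallback.

```python
import os
from typing import Mapping, Optional


def describe_credential_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess which credential source the AWS SDK chain will use (hypothetical, simplified).

    Access keys in the environment are checked before profiles by the SDK, so they
    are checked first here too. Everything beyond environment variables is reported
    as a single fallback that the SDK resolves at runtime.
    """
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    if env.get("AWS_PROFILE"):
        return f"named profile '{env['AWS_PROFILE']}'"
    return "default profile or IAM role (resolved by the SDK at runtime)"
```

Calling `describe_credential_source()` with no arguments inspects the current process environment; passing a dictionary makes it easy to check a specific configuration.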

Parameters

These are the parameters available in the AWS Polly TTS plugin for you to customise:

Name	Type	Default	Description
`region_name`	`str` or `None`	`None`	AWS region name. If not provided, uses `AWS_REGION` or `AWS_DEFAULT_REGION` environment variable, or defaults to `us-east-1`.
`voice_id`	`str`	`"Joanna"`	The ID of the voice to use for TTS. AWS Polly offers a variety of voices across different languages.
`text_type`	`str` or `None`	`"text"`	Type of input text: `"text"` for plain text or `"ssml"` for Speech Synthesis Markup Language.
`engine`	`str` or `None`	`None`	The synthesis engine to use: `"standard"` or `"neural"`. Neural voices provide more natural-sounding speech.
`language_code`	`str` or `None`	`None`	Language code for the voice (e.g., `"en-US"`, `"es-ES"`). Only needed for bilingual voices, such as Aditi, which can speak either Indian English (`"en-IN"`) or Hindi (`"hi-IN"`).
`lexicon_names`	`List[str]` or `None`	`None`	List of pronunciation lexicon names to apply. Lexicons allow you to customize pronunciation of specific words.
`client`	`Any` or `None`	`None`	Optional pre-configured boto3 Polly client. If not provided, a client will be created automatically.
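To show how the defaults in the table fit together, here is a minimal sketch that assembles keyword arguments for `aws.TTS` and resolves the region in the order the table lists (explicit value, then `AWS_REGION`, then `AWS_DEFAULT_REGION`, then `us-east-1`). `build_polly_config` and `resolve_region` are hypothetical helpers, and the allowed values are restricted to the ones documented above; the plugin's own validation may differ.

```python
import os
from typing import Any, Dict, List, Mapping, Optional

# Allowed values as listed in the parameter table; the plugin may accept others.
TEXT_TYPES = {"text", "ssml"}
ENGINES = {"standard", "neural"}


def resolve_region(region_name: Optional[str], env: Mapping[str, str]) -> str:
    """Explicit region first, then AWS_REGION, then AWS_DEFAULT_REGION, then us-east-1."""
    return (
        region_name
        or env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
        or "us-east-1"
    )


def build_polly_config(
    region_name: Optional[str] = None,
    voice_id: str = "Joanna",
    text_type: Optional[str] = "text",
    engine: Optional[str] = None,
    language_code: Optional[str] = None,
    lexicon_names: Optional[List[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Validate the documented options and return keyword arguments for aws.TTS (hypothetical helper)."""
    env = os.environ if env is None else env
    if text_type is not None and text_type not in TEXT_TYPES:
        raise ValueError(f"text_type must be one of {sorted(TEXT_TYPES)}, got {text_type!r}")
    if engine is not None and engine not in ENGINES:
        raise ValueError(f"engine must be one of {sorted(ENGINES)}, got {engine!r}")

    config: Dict[str, Any] = {
        "region_name": resolve_region(region_name, env),
        "voice_id": voice_id,
        "text_type": text_type,
    }
    # Only forward optional settings that were actually provided.
    if engine is not None:
        config["engine"] = engine
    if language_code is not None:
        config["language_code"] = language_code
    if lexicon_names:
        config["lexicon_names"] = list(lexicon_names)
    return config
```

Because the returned keys match the documented parameter names, the result can be unpacked directly, e.g. `aws.TTS(**build_polly_config(engine="neural"))`. Passing the resolved region explicitly should behave the same as letting the plugin apply its own fallback.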

Functionality

Send text to convert to speech

The send() method passes the given text to Polly for synthesis and plays the resulting audio through the configured output track.

tts.send("Demo text you want AI voice to say")
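Amazon Polly limits how much text a single synthesis request can contain (see the Polly quotas documentation for the current figure). If you plan to pass long passages to send(), one option is to pre-split the text at sentence boundaries. This is a minimal sketch: `split_for_tts` and its 1,000-character default are hypothetical, not part of the plugin, and the plugin may already handle long input on its own.

```python
import re
from typing import List


def split_for_tts(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after ., ! or ? followed by whitespace, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to tts.send() in order. Whether consecutive send() calls are queued or interrupt each other depends on the plugin, so check that before relying on this pattern.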

SSML support

AWS Polly supports Speech Synthesis Markup Language (SSML) for advanced control over speech output:

tts = aws.TTS(text_type="ssml")
tts.send('<speak>Hello <break time="500ms"/> world!</speak>')
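Text inside `<speak>` must be valid XML, so characters such as `&`, `<` or quotes in user-supplied text need escaping before they reach Polly. A minimal sketch, assuming you build SSML strings yourself: `to_ssml` is a hypothetical helper that escapes each plain-text part with the standard library and joins the parts with a `<break>` pause. Called as `to_ssml("Hello", "world!")`, it produces the same string as the example above.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly.
_EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_ssml(*parts: str, pause_ms: int = 500) -> str:
    """Join plain-text parts into one <speak> document with a pause between them."""
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    pause = f'<break time="{pause_ms}ms"/>'
    body = f" {pause} ".join(escape(part, _EXTRA_ENTITIES) for part in parts)
    return f"<speak>{body}</speak>"
```

The result is a plain string, so it is sent through a TTS instance created with `text_type="ssml"`, e.g. `tts.send(to_ssml("Fish & chips", "ready now"))`.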

Neural engine

For more natural-sounding voices, use the neural engine:

tts = aws.TTS(engine="neural", voice_id="Joanna")
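Not every Polly voice supports the neural engine, and some voices have no standard variant at all. One way to choose the engine at runtime is to check the voice's `SupportedEngines` list from Polly's DescribeVoices API. `pick_engine` below is a hypothetical helper that works on a response dictionary of that shape, so it can be tried without AWS access.

```python
from typing import Any, Mapping


def pick_engine(voices_response: Mapping[str, Any], voice_id: str, preferred: str = "neural") -> str:
    """Choose an engine for voice_id from a DescribeVoices-shaped response (hypothetical helper).

    Expected shape: {"Voices": [{"Id": ..., "SupportedEngines": [...]}, ...]}.
    Returns `preferred` if supported, otherwise "standard" if supported.
    """
    for voice in voices_response.get("Voices", []):
        if voice.get("Id") == voice_id:
            engines = voice.get("SupportedEngines", [])
            if preferred in engines:
                return preferred
            if "standard" in engines:
                return "standard"
            raise ValueError(f"voice {voice_id!r} supports neither {preferred!r} nor 'standard'")
    raise LookupError(f"voice {voice_id!r} not found in response")
```

With boto3, the response would come from `boto3.client("polly").describe_voices()` (follow `NextToken` if the result is paginated), and the chosen value can be passed as `aws.TTS(engine=pick_engine(response, "Joanna"), voice_id="Joanna")`.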
