
# Inworld

[Inworld AI](https://inworld.ai) provides expressive TTS designed for conversational AI and game characters. The plugin defaults to Inworld's **TTS-2** model, which adds natural-language steering, 100+ languages (15 GA, 90+ experimental), and high-quality instant voice cloning.

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account
  for real-time transport. Most providers offer free tiers to get started.
</Info>

<Tip>
  Inworld also offers a [Realtime](/integrations/realtime/inworld) speech-to-speech API over WebRTC.
</Tip>

## Installation

```sh
uv add "vision-agents[inworld]"
```

Get your API key from the [Inworld Portal](https://studio.inworld.ai/) and set `INWORLD_API_KEY` in your environment (or pass `api_key=` explicitly).
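
To fail fast when the key is missing, you can resolve it yourself before constructing the plugin. The helper below is a minimal sketch: `resolve_inworld_api_key` is a hypothetical name, not part of the plugin, and the plugin's own lookup logic may differ in detail.

```python
import os
from typing import Optional


def resolve_inworld_api_key(explicit: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise INWORLD_API_KEY from the environment.

    Hypothetical helper for illustration; not part of the Inworld plugin.
    """
    key = explicit or os.environ.get("INWORLD_API_KEY")
    if not key:
        raise RuntimeError(
            "No Inworld API key found: set INWORLD_API_KEY or pass api_key= explicitly."
        )
    return key
```

The result can then be passed as `inworld.TTS(api_key=resolve_inworld_api_key())`, so a missing key surfaces at startup rather than on the first synthesis request.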

## Quick Start

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import inworld, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    stt=deepgram.STT(),
    tts=inworld.TTS(),  # defaults to model_id="inworld-tts-2", voice_id="Sarah"
)
```

<Warning>
  Set `INWORLD_API_KEY` in your environment or pass `api_key` directly.
</Warning>

## Parameters

| Name                       | Type            | Default           | Description                                                                                  |
| -------------------------- | --------------- | ----------------- | -------------------------------------------------------------------------------------------- |
| `api_key`                  | `str`           | `None`            | API key (defaults to `INWORLD_API_KEY` env var)                                              |
| `voice_id`                 | `str`           | `"Sarah"`         | Voice ID (`"Sarah"`, `"Dennis"`, `"Ashley"`, `"Olivia"`, `"Clive"`, or custom/cloned voices) |
| `model_id`                 | `str`           | `"inworld-tts-2"` | Model (`"inworld-tts-2"`, `"inworld-tts-1.5-max"`, `"inworld-tts-1.5-mini"`)                 |
| `sample_rate`              | `int`           | `16000`           | Desired PCM output sample rate in Hz                                                         |
| `temperature`              | `float`         | `1.1`             | Randomness when sampling audio tokens (0–2)                                                  |
| `speaking_rate`            | `float`         | `None`            | Speech speed multiplier (0.5–1.5). `None` uses the server default                            |
| `auto_mode`                | `bool`          | `True`            | Let Inworld decide optimal flush behavior for streamed input                                 |
| `apply_text_normalization` | `"ON" \| "OFF"` | `None`            | Optional text normalization behavior                                                         |
| `ws_url`                   | `str`           | Inworld endpoint  | Inworld bidirectional WebSocket endpoint                                                     |

<Note>
  `inworld-tts-1` and `inworld-tts-1-max` are deprecated by Inworld — migrate to `inworld-tts-2` or `inworld-tts-1.5-*`.
</Note>
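
The ranges in the table can be checked locally before any request is sent. This is a minimal sketch: `InworldTTSSettings` is a hypothetical container, not a plugin class, and it assumes the documented ranges (0–2 for `temperature`, 0.5–1.5 for `speaking_rate`) are inclusive.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class InworldTTSSettings:
    """Hypothetical holder for a subset of the constructor arguments above."""

    voice_id: str = "Sarah"
    model_id: str = "inworld-tts-2"
    sample_rate: int = 16000
    temperature: float = 1.1
    speaking_rate: Optional[float] = None

    def __post_init__(self) -> None:
        # Assumption: the documented ranges include their endpoints.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError(f"temperature must be within 0-2, got {self.temperature}")
        if self.speaking_rate is not None and not 0.5 <= self.speaking_rate <= 1.5:
            raise ValueError(
                f"speaking_rate must be within 0.5-1.5, got {self.speaking_rate}"
            )
        if self.sample_rate <= 0:
            raise ValueError(f"sample_rate must be positive, got {self.sample_rate}")

    def as_kwargs(self) -> Dict[str, Any]:
        """Return keyword arguments, omitting None so the server default applies."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

A validated instance can be unpacked into the constructor, for example `inworld.TTS(**InworldTTSSettings(speaking_rate=1.2).as_kwargs())`.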

## Steering (TTS-2)

TTS-2 takes natural-language stage directions inline with your text. Place the instruction in square brackets before the segment it should apply to:

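```python
from vision_agents.plugins import inworld

text = (
    "[whisper in a hushed style] I have to tell you something. "
    "[laugh] Just kidding! [say with force] Now let's get to work."
)


async def speak() -> None:
    tts = inworld.TTS()  # defaults to the TTS-2 model
    async for chunk in await tts.stream_audio(text):
        ...  # handle each streamed audio chunk
```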

Steering covers articulation, intonation, volume, pitch, range, speed, and vocal style. It also supports non-verbal sounds such as `[laugh]`, `[breathe]`, `[clear throat]`, `[sigh]`, `[cough]`, and `[yawn]`. Combining dimensions in a single tag (`[whisper in a hushed style]`, `[say playfully and very fast]`) produces better results than bare single-word tags.
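
When steered text is assembled programmatically, a small helper keeps the bracket syntax consistent. The sketch below uses a hypothetical `steer` function (not part of the plugin) that only formats strings; it does not check that Inworld recognizes a given direction.

```python
def steer(direction: str, text: str = "") -> str:
    """Prefix text with a bracketed stage direction.

    With empty text, returns the bare tag (useful for non-verbal sounds).
    Hypothetical helper for illustration; not part of the Inworld plugin.
    """
    direction = direction.strip()
    if not direction or "[" in direction or "]" in direction:
        raise ValueError("direction must be non-empty and contain no brackets")
    tag = f"[{direction}]"
    body = text.strip()
    return f"{tag} {body}" if body else tag


# Rebuild the steered sentence from the example above, segment by segment.
line = " ".join(
    [
        steer("whisper in a hushed style", "I have to tell you something."),
        steer("laugh", "Just kidding!"),
        steer("say with force", "Now let's get to work."),
    ]
)
```

Each direction stays directly in front of the segment it applies to, which matches the placement rule described above.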

See Inworld's [steering docs](https://docs.inworld.ai/tts/capabilities/steering) and [prompting guide](https://docs.inworld.ai/tts/best-practices/prompting-for-tts-2) for the full reference.

<Note>Inworld TTS accepts up to 2,000 characters per request. The plugin connects to Inworld's bidirectional WebSocket endpoint and streams 16-bit PCM audio at the configured `sample_rate`, with no extra configuration needed.</Note>
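
If you generate long passages, one way to stay under the 2,000-character limit is to split text at sentence boundaries before synthesis. The following is a minimal sketch: `split_for_tts` is a hypothetical helper, and whether the plugin already chunks long input internally is not stated here.

```python
import re
from typing import List

MAX_CHARS = 2000  # per-request limit stated in the note above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters.

    Hypothetical helper for illustration; not part of the Inworld plugin.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split by length.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `tts.stream_audio` in turn. Note that this sketch is not aware of steering tags: a bracketed direction at the end of one chunk would be separated from the text it was meant to steer, so place tags at the start of sentences when combining the two.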

## Next Steps

<CardGroup cols={2}>
  <Card title="Build a Voice Agent" icon="microphone" href="/introduction/voice-agents">
    Get started with voice
  </Card>

  <Card title="Build a Video Agent" icon="video" href="/introduction/video-agents">
    Add video processing
  </Card>
</CardGroup>
