Inworld Realtime - Vision Agents

Inworld AI provides a WebRTC-based Realtime speech-to-speech API. It uses native Opus over UDP, which gives lower latency than WebSocket alternatives.

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Inworld also provides standalone text-to-speech.

Installation

uv add "vision-agents[inworld]"

Get your API key from the Inworld Portal and set INWORLD_API_KEY in your environment (or pass api_key= explicitly).
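The lookup order described above (an explicit api_key= wins, otherwise the INWORLD_API_KEY environment variable is used) can be sketched as follows. This is a minimal illustration only: resolve_api_key is a hypothetical helper, not part of the vision-agents API, and the plugin's own resolution logic may differ.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None,
                    env_var: str = "INWORLD_API_KEY") -> str:
    """Hypothetical helper: prefer an explicit key, else read the environment."""
    key = api_key or os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of at connection time.
        raise RuntimeError(
            f"No API key found: pass api_key= or set {env_var} in the environment."
        )
    return key
```

Checking for the key at startup like this surfaces a missing credential before any call is joined.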

Quick Start

Pair Inworld Realtime with a WebRTC-capable edge transport like getstream.Edge().

from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, inworld, smart_turn

agent = Agent(
    # WebRTC-capable edge transport (requires a Stream account)
    edge=getstream.Edge(),
    # Identity the agent uses on the call
    agent_user=User(name="My Agent", id="agent"),
    # Inworld Realtime speech-to-speech model
    llm=inworld.Realtime(
        model="openai/gpt-4o-mini",  # provider-prefixed model ID
        voice="Dennis",
        instructions="You are a friendly voice assistant.",
    ),
    turn_detection=smart_turn.TurnDetection(),
)

Parameters

Name	Type	Default	Description
`model`	`str`	`"openai/gpt-4o-mini"`	Provider-prefixed model ID (e.g. `"openai/gpt-4o-mini"`, `"google-ai-studio/gemini-2.5-flash"`, `"inworld/<router-id>"`)
`voice`	`str`	`"Dennis"`	Voice for audio responses (`"Dennis"`, `"Clive"`, `"Olivia"`, or custom)
`api_key`	`str`	`None`	API key (defaults to `INWORLD_API_KEY` env var)
`instructions`	`str`	`None`	System prompt
`realtime_session`	`RealtimeSessionCreateRequestParam`	`None`	Advanced — pass a full session param for custom turn detection, `tool_choice`, and other fields
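The model parameter takes a provider-prefixed ID, so it can help to check the format before constructing the agent. The sketch below assumes the first "/" separates the provider from the model name, which matches the examples in the table; split_model_id and ModelId are hypothetical names for illustration, and the plugin may not validate IDs this way.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    provider: str
    name: str


def split_model_id(model: str) -> ModelId:
    """Hypothetical helper: split "provider/model" on the first slash."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(
            f"Expected a provider-prefixed model ID like 'openai/gpt-4o-mini', got {model!r}"
        )
    return ModelId(provider, name)


parsed = split_model_id("google-ai-studio/gemini-2.5-flash")
print(parsed.provider, parsed.name)
```

A bare ID such as "gpt-4o-mini" raises ValueError here, which catches a missing provider prefix early.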

Registering Tools

Inworld’s Realtime API is protocol-compatible with OpenAI’s Realtime API, so registered functions follow the OpenAI function-calling schema.

realtime = inworld.Realtime()

@realtime.register_function(description="Get the current weather for a city.")
async def get_weather(city: str) -> str:
    return f"It's sunny in {city}."
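To make "follows the OpenAI function-calling schema" concrete, the sketch below derives a tool definition from the get_weather signature, using the flat tool shape of OpenAI's Realtime API (type, name, description, and a JSON Schema under parameters). build_tool_schema is a hypothetical helper written for this illustration; it is not the plugin's implementation, and the schema the plugin actually sends may contain more detail.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Minimal mapping from Python annotations to JSON Schema types (assumption:
# unannotated or unrecognised parameters fall back to "string").
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool_schema(fn: Callable[..., Any], description: str) -> Dict[str, Any]:
    """Hypothetical helper: build a function tool definition from a signature."""
    hints = get_type_hints(fn)
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "name": fn.__name__,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


async def get_weather(city: str) -> str:
    return f"It's sunny in {city}."


schema = build_tool_schema(get_weather, "Get the current weather for a city.")
```

Type hints and the description are what the model sees, so precise annotations and a clear description make it more likely the function is called with sensible arguments.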

v1 is WebRTC only; a WebSocket transport may be added later. Video input is not currently supported by Inworld’s Realtime API.

Next Steps

Inworld TTS: Standalone text-to-speech

Build a Voice Agent: Get started with voice
