# Realtime Class

The Realtime component provides end-to-end speech-to-speech communication, covering the roles of STT, LLM, and TTS in a single interface. Audio streams to and from the model directly, with no intermediate text conversion, which keeps latency low. Depending on the provider, it also accepts video and text alongside audio.

## When to Use Realtime

Use a **Realtime** LLM when you want the lowest-latency voice interactions. The model handles speech recognition, response generation, and speech synthesis natively, so no separate STT or TTS services are required.

Use the **traditional STT → LLM → TTS** pipeline when you need custom voices (e.g., Cartesia, ElevenLabs), specific transcription providers, or models that don't support realtime audio.

## Supported Providers

* [OpenAI Realtime](/integrations/realtime/openai) — WebRTC-based, supports video
* [Gemini Live](/integrations/realtime/gemini) — WebSocket-based, supports video
* [AWS Nova](/integrations/realtime/aws-bedrock) — WebSocket-based
* [Qwen Omni](/integrations/realtime/qwen) — Native audio support

## Basic Usage

```python
from vision_agents.plugins import openai, getstream
from vision_agents.core.agents import Agent
from vision_agents.core.edge.types import User

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant", id="agent"),
    instructions="You're a helpful voice assistant",
    # The realtime model handles speech in and out, so no separate STT or TTS is configured.
    llm=openai.Realtime(model="gpt-realtime", voice="marin"),
    processors=[],
)
```

## Agent Methods with Realtime

```python
await agent.simple_response("What do you see in the video?", interrupt=True)
```

Use `agent.simple_response(...)` to inject text prompts and `agent.say(...)` for scripted speech. You usually do not call realtime audio methods directly from app code.

## Properties

| Property     | Type   | Description                                                                                                                                    |
| ------------ | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `connected`  | `bool` | `True` if the realtime session is active                                                                                                       |
| `fps`        | `int`  | Video frames per second sent to the model (default: 1)                                                                                         |
| `session_id` | `str`  | UUID identifying the current session                                                                                                           |
| `epoch`      | `int`  | Monotonic interruption counter. Increments each time `interrupt()` is called, allowing stale audio output events to be identified and dropped. |
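To make the `fps` setting concrete, the sketch below throttles a 30 fps source to a target rate by forwarding a frame only when enough time has passed since the last forwarded one. It is a standalone illustration under stated assumptions, not the library's implementation: `FrameThrottle`, `should_send`, and `count_forwarded` are hypothetical names, and timestamps are passed in explicitly so the behavior is deterministic.

```python
from typing import Optional


class FrameThrottle:
    """Hypothetical helper: forwards at most `fps` frames per second."""

    def __init__(self, fps: int = 1) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        self.min_interval = 1.0 / fps
        self._last_sent: Optional[float] = None

    def should_send(self, timestamp: float) -> bool:
        """Return True if the frame captured at `timestamp` (seconds) should be forwarded."""
        if self._last_sent is None or timestamp - self._last_sent >= self.min_interval:
            self._last_sent = timestamp
            return True
        return False


def count_forwarded(fps: int, source_fps: int = 30, seconds: int = 3) -> int:
    """Count how many frames of a simulated source stream pass the throttle."""
    throttle = FrameThrottle(fps=fps)
    total_frames = source_fps * seconds
    return sum(throttle.should_send(i / source_fps) for i in range(total_frames))


forwarded_at_default = count_forwarded(fps=1)  # 90 source frames in, one per second out
```

With the default of 1, most camera frames never reach the model, which keeps bandwidth and token usage down at the cost of temporal detail.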

## Realtime Methods

### `interrupt()`

Increments `epoch` so that the Agent can recognize in-flight audio output from the previous response as stale and discard it. The Agent calls this automatically on barge-in, that is, when the user starts speaking over the response.
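One way to picture the epoch mechanism is to tag each audio chunk with the epoch in effect when it was produced, then drop any chunk whose tag no longer matches the current epoch. The following is a standalone model of that idea, not the library's code: `EpochSession`, `AudioChunk`, `make_chunk`, and `playable` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AudioChunk:
    epoch: int   # epoch in effect when this chunk was produced
    data: bytes


class EpochSession:
    """Hypothetical stand-in that models a monotonic interruption counter."""

    def __init__(self) -> None:
        self.epoch = 0

    def interrupt(self) -> None:
        # Bumping the counter makes everything tagged with the old value stale.
        self.epoch += 1

    def make_chunk(self, data: bytes) -> AudioChunk:
        return AudioChunk(epoch=self.epoch, data=data)

    def playable(self, chunks: List[AudioChunk]) -> List[AudioChunk]:
        # Keep only chunks produced under the current epoch.
        return [chunk for chunk in chunks if chunk.epoch == self.epoch]


session = EpochSession()
in_flight = [session.make_chunk(b"old-1"), session.make_chunk(b"old-2")]
session.interrupt()  # the user barges in while old audio is still queued
in_flight.append(session.make_chunk(b"new-1"))
fresh = session.playable(in_flight)  # only the chunk made after the interrupt
```

A counter is enough here because the consumer only needs to know whether a chunk predates the latest interruption, not which response it belonged to.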

## Events

The Realtime class emits a small set of events for connection state:

| Event                       | Description                                               |
| --------------------------- | --------------------------------------------------------- |
| `RealtimeConnectedEvent`    | Connection established with session config & capabilities |
| `RealtimeDisconnectedEvent` | Connection closed (includes `reason` and `clean` flag)    |
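A handler for the disconnect event might branch on the `clean` flag to decide whether a reconnect is worth attempting. The sketch below uses a local dataclass as a stand-in so it runs on its own. The real event class comes from the library; the field types, the `should_reconnect` policy, the `describe` helper, and the example reason strings are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisconnectedEventStub:
    """Stand-in for RealtimeDisconnectedEvent with only the two documented fields."""
    reason: Optional[str] = None  # assumed to be an optional string
    clean: bool = True            # assumed to be a boolean


def should_reconnect(event: DisconnectedEventStub) -> bool:
    """Hypothetical policy: reconnect only after an unclean close."""
    return not event.clean


def describe(event: DisconnectedEventStub) -> str:
    """Build a log line summarizing why the session ended."""
    kind = "clean" if event.clean else "unexpected"
    return f"Realtime session closed ({kind}): {event.reason or 'no reason given'}"


dropped = DisconnectedEventStub(reason="network error", clean=False)
message = describe(dropped)
retry = should_reconnect(dropped)
```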

For conversation events, subscribe to the agent-level events — `UserTranscriptEvent` fires in both classic STT and realtime modes:

```python
from vision_agents.core.agents.events import UserTranscriptEvent

@agent.events.subscribe
async def on_user_speech(event: UserTranscriptEvent):
    print(f"User said: {event.text}")
```

See [Events Reference](/reference/events-reference) for the full event surface, including LLM, tool, and error events.

<Note>
  For provider-specific parameters and configuration, see the integration docs for [OpenAI](/integrations/realtime/openai), [Gemini](/integrations/realtime/gemini), [AWS Bedrock](/integrations/realtime/aws-bedrock), or [Qwen](/integrations/realtime/qwen).
</Note>
