
# Realtime Class

The Realtime component provides end-to-end speech-to-speech communication, combining speech-to-text (STT), LLM, and text-to-speech (TTS) functionality in a single interface. Audio streams directly to and from the model with no intermediate text conversion, which keeps latency low. Multiple modalities are supported (audio, text, and, on some providers, video).

## When to Use Realtime

Use a **Realtime** LLM when you want the lowest-latency voice interactions. The model handles speech recognition, response generation, and speech synthesis natively, so no separate STT or TTS services are required.

Use the **traditional STT → LLM → TTS** pipeline when you need custom voices (e.g., Cartesia, ElevenLabs), specific transcription providers, or models that don't support realtime audio.

## Supported Providers

* [OpenAI Realtime](/integrations/realtime/openai) — WebRTC-based, supports video
* [Gemini Live](/integrations/realtime/gemini) — WebSocket-based, supports video
* [AWS Nova](/integrations/realtime/aws-bedrock) — WebSocket-based
* [Qwen Omni](/integrations/realtime/qwen) — Native audio support

## Basic Usage

```python theme={null}
from vision_agents.plugins import openai, getstream
from vision_agents.core.agents import Agent
from vision_agents.core.edge.types import User

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant", id="agent"),
    instructions="You're a helpful voice assistant",
    # The Realtime LLM handles speech in and out, so no separate STT or TTS is configured
    llm=openai.Realtime(model="gpt-realtime", voice="marin"),
    processors=[],
)
```

## Methods

### `simple_response(text, processors=None, participant=None)`

Sends a text prompt to the realtime model. The model responds with audio.

```python theme={null}
await agent.llm.simple_response("What do you see in the video?")
```

### `simple_audio_response(pcm, participant=None)`

Sends raw PCM audio data directly to the model for processing.

```python theme={null}
# audio_pcm_data is PCM audio you have already captured or decoded elsewhere
await agent.llm.simple_audio_response(audio_pcm_data)
```
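
The exact type expected for `pcm` is not spelled out here and may differ between versions. As a rough illustration of what "raw PCM" means, the sketch below builds one 20 ms frame of signed 16-bit mono audio using only the standard library. The `make_tone_frame` helper, the 16 kHz sample rate, and the frame length are assumptions chosen for illustration, not values taken from the library.

```python
import math
import struct


def make_tone_frame(freq_hz: float = 440.0,
                    sample_rate: int = 16_000,
                    duration_ms: int = 20,
                    amplitude: float = 0.5) -> bytes:
    """Return one frame of signed 16-bit little-endian mono PCM.

    Hypothetical helper for illustration; not part of vision_agents.
    """
    n_samples = sample_rate * duration_ms // 1000
    samples = [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]
    # "<h" is a little-endian signed 16-bit integer, one per sample
    return struct.pack(f"<{n_samples}h", *samples)


frame = make_tone_frame()
print(len(frame))  # 320 samples * 2 bytes each = 640
```

Before passing bytes like these to `simple_audio_response`, you would likely need to wrap or convert them into whatever PCM type your installed version expects.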

## Properties

| Property     | Type   | Description                                                                                                                                    |
| ------------ | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `connected`  | `bool` | `True` if the realtime session is active                                                                                                       |
| `fps`        | `int`  | Video frames per second sent to the model (default: 1)                                                                                         |
| `session_id` | `str`  | UUID identifying the current session                                                                                                           |
| `epoch`      | `int`  | Monotonic interruption counter. Increments each time `interrupt()` is called, allowing stale audio output events to be identified and dropped. |

## Interruption

### `interrupt()`

Increments the epoch counter so that any in-flight `RealtimeAudioOutputEvent` from a previous response is detected as stale and discarded by the Agent. The Agent calls this automatically on barge-in.
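
The epoch mechanism can be pictured with a small toy model. This is a sketch of the general idea only, not the library's implementation: `EpochGate`, `AudioChunk`, `stamp`, and `is_stale` are hypothetical names, and the assumption is that each audio event records the epoch that was current when it was produced.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:
    """Stand-in for an audio output event; `epoch` is stamped at creation."""
    epoch: int
    data: bytes


class EpochGate:
    """Toy model of a monotonic interruption counter (hypothetical)."""

    def __init__(self) -> None:
        self.epoch = 0

    def interrupt(self) -> None:
        # Bumping the counter invalidates every chunk stamped earlier.
        self.epoch += 1

    def stamp(self, data: bytes) -> AudioChunk:
        return AudioChunk(epoch=self.epoch, data=data)

    def is_stale(self, chunk: AudioChunk) -> bool:
        return chunk.epoch != self.epoch


gate = EpochGate()
old = gate.stamp(b"first response")
gate.interrupt()  # the user barges in
new = gate.stamp(b"second response")

# Only chunks from the current epoch are played.
played = [c.data for c in (old, new) if not gate.is_stale(c)]
print(played)  # [b'second response']
```

Comparing a stamped integer is cheap and avoids having to cancel or track individual in-flight chunks.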

## Events

The Realtime class emits events for monitoring conversations:

| Event                                   | Description                                               |
| --------------------------------------- | --------------------------------------------------------- |
| `RealtimeConnectedEvent`                | Connection established with session config & capabilities |
| `RealtimeDisconnectedEvent`             | Connection closed (includes reason and clean-close flag)  |
| `RealtimeUserSpeechTranscriptionEvent`  | Transcript of user speech                                 |
| `RealtimeAgentSpeechTranscriptionEvent` | Transcript of agent speech                                |
| `RealtimeResponseEvent`                 | AI response text (with `is_complete` flag)                |
| `RealtimeAudioInputEvent`               | Audio sent to the realtime LLM                            |
| `RealtimeAudioOutputEvent`              | Audio received from the realtime LLM                      |
| `RealtimeAudioOutputDoneEvent`          | Audio output complete for a response                      |
| `RealtimeConversationItemEvent`         | Conversation state update (message, function call, etc.)  |
| `RealtimeErrorEvent`                    | Error during processing (with recoverability flag)        |

```python theme={null}
from vision_agents.core.llm.events import RealtimeUserSpeechTranscriptionEvent

@agent.llm.events.on(RealtimeUserSpeechTranscriptionEvent)
async def on_user_speech(event):
    print(f"User said: {event.text}")
```
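
The `is_complete` flag on `RealtimeResponseEvent` suggests that response text can arrive in pieces. The sketch below shows one way a handler might collect those pieces, using a stand-in dataclass so it runs without the library. `FakeResponseEvent` and `ResponseCollector` are hypothetical, the `text` field name is assumed, and the code assumes each event carries a new fragment and not the full text so far; check the actual event fields before relying on this.

```python
from dataclasses import dataclass


@dataclass
class FakeResponseEvent:
    """Hypothetical stand-in for RealtimeResponseEvent."""
    text: str
    is_complete: bool


class ResponseCollector:
    """Buffers fragments and emits one string per completed response."""

    def __init__(self) -> None:
        self._parts: list[str] = []
        self.finished: list[str] = []

    def handle(self, event: FakeResponseEvent) -> None:
        # Assumption: event.text is a fragment, not the cumulative text.
        self._parts.append(event.text)
        if event.is_complete:
            self.finished.append("".join(self._parts))
            self._parts.clear()


collector = ResponseCollector()
for ev in (FakeResponseEvent("Hello", False), FakeResponseEvent(", world", True)):
    collector.handle(ev)
print(collector.finished)  # ['Hello, world']
```

In a real handler, the body of `handle` would sit inside a function registered with `agent.llm.events.on(...)`, as in the transcription example above.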

<Note>
  For provider-specific parameters and configuration, see the integration docs for [OpenAI](/integrations/realtime/openai), [Gemini](/integrations/realtime/gemini), [AWS Bedrock](/integrations/realtime/aws-bedrock), or [Qwen](/integrations/realtime/qwen).
</Note>
