
# TwelveLabs

[TwelveLabs](https://twelvelabs.io/) provides **Pegasus**, a video understanding model that analyzes short clips rather than single frames. Use it in real-time video calls to reason about motion and events over time, for example to answer "What just happened?".

<Info>
  Vision Agents uses [Stream Video](https://getstream.io/video/) for real-time WebRTC transport by default. [External WebRTC transports](/integrations/introduction-to-integrations#edge-transport) are supported as well. Most AI providers offer free tiers to get started.
</Info>

## Installation

```sh
uv add "vision-agents[twelvelabs]"
```

You can get a free API key at [twelvelabs.io](https://twelvelabs.io/).

## Quick Start

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import twelvelabs, getstream, deepgram, elevenlabs

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="Describe what just happened in the video.",
    llm=twelvelabs.PegasusVLM(),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
)
```

<Warning>
  Set `TWELVELABS_API_KEY` in your environment or pass `api_key` directly.
</Warning>

## How it works

Unlike frame-by-frame VLMs, Pegasus buffers recent frames from the watched video track, encodes them into a short MP4 clip, uploads it to the TwelveLabs Assets API, and analyzes it with your prompt. The streamed answer is spoken by your agent's TTS.

Pegasus works well for questions about recent activity: "What did they just do?", "Did anything fall?", "Describe the last few seconds."

<Tip>
  Wait a few seconds after a participant joins before prompting, so enough video is buffered for analysis.
</Tip>
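
To make the buffering step concrete, here is a minimal sketch of a rolling frame window. It is not the plugin's actual implementation: `ClipBuffer` and its methods are hypothetical names, and the only values taken from this page are the `fps` and `clip_seconds` defaults. It shows why a prompt sent too early has less than a full clip to work with.

```python
from collections import deque


class ClipBuffer:
    """Rolling window of the most recent frames (hypothetical helper)."""

    def __init__(self, fps: float = 1.0, clip_seconds: int = 5):
        # Number of sampled frames that make up one clip.
        self.capacity = max(1, round(fps * clip_seconds))
        # deque(maxlen=...) drops the oldest frame once the window is full.
        self.frames = deque(maxlen=self.capacity)

    def push(self, frame) -> None:
        self.frames.append(frame)

    def is_ready(self) -> bool:
        # True once a full clip's worth of frames has been buffered.
        return len(self.frames) == self.capacity

    def snapshot(self) -> list:
        # Oldest-to-newest copy, ready to hand to an encoder.
        return list(self.frames)


buffer = ClipBuffer(fps=1.0, clip_seconds=5)
for i in range(7):
    buffer.push(f"frame-{i}")

print(buffer.is_ready())  # True
print(buffer.snapshot())  # ['frame-2', 'frame-3', 'frame-4', 'frame-5', 'frame-6']
```

With the defaults, the window holds five frames, so roughly five seconds of video must arrive before `is_ready()` returns `True`.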

## Parameters

| Name           | Type    | Default        | Description                                        |
| -------------- | ------- | -------------- | -------------------------------------------------- |
| `api_key`      | `str`   | `None`         | API key (defaults to `TWELVELABS_API_KEY` env var) |
| `model_name`   | `str`   | `"pegasus1.5"` | Pegasus model identifier                           |
| `fps`          | `float` | `1.0`          | Frame sampling rate for the buffered clip          |
| `clip_seconds` | `int`   | `5`            | Clip length analyzed per request (minimum `4`)     |
| `max_tokens`   | `int`   | `512`          | Maximum response tokens (minimum `512`)            |
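
The table lists minimums for `clip_seconds` and `max_tokens`, but this page does not say how the plugin reacts to smaller values. One option is to check them before constructing the model. The sketch below assumes nothing beyond the table; `pegasus_kwargs` is a hypothetical helper, not part of the plugin.

```python
MIN_CLIP_SECONDS = 4  # minimum listed in the parameters table
MIN_MAX_TOKENS = 512  # minimum listed in the parameters table


def pegasus_kwargs(
    fps: float = 1.0,
    clip_seconds: int = 5,
    max_tokens: int = 512,
    model_name: str = "pegasus1.5",
) -> dict:
    """Build keyword arguments, rejecting values below the documented minimums."""
    if clip_seconds < MIN_CLIP_SECONDS:
        raise ValueError(f"clip_seconds must be at least {MIN_CLIP_SECONDS}")
    if max_tokens < MIN_MAX_TOKENS:
        raise ValueError(f"max_tokens must be at least {MIN_MAX_TOKENS}")
    return {
        "model_name": model_name,
        "fps": fps,
        "clip_seconds": clip_seconds,
        "max_tokens": max_tokens,
    }


print(pegasus_kwargs(clip_seconds=8))
```

The result could then be passed along as `twelvelabs.PegasusVLM(**pegasus_kwargs(clip_seconds=8))`, leaving `api_key` to fall back to the `TWELVELABS_API_KEY` environment variable.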

## Trigger on participant join

Prompt Pegasus once enough of the caller's video has been buffered:

```python
import asyncio

from vision_agents.plugins.getstream import CallSessionParticipantJoinedEvent


@agent.events.subscribe
async def on_participant_joined(event: CallSessionParticipantJoinedEvent):
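    # Skip the agent's own join event (its user id is "agent" in the Quick Start).
    if event.participant.user.id != "agent":
        # Give the buffer time to fill; 5 seconds matches the default `clip_seconds`.
        await asyncio.sleep(5)
        await agent.simple_response("Describe what just happened in the video")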
```

## Notes

* Pegasus requires a minimum resolution of 360×360; lower-resolution frames are scaled up on encode.
* Each request uploads a clip and runs server-side analysis, so latency is higher than with single-frame VLMs. Tune `fps` and `clip_seconds` to balance temporal context against response time.
* Uploaded clips are deleted after analysis; asset cleanup is best-effort and does not block the response.
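
The first note says low-resolution frames are scaled up, but not how. As an illustration only, the sketch below computes a target size that meets the 360×360 minimum. Preserving the aspect ratio and rounding up to even dimensions (which many H.264 encoders require) are assumptions, and `upscaled_size` is a hypothetical helper rather than the plugin's code.

```python
MIN_SIDE = 360  # minimum width and height from the note above


def _ceil_even(numerator: int, denominator: int) -> int:
    value = -(-numerator // denominator)  # integer ceiling division
    return value + (value % 2)  # bump odd values up to the next even number


def upscaled_size(width: int, height: int, min_side: int = MIN_SIDE) -> tuple[int, int]:
    """Return (width, height) scaled up so both sides are at least min_side."""
    short = min(width, height)
    # Scale by min_side / short when the short side is too small; never scale down.
    num, den = (min_side, short) if short < min_side else (1, 1)
    return _ceil_even(width * num, den), _ceil_even(height * num, den)


print(upscaled_size(320, 240))   # (480, 360)
print(upscaled_size(1280, 720))  # (1280, 720)
```

Integer arithmetic is used so the result does not depend on floating-point rounding.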

## Next Steps

<CardGroup cols={2}>
  <Card title="Build a Voice Agent" icon="microphone" href="/introduction/voice-agents">
    Get started with voice
  </Card>

  <Card title="Build a Video Agent" icon="video" href="/introduction/video-agents">
    Add video processing
  </Card>
</CardGroup>
