# TurboPuffer

[TurboPuffer](https://turbopuffer.com/) is a high-performance vector database with native hybrid search (vector + BM25). This plugin uses it as a retrieval-augmented generation (RAG) backend and exposes control over chunking, the embedding model, and the search mode.

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account
  for real-time transport. Most providers offer free tiers to get started.
</Info>

## Installation

```sh theme={null}
uv add "vision-agents[turbopuffer]"
```

## Quick Start

```python theme={null}
import asyncio

from vision_agents.plugins import turbopuffer


async def main():
    # Initialize RAG and index a local directory
    rag = turbopuffer.TurboPufferRAG(namespace="my-knowledge")
    await rag.add_directory("./knowledge")

    # Hybrid search (default)
    results = await rag.search("How does the chat API work?")
    print(results)


asyncio.run(main())
```

<Warning>
  Set `TURBO_PUFFER_KEY` and `GOOGLE_API_KEY` (for Gemini embeddings) in your
  environment.
</Warning>
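
A missing key usually only surfaces on the first request, so it can help to check the environment up front. The following is a minimal sketch using only the standard library; the helper `missing_env_vars` is hypothetical and not part of the plugin, and the variable names are taken from the warning above.

```python
import os

# Variable names taken from the warning above.
REQUIRED_ENV_VARS = ("TURBO_PUFFER_KEY", "GOOGLE_API_KEY")


def missing_env_vars(names=REQUIRED_ENV_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Hypothetical startup check: report anything missing before creating the RAG instance.
missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```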

## Parameters

| Name              | Type  | Default                         | Description            |
| ----------------- | ----- | ------------------------------- | ---------------------- |
| `namespace`       | `str` | Required                        | TurboPuffer namespace  |
| `embedding_model` | `str` | `"models/gemini-embedding-001"` | Embedding model        |
| `chunk_size`      | `int` | `10000`                         | Text chunk size        |
| `chunk_overlap`   | `int` | `200`                           | Overlap between chunks |
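
To illustrate how `chunk_size` and `chunk_overlap` interact, here is a minimal sketch of fixed-window chunking. It assumes sizes are measured in characters and that chunks are cut at fixed offsets; the plugin's actual splitter may use different units or split on natural boundaries, and `chunk_text` is a hypothetical helper, not a plugin function.

```python
def chunk_text(text: str, chunk_size: int = 10000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new chunk starts chunk_overlap characters before the previous one ends.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reached the end of the text.
    return chunks


# Small values make the overlap visible: each chunk repeats the last character of the previous one.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

A larger overlap reduces the chance that a relevant passage is split across two chunks, at the cost of storing and embedding more duplicated text.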

## Search Modes

```python theme={null}
# Hybrid (recommended) - combines vector and BM25
results = await rag.search(query, mode="hybrid")

# Vector only - semantic similarity
results = await rag.search(query, mode="vector")

# BM25 only - keyword matching
results = await rag.search(query, mode="bm25")
```

## How Hybrid Search Works

Hybrid search runs a vector query and a BM25 query, then merges the two ranked result lists using Reciprocal Rank Fusion (RRF):

* **Vector search** catches semantic meaning even when exact words differ
* **BM25** catches exact matches (product names, SKUs, technical terms)
* **RRF** merges both rankings by rank position rather than raw score, so no score weighting needs tuning
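
The fusion step can be sketched in a few lines. This is an illustrative implementation of the general RRF formula, where each document scores `1 / (k + rank)` per list it appears in; it is not the plugin's own code. The function name `reciprocal_rank_fusion` and the constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse ranked lists of document IDs; each list is ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the higher it ranks, in every list it appears in.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]  # hypothetical vector-search ranking
bm25_hits = ["c", "a", "d"]    # hypothetical BM25 ranking
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# → ['a', 'c', 'b', 'd']
```

Documents that appear in both lists (`a` and `c`) rise to the top, while `b` and `d` keep the order implied by their rank in a single list. Because only ranks are used, vector distances and BM25 scores never need to be normalized against each other.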

## With Function Calling

```python theme={null}
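# `llm` is your agent's LLM instance; `rag` is the TurboPufferRAG
# instance created in Quick Start.
@llm.register_function(description="Search the knowledge base")
async def search_knowledge(query: str) -> str:
    return await rag.search(query, top_k=5, mode="hybrid")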
```

## Cache Warming

For low-latency queries, TurboPuffer supports cache warming. `add_directory()` warms the cache automatically when it finishes; you can also trigger it manually:

```python theme={null}
await rag.warm_cache()
```

See the [RAG Guide](/guides/rag) for more details.

## Next Steps

<CardGroup cols={2}>
  <Card title="Build a Voice Agent" icon="microphone" href="/introduction/voice-agents">
    Get started with voice
  </Card>

  <Card title="Build a Video Agent" icon="video" href="/introduction/video-agents">
    Add video processing
  </Card>
</CardGroup>
