# RAG for Agents

Give your agents access to documents, URLs, and knowledge bases using Retrieval-Augmented Generation (RAG).

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account for real-time transport.
</Info>

## Options

| Option                 | Best For                                  | Complexity |
| ---------------------- | ----------------------------------------- | ---------- |
| **Gemini File Search** | Quick setup, automatic chunking/embedding | Simple     |
| **TurboPuffer**        | Full control, hybrid search, production   | More setup |

## Gemini File Search

[Gemini's File Search](https://ai.google.dev/gemini-api/docs/file-search) handles chunking, embedding, and retrieval automatically.

```python
from vision_agents.plugins import gemini

# Create and populate a file search store (run inside an async function)
store = gemini.GeminiFilesearchRAG(name="my-knowledge-base")
await store.create()  # Reuses existing store if found
await store.add_directory("./knowledge")  # Skips duplicates via content hash

# Use with Gemini LLM
llm = gemini.LLM(
    model="gemini-3-flash-preview",
    tools=[gemini.tools.FileSearch(store)]
)
```

**Features:**

* Store reuse (finds existing stores by name)
* Content deduplication via SHA-256 hash
* Concurrent batch uploads
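
To illustrate the deduplication idea, here is a minimal sketch of skipping files whose content has already been seen, keyed by SHA-256 digest. It is not the plugin's actual implementation: `content_hash` and `select_new_files` are hypothetical helper names, and the in-memory `seen_hashes` set is an assumption standing in for whatever the store tracks.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file content."""
    return hashlib.sha256(data).hexdigest()


def select_new_files(paths, seen_hashes: set) -> list:
    """Return only the files whose content hash is not in seen_hashes.

    Hypothetical helper: seen_hashes is updated in place so that a
    second call with the same files returns an empty list.
    """
    new_files = []
    for path in paths:
        path = Path(path)
        digest = content_hash(path.read_bytes())
        if digest in seen_hashes:
            continue  # identical content was already uploaded
        seen_hashes.add(digest)
        new_files.append(path)
    return new_files
```

Hashing the content rather than the filename means a renamed copy of the same document is still treated as a duplicate.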

## TurboPuffer

[TurboPuffer](https://turbopuffer.com/) provides hybrid search, combining vector (semantic) and BM25 (keyword) results with Reciprocal Rank Fusion (RRF).

```python
from vision_agents.plugins import turbopuffer, gemini

# Initialize with hybrid search
rag = turbopuffer.TurboPufferRAG(
    namespace="my-knowledge",
    chunk_size=10000,
    chunk_overlap=200,
)
await rag.add_directory("./knowledge")

# Register the search as a function the LLM can call
llm = gemini.LLM("gemini-3-flash-preview")

@llm.register_function(description="Search the knowledge base")
async def search_knowledge(query: str) -> str:
    return await rag.search(query, top_k=5, mode="hybrid")
```
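
For intuition about how Reciprocal Rank Fusion merges the vector and BM25 result lists, the sketch below scores each document as the sum of `1 / (k + rank)` over every list it appears in. This is a generic illustration, not the plugin's code: `reciprocal_rank_fusion` is a hypothetical name, and `k = 60` is a commonly used constant assumed here rather than a documented TurboPuffer setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) per list it appears in
    (rank starts at 1); higher total score ranks first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative IDs only: one list from vector search, one from BM25.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Because RRF uses only rank positions, it does not need the vector and BM25 scores to be on comparable scales.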

## RAG Pipeline Overview

For custom implementations, a typical RAG pipeline involves:

1. **Document gathering** — URLs, folders, PDFs, external APIs
2. **Parsing** — Convert to text (markdownify, BeautifulSoup, OCR)
3. **Chunking** — Split into retrievable pieces (fixed size, semantic, recursive)
4. **Embedding** — Convert text to vectors ([MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard))
5. **Vector storage** — Store embeddings for similarity search
6. **Hybrid search** — Combine vector + full-text search ([TurboPuffer guide](https://turbopuffer.com/docs/hybrid))
7. **Reranking** — Score and filter results before passing to LLM
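
As one concrete instance of the chunking step, here is a minimal sketch of fixed-size chunking with overlap. It counts characters, which is an assumption: the `chunk_size` and `chunk_overlap` parameters in the TurboPuffer example may be measured differently, and `chunk_text` is a hypothetical helper rather than part of any plugin.

```python
def chunk_text(text: str, chunk_size: int = 10000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks.

    Consecutive chunks share chunk_overlap characters so that a
    sentence cut at a boundary still appears whole in one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Semantic or recursive chunkers replace the fixed `step` with boundaries such as headings, paragraphs, or sentences, but the overlap idea carries over.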

## Comparison

| Feature  | Gemini File Search   | TurboPuffer                  |
| -------- | -------------------- | ---------------------------- |
| Setup    | Simple               | More setup                   |
| Chunking | Automatic            | Configurable                 |
| Search   | Managed              | Hybrid (vector + BM25)       |
| Control  | Less                 | Full control                 |
| Cost     | Included with Gemini | Separate service             |
| Best for | Prototypes           | Production with custom needs |

## Next Steps

<CardGroup cols={2}>
  <Card title="Full Example" icon="github" href="https://github.com/GetStream/vision-agents/tree/main/examples/03_phone_and_rag_example">
    Phone + RAG implementation
  </Card>

  <Card title="TurboPuffer Integration" icon="database" href="/integrations/infrastructure/turbopuffer">
    TurboPuffer plugin reference
  </Card>
</CardGroup>
