Most agents need access to your content and documentation: URLs, markdown docs, PDFs, and other material. Giving your agent access to this information is called RAG (retrieval-augmented generation). Best practices for RAG are complex and go beyond the scope of the vision agents project, so we recommend two options for getting started:
  • Option 1: Gemini File Search. A high-level RAG service that handles the complexity for you.
  • Option 2: Turbopuffer. A very efficient database for building your own RAG, giving you full control.
For a full example, see examples/03_phone_and_rag_example.

1. Easy RAG with Gemini

Gemini’s File Search is the easiest way to add RAG to your agent. It handles chunking, embedding, and retrieval automatically.

Using the wrapper

from vision_agents.plugins import gemini

# Create and populate a file search store
store = gemini.GeminiFilesearchRAG(name="my-knowledge-base")
await store.create()  # Reuses existing store if found
await store.add_directory("./knowledge")  # Skips duplicates via content hash

# Use with GeminiLLM
llm = gemini.LLM(
    model="gemini-2.5-flash",
    tools=[gemini.tools.FileSearch(store)]
)
The wrapper provides:
  • Store reuse: Automatically finds and reuses existing stores with the same name
  • Content deduplication: Skips uploading files that already exist (via SHA-256 hash)
  • Batch uploads: Uploads multiple files concurrently

2. RAG with Turbopuffer

Turbopuffer example

Here’s an example that uses Turbopuffer with vector & BM25 search.
from vision_agents.plugins import turbopuffer, gemini

# Initialize TurboPuffer RAG with hybrid search
rag = turbopuffer.TurboPufferRAG(
    namespace="my-knowledge",
    chunk_size=10000,  # Larger chunks = more context
    chunk_overlap=200,
)
await rag.add_directory("./knowledge")

# Create LLM with function calling
llm = gemini.LLM("gemini-2.5-flash")

@llm.register_function(description="Search the knowledge base")
async def search_knowledge(query: str) -> str:
    return await rag.search(query, top_k=5, mode="hybrid")

Understanding RAG

Sooner or later you’ll want full control over your RAG pipeline, and RAG can get pretty complex. Let’s go over what a typical RAG pipeline looks like:

1. Gathering documents

First, you have to gather documents from URLs, folders, images, PDFs, and external APIs (Slack, Notion, etc.).
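For the simplest case, gathering local files, a small helper is enough. This is a minimal sketch (the function name and extension list are illustrative, not part of vision agents):

```python
from pathlib import Path

# Hypothetical extension filter; adjust to the formats your pipeline supports.
SUPPORTED = {".md", ".txt", ".pdf"}

def gather_documents(root: str) -> list[Path]:
    """Recursively collect files under `root` with a supported extension."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```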

2. Parsing/enriching documents

Images, PDFs, and URLs all need some parsing before they can be used. Tools like markdownify, Beautiful Soup, and WebBaseLoader come in handy for URLs. For OCR, see the OCR benchmark: https://huggingface.co/spaces/ling99/OCRBench-v2-leaderboard
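To show what parsing does at its core, here is a stdlib-only sketch that strips HTML down to plain text (in practice you would reach for markdownify or Beautiful Soup, which handle far more edge cases):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)
```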

3. Chunking & Contextual retrieval

Large documents need to be split into smaller chunks for effective retrieval. Common strategies:
  • Fixed size: Split every N characters with overlap
  • Semantic: Split at sentence or paragraph boundaries
  • Recursive: Try multiple separators (paragraphs → sentences → words)
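The fixed-size strategy is the simplest of the three and is what the `chunk_size`/`chunk_overlap` parameters above control. A minimal sketch:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size chunking: slide a window of `chunk_size` characters,
    stepping forward by (chunk_size - overlap) so neighbouring chunks
    share `overlap` characters of context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap matters: without it, a sentence split across a chunk boundary would be unretrievable from either chunk.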

4. Embedding

Next, you need a way to translate text into an embedding. An embedding is a vector representation of a text’s meaning: semantically similar texts map to nearby vectors. The leaderboard for embedding models is available here: https://huggingface.co/spaces/mteb/leaderboard
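The embeddings themselves come from a model API, but "nearby vectors" is usually measured with cosine similarity, which you can compute with the stdlib alone:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors.
    1.0 means same direction (similar meaning), 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```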

5. Vector database

Next, you want to store these embeddings in a vector database. One of the most innovative options in the space is Turbopuffer: https://turbopuffer.com/docs/hybrid
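Conceptually, a vector search is a nearest-neighbour lookup: score every stored vector against the query and keep the best matches. Real vector databases use approximate indexes to make this fast at scale, but a brute-force sketch shows the idea:

```python
import math

def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force nearest-neighbour search: rank every stored vector by
    cosine similarity to the query and return the k best (id, score) pairs."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

    scored = ((doc_id, cos(query, vec)) for doc_id, vec in index.items())
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```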

6. Combined queries

The best practice is to combine full-text and vector search. The Turbopuffer guide on hybrid search is a good starting point: https://turbopuffer.com/docs/hybrid. It’s also common to use AI to create different variations of the original search query text: https://developers.llamaindex.ai/python/examples/query_transformations/query_transform_cookbook/

7. Reranking

When you gather the results of vector and full-text search, you typically want to rerank (or summarize) them into a single list before handing them to the LLM.
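A common merge strategy is Reciprocal Rank Fusion (RRF), which the Turbopuffer wrapper below also uses: each document scores the sum of 1 / (k + rank) over every result list it appears in, so documents ranked highly by both search methods rise to the top. A minimal sketch:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists with Reciprocal Rank Fusion.
    `k` dampens the influence of top ranks; 60 is the value from the
    original RRF paper."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```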

Advanced RAG example with Turbopuffer

The Turbopuffer RAG provides:
  • Hybrid search: Combines vector (semantic) and BM25 (keyword) search
  • Reciprocal Rank Fusion: Merges results from both search methods
  • Configurable chunking: Control chunk size and overlap
You can see the full phone + RAG example in the repo: examples/03_phone_and_rag_example

Choosing between Gemini and Turbopuffer

| Feature          | Gemini File Search   | Turbopuffer                  |
| ---------------- | -------------------- | ---------------------------- |
| Setup complexity | Simple               | More setup                   |
| Chunking         | Automatic            | Configurable                 |
| Search type      | Managed              | Hybrid (vector + BM25)       |
| Control          | Less                 | Full control                 |
| Cost             | Included with Gemini | Separate service             |
| Best for         | Quick prototypes     | Production with custom needs |