1. Easy RAG with Gemini
Gemini’s File Search is the easiest way to add RAG to your agent. It handles chunking, embedding, and retrieval automatically.

Using the wrapper
The wrapper provides:
- Store reuse: Automatically finds and reuses existing stores with the same name
- Content deduplication: Skips uploading files that already exist (via SHA-256 hash); see the sketch after this list
- Batch uploads: Uploads multiple files concurrently
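A minimal sketch of the deduplication and concurrent-upload ideas above, in plain Python. `upload_file` and `existing_hashes` are hypothetical stand-ins for the wrapper’s actual upload call and the set of content hashes already known to the store:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash the file contents so identical files can be detected."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def upload_new_files(paths: list[Path], existing_hashes: set[str], upload_file) -> list[Path]:
    # Deduplication: skip files whose content hash already exists in the store.
    to_upload = [p for p in paths if file_sha256(p) not in existing_hashes]
    # Batch uploads: push the remaining files concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(upload_file, to_upload))
    return to_upload
```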
2. RAG with Turbopuffer
Turbopuffer example
Here’s an example that uses Turbopuffer with vector & BM25 search.
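A minimal sketch of what that can look like. The method and parameter names below follow an older version of the turbopuffer Python client and may differ from the current SDK, and `embed()` is a placeholder for whatever embedding model you use, so treat this as the shape of a hybrid setup rather than a drop-in snippet:

```python
import turbopuffer as tpuf

tpuf.api_key = "YOUR_API_KEY"  # assumption: module-level API key configuration
ns = tpuf.Namespace("docs")


def embed(text: str) -> list[float]:
    """Placeholder: return a vector from your embedding model of choice."""
    raise NotImplementedError


# Index chunks with both a vector and the raw text; the schema marks the
# "text" attribute as full-text indexed so BM25 queries can run against it.
ns.upsert(
    ids=[1, 2],
    vectors=[embed("chunk one"), embed("chunk two")],
    attributes={"text": ["chunk one", "chunk two"]},
    schema={"text": {"type": "string", "full_text_search": True}},
)

# Vector (semantic) search.
vector_hits = ns.query(vector=embed("my question"), top_k=10, include_attributes=["text"])

# BM25 (keyword) search over the full-text-indexed attribute.
bm25_hits = ns.query(rank_by=["text", "BM25", "my question"], top_k=10, include_attributes=["text"])
```

The two result lists can then be merged with Reciprocal Rank Fusion, covered under Reranking below.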

Understanding RAG
Sooner or later you’ll want full control over RAG, and RAG can get pretty complex. Let’s go over what a typical RAG pipeline looks like:

1. Gathering documents
First you have to gather documents from URLs, folders, images, PDFs, and external APIs (Slack, Notion, etc.).

2. Parsing/enriching documents
Images, PDFs, and URLs all need some parsing before they can be used. Tools like markdownify, Beautiful Soup, and WebBaseLoader come in handy for URLs. For OCR, see the OCR benchmark: https://huggingface.co/spaces/ling99/OCRBench-v2-leaderboard
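For example, a quick way to turn a web page into Markdown (assuming requests and markdownify are installed):

```python
import requests
from markdownify import markdownify

# Fetch the page and convert its HTML to Markdown for downstream chunking.
html = requests.get("https://example.com", timeout=30).text
markdown = markdownify(html, heading_style="ATX")  # "# Heading" style output
```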

3. Chunking & Contextual retrieval
Large documents need to be split into smaller chunks for effective retrieval. Common strategies:
- Fixed size: Split every N characters with overlap (see the sketch after this list)
- Semantic: Split at sentence or paragraph boundaries
- Recursive: Try multiple separators (paragraphs → sentences → words)
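A minimal fixed-size chunker with overlap, as a starting point:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of chunk_size characters, overlapping by `overlap`."""
    assert 0 <= overlap < chunk_size, "overlap must be smaller than chunk_size"
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]
```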
4. Embedding
Next, you need some way to translate text into an embedding. An embedding is basically a vector representation of a text’s meaning, so that semantically similar texts end up close together. The leaderboard for embedding models is here: https://huggingface.co/spaces/mteb/leaderboard
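One common option is a local sentence-transformers model (the MTEB leaderboard above lists many alternatives; the model name here is just an example):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast example model
embeddings = model.encode(["What is RAG?", "RAG combines retrieval with generation."])
print(embeddings.shape)  # (2, 384): one 384-dimensional vector per input text
```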

5. Vector database
Next, you want to store these embeddings in a vector database. One of the most innovative options in the space is Turbopuffer: https://turbopuffer.com/docs/hybrid

6. Combined queries
The best practice is to combine full-text and vector search. The Turbopuffer guide on hybrid search is a good starting point: https://turbopuffer.com/docs/hybrid
It’s also common to use an LLM to create different variations of the original search query: https://developers.llamaindex.ai/python/examples/query_transformations/query_transform_cookbook/
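A sketch of generating query variations with the google-genai SDK; the model name, prompt, and parsing are illustrative only:

```python
from google import genai

client = genai.Client()  # assumes the API key is set in the environment
prompt = (
    "Rewrite the search query below in 3 different ways, one per line.\n"
    "Query: how do I tune chunk size for retrieval?"
)
response = client.models.generate_content(model="gemini-2.0-flash", contents=prompt)
variations = [line.strip() for line in response.text.splitlines() if line.strip()]
# Run each variation through hybrid search, then fuse the results (see Reranking).
```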

7. Reranking
When you gather the results of vector and full-text search, you typically want to rerank (or summarize) the combined results.

Advanced RAG example with Turbopuffer
The Turbopuffer RAG provides:
- Hybrid search: Combines vector (semantic) and BM25 (keyword) search
- Reciprocal Rank Fusion: Merges results from both search methods (see the sketch after this list)
- Configurable chunking: Control chunk size and overlap
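A minimal sketch of Reciprocal Rank Fusion, assuming each search method returns an ordered list of document IDs (k=60 is the commonly used smoothing constant):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists into one, scoring each doc by 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: fuse vector-search and BM25 result lists.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["c", "a", "d"]])  # ["a", "c", "b", "d"]
```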
Choosing between Gemini and Turbopuffer
| Feature | Gemini File Search | Turbopuffer |
|---|---|---|
| Setup complexity | Simple | More setup |
| Chunking | Automatic | Configurable |
| Search type | Managed | Hybrid (vector + BM25) |
| Control | Less | Full control |
| Cost | Included with Gemini | Separate service |
| Best for | Quick prototypes | Production with custom needs |

