RAG (retrieval-augmented generation)

01

The analogy

It's the difference between a closed-book and an open-book exam. Without RAG, the model answers only from what it “memorized” during training. With RAG, before answering it consults your library — documents, manuals, databases — and writes its answer based on what it finds.

02

In detail

RAG combines a retriever with a generator. The question is converted into an embedding vector, the most relevant chunks are retrieved from a document base by semantic search, and those chunks are injected into the model's context so that its answer is grounded in them. This reduces hallucinations and lets you use private or recent information without retraining the model.
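
The retrieve-then-inject flow can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: the `DOCUMENTS` base, the function names, and the prompt wording are all hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model (so it matches exact words only, not meaning). The final generation step, sending the prompt to a model, is left out.

```python
import math
import re
from collections import Counter

# Hypothetical document base; in practice these would be chunks of your own files.
DOCUMENTS = {
    "refund-policy": "Refunds are available within 30 days of purchase with a receipt.",
    "shipping": "Standard shipping takes five business days.",
    "warranty": "Hardware is covered by a two year warranty.",
}


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, k: int = 1) -> list[tuple[str, str]]:
    """Return the k (id, text) chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(
        DOCUMENTS.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, k: int = 1) -> str:
    """Inject the retrieved chunks into the prompt that would be sent to the model."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(question, k))
    return (
        "Answer using only the context below and cite the source id.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("Can I get refunds after 30 days?"))
```

Tagging each chunk with its id in the prompt is one simple way to let the model cite where its answer came from.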

03

An example

A support chatbot with RAG doesn't make up your refund policy: it finds your company's official document, reads it, and answers citing it.

04

Embeddings, Hallucinations, Context window