RAG (retrieval-augmented generation)

The model stops memorizing and starts looking things up.

The analogy

It's the difference between a closed-book and an open-book exam. Without RAG, the model answers only from what it “memorized” during training. With RAG, before answering it consults your library — documents, manuals, databases — and writes its answer based on what it finds.

In detail

RAG combines a retriever with a generator. The question is converted into an embedding vector, the most relevant chunks are retrieved from a document base by semantic search, and those chunks are injected into the model's context so that its answer is grounded in them. This tends to reduce hallucinations and lets you use private or recent information without retraining the model.
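
The retrieve-then-inject flow can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the chunk texts, the function names (embed, cosine, retrieve, build_prompt) and the prompt wording are all hypothetical, and the "embedding" is a toy word-count vector standing in for a real embedding model. The generator call is left out; the sketch stops at the prompt that would be sent to the model.

```python
import math
import re
from collections import Counter

# Hypothetical document base; in practice these would be chunks of your own files.
CHUNKS = [
    "Refunds are available within 30 days of purchase with a receipt.",
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "Shipping takes 3 to 5 business days within the country.",
]


def embed(text):
    """Toy embedding: word counts (an assumption standing in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=1):
    """Inject the retrieved chunks into the text that would be sent to the generator."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("Can I get a refund after my purchase?", CHUNKS))
```

A real system would swap the word-count vectors for learned embeddings and the sorted list for a vector index, but the shape stays the same: embed the question, rank the chunks, put the best ones in the prompt.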

An example

A support chatbot with RAG doesn't make up your refund policy: it finds your company's official document, reads it, and cites it in its answer.
