Definition
Retrieval-augmented generation (RAG) lets an AI model look up relevant documents in your own knowledge base and answer from them, instead of relying only on what it memorized during training.
At a glance
- A retriever finds relevant text; a generator (the language model) writes the answer using it[2].
- Answers are grounded in real source material, so the system can cite where each fact came from[4].
- You update knowledge by changing the documents, with no costly model retraining[3].
- By 2025, roughly 30 to 60 percent of enterprise AI use cases ran on RAG[1].
How it works
RAG has two phases[5]. First, at indexing time, your documents are split into chunks, converted into numerical vectors (embeddings), and stored in a vector database such as Pinecone or pgvector[6]. Then, at question time, the system finds the chunks most relevant to the question and adds them to the prompt, and the model answers from that context[7]. Retrieval quality matters more than raw model size: better retrievers can lift answer accuracy by 9 to 19 points[8].
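The two phases can be sketched in a few lines of Python. This is a toy illustration, not a production design: the `embed` function below is a stand-in that counts words instead of calling a real embedding model, the in-memory list stands in for a vector database, and all function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector.
    A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Phase 1 (indexing): chunk every document and store (text, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index, question, k=2):
    """Phase 2a (retrieval): return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Phase 2b (augmentation): put the retrieved chunks into the prompt.
    The resulting string is what would be sent to the language model."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical sample documents
documents = [
    "Refunds are issued within 14 days of a return request.",
    "Support is available by chat on weekdays from 9 to 5.",
]
index = build_index(documents)
question = "How many days do refunds take?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping `embed` for a real embedding model and the list for a vector database changes the quality of retrieval but not the shape of the pipeline.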
Where it’s used
Typical uses are customer-support and internal knowledge bots, with results limited to what each user is allowed to see[9]; legal and financial research, where citations are the deliverable[1]; and code assistants that read a company’s own repositories.
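Limiting results to what each user is allowed to see usually means filtering retrieved chunks by access rights before they reach the prompt. The sketch below shows one way this might look, assuming each chunk carries a set of permitted groups and a relevance score already computed by the retriever; the `Chunk` class, the group names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset  # groups permitted to read this chunk (assumed metadata)
    score: float               # relevance score, assumed to come from the retriever


def visible_chunks(candidates, user_groups, k=3):
    """Drop chunks the user may not read, then keep the k most relevant."""
    permitted = [c for c in candidates if c.allowed_groups & user_groups]
    return sorted(permitted, key=lambda c: c.score, reverse=True)[:k]


# Hypothetical retrieval results for one question
candidates = [
    Chunk("Salary bands for 2025", frozenset({"hr"}), 0.92),
    Chunk("How to reset your VPN password", frozenset({"all-staff"}), 0.81),
    Chunk("Holiday calendar", frozenset({"all-staff", "hr"}), 0.40),
]

engineer_view = visible_chunks(candidates, frozenset({"all-staff"}))
print([c.text for c in engineer_view])
```

Filtering before prompt construction, rather than asking the model to withhold restricted text, keeps content the user cannot access out of the model's context entirely.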
RAG vs fine-tuning
RAG and fine-tuning solve different problems and pair well[10]. Use RAG when facts change often or you need citations. Use fine-tuning to change the model’s tone, format, or vocabulary, not its facts.
Bottom line
RAG is the default way to build AI over private or fast-changing information: cheaper than retraining, citable, and only as good as the documents and retrieval behind it.