What is RAG?

May 28, 2026 · 5 min read

Figure: How retrieval-augmented generation works. The user's query goes to a retriever, which (1) embeds the query and runs an ANN search over the knowledge base (a vector index of documents) and (2) returns the top-k passages as context; the LLM then generates a grounded answer. Retrieval-augmented generation grounds the model in external, citable evidence.

Definition

RAG (retrieval-augmented generation) lets an AI look up relevant documents from your own knowledge base and answer using them, instead of relying only on what it memorized in training.

How it works

RAG has two phases[5]. First, at indexing time, your documents are split into chunks, converted into numerical vectors (embeddings), and stored in a vector store such as Pinecone or the pgvector extension for PostgreSQL[6]. Then, at question time, the system embeds the question, finds the most relevant chunks, adds them to the prompt, and the model answers from that context[7]. Retrieval quality is a major lever on answer quality: in the Dense Passage Retrieval paper, a learned dense retriever beat a BM25 baseline by 9 to 19 percentage points in top-20 passage retrieval accuracy[8].
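
The two phases can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not how any particular product works: the bag-of-words `embed` function stands in for a real embedding model, the exhaustive scan in `retrieve` stands in for the approximate nearest-neighbor (ANN) search a vector database would run, and every function name and sample document here is hypothetical. The final call to a language model is left out; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): sparse word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Phase 1 (indexing): chunk every document and store (chunk, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Phase 2a (retrieval): rank all chunks by similarity to the question, keep the top k."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Phase 2b (augmentation): place the retrieved passages ahead of the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical knowledge base of three one-sentence documents.
docs = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Shipping to Canada takes five business days.",
]
index = build_index(docs)
question = "How long do refunds take?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Numbering the passages in the prompt (`[1]`, `[2]`, ...) is one simple way to let the model point back at its sources, which is what makes the answer citable.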

Where it’s used

Typical uses include customer-support and internal knowledge bots, with results limited to what each user is allowed to see[9]; legal and financial research, where citations are the deliverable[1]; and code assistants that read a company’s own repositories.
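
Limiting results to what each user may see is often done by filtering on access metadata before ranking. The sketch below is one possible shape for that idea, not a description of any specific system: the `Chunk` class, its `allowed_groups` field, and the word-overlap score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset[str]  # hypothetical access metadata stored with each chunk


def visible_chunks(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only chunks that share at least one group with the user."""
    return [c for c in chunks if c.allowed_groups & user_groups]


def retrieve_for_user(
    chunks: list[Chunk], user_groups: set[str], question: str, k: int = 3
) -> list[str]:
    """Filter by permission first, then rank by a toy word-overlap score."""
    q_words = set(question.lower().split())
    candidates = visible_chunks(chunks, user_groups)
    ranked = sorted(
        candidates,
        key=lambda c: len(q_words & set(c.text.lower().split())),
        reverse=True,
    )
    return [c.text for c in ranked[:k]]


# Hypothetical knowledge base with per-chunk access groups.
kb = [
    Chunk("Salary bands for 2026 are listed in the HR portal.", frozenset({"hr"})),
    Chunk("Reset your password from the account settings page.", frozenset({"hr", "support"})),
]
print(retrieve_for_user(kb, {"support"}, "how do I reset my password"))
```

Filtering before ranking means restricted text never enters the candidate set, so it cannot reach the prompt even if it would have scored highest.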

RAG vs fine-tuning

RAG and fine-tuning solve different problems and pair well[10]. Use RAG when facts change often or you need citations. Use fine-tuning to change the model’s tone, format, or vocabulary, not its facts.

Bottom line

RAG is the default way to build AI over private or fast-changing information: cheaper than retraining, citable, and only as good as the documents and retrieval behind it.

References

  1. Enterprise RAG Predictions for 2025. Eva Nahari. Vectara, www.vectara.com
  2. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela. arXiv, arxiv.org
  3. Retrieval Augmented Generation: Streamlining the creation of intelligent natural language processing models. Meta AI, ai.meta.com
  4. What is retrieval-augmented generation? Kim Martineau. IBM Research, research.ibm.com
  5. Build a Retrieval Augmented Generation (RAG) App. LangChain, docs.langchain.com
  6. Vector Databases for RAG: Comparing pgvector, Pinecone, Chroma, and Weaviate. CallSphere, callsphere.ai
  7. Retrieval Augmented Generation. Amazon SageMaker AI Developer Guide. Amazon Web Services, docs.aws.amazon.com
  8. Dense Passage Retrieval for Open-Domain Question Answering. Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih. arXiv, arxiv.org
  9. What is Retrieval-Augmented Generation (RAG)? Amazon Web Services, aws.amazon.com
  10. RAG vs. fine-tuning. IBM, www.ibm.com