RAG pipeline

The RAG pipeline dynamically connects a Large Language Model (LLM) with an external, up-to-date knowledge base (e.g., a vector database) to retrieve specific context and generate accurate, grounded responses.

RAG (Retrieval-Augmented Generation) is a two-stage architecture: Indexing and Retrieval. The Indexing phase processes raw data (e.g., PDFs, internal documents), chunks it, converts those chunks into high-dimensional vector embeddings, and stores them in a vector database (e.g., Pinecone). The Retrieval phase executes at query time: it embeds the user's question, performs a semantic search against the vector database for the top K relevant chunks, and injects that retrieved context into the LLM's prompt. This process grounds the LLM's output in verifiable, domain-specific data, effectively mitigating hallucinations and ensuring real-time accuracy beyond the model's original training cutoff date.
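The two phases above can be sketched end to end in a few dozen lines. This is a minimal, illustrative sketch only: a toy trigram-hash embedding stands in for a real embedding model (e.g., from sentence-transformers), and a plain in-memory list stands in for a vector database such as Pinecone. All function names and parameters here are assumptions, not a specific library's API.

```python
# Minimal RAG sketch. Assumptions: a toy hash-based embedding replaces a
# real embedding model, and an in-memory list replaces a vector database.
import hashlib
import math

def embed(text, dim=64):
    """Toy embedding: hash character trigrams into a fixed-size vector.
    A real pipeline would call an embedding model here instead."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].lower().encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def chunk(document, size=80):
    """Indexing step 1: split raw text into fixed-size chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def index(documents):
    """Indexing steps 2-3: embed each chunk, store (vector, chunk) pairs."""
    store = []
    for doc in documents:
        for c in chunk(doc):
            store.append((embed(c), c))
    return store

def retrieve(store, question, k=2):
    """Retrieval: embed the question, rank chunks by cosine similarity,
    return the top K. Vectors are unit-normalized, so the dot product
    equals cosine similarity."""
    q = embed(question)
    scored = [(sum(a * b for a, b in zip(q, v)), c) for v, c in store]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:k]]

def build_prompt(question, context_chunks):
    """Inject the retrieved context into the LLM's prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Pinecone is a managed vector database used to store embeddings.",
    "The indexing phase chunks documents and embeds each chunk.",
    "Retrieval embeds the user question and finds the top K chunks.",
]
store = index(docs)                     # indexing phase (offline)
question = "How does retrieval find relevant chunks?"
top = retrieve(store, question, k=2)    # retrieval phase (query time)
prompt = build_prompt(question, top)    # context injection
```

In a production pipeline the `embed` and `index` steps run offline as data changes, while `retrieve` and `build_prompt` run per query; swapping the toy pieces for a real embedding model and a hosted vector database keeps the same shape.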

https://developer.nvidia.com/blog/rag-101-demystifying-retrieval-augmented-generation-pipelines/