
Juan Pablo Balarini • 27 FEB 2024

An introduction to RAG: Retrieval Augmented Generation explained


Do you ever find it frustrating when you try to get ChatGPT to provide specific responses, only for it to give you nonsense instead? Wouldn't it be great if you could give ChatGPT some relevant context and have it respond with something meaningful?

In this article, we'll explore the concept of RAG and its potential to address this very issue.

 

Cracking the code: what is RAG?

RAG, short for Retrieval-Augmented Generation, is a pioneering approach within the field of Natural Language Processing (NLP) that transforms how text is generated. Unlike traditional methods that involve training a large language model (LLM) from scratch, the RAG framework revolutionizes text generation by seamlessly integrating existing knowledge with user prompts.

It is a technique that boosts the accuracy, precision, and reliability of generative AI models by incorporating information from external sources. Essentially, it addresses a gap in the function of LLMs.

Understanding its acronym is key to grasping its significance and diving into how it works. "Retrieval" involves accessing information from predefined data sources (e.g., databases, APIs, or documents), enriching the text generation process. "Augmented" denotes enhancing text generation by incorporating retrieved data, ensuring contextual understanding and relevance. Finally, "Generation" represents the core function, where RAG synthesizes retrieved information and user input to produce coherent, contextually grounded text.

 

How does RAG work? A step-by-step guide

RAG operates through a structured two-step process: retrieval and generation. Firstly, it retrieves relevant information from a data source based on user prompts, laying the groundwork for context establishment. This retrieved data then forms the basis for text generation. RAG utilizes this contextual information to generate text that aligns with the user's input and the retrieved data, ensuring coherence and relevance. Through this interplay of retrieval and generation, RAG transforms text generation by combining existing knowledge with user prompts, pushing the limits of natural language processing.

 

 

Step 0: Organizing data

Data in RAG is organized much like books in a library, so information is easier to find. This is done by indexing the data, which means it's categorized based on exact word matches, themes, or metadata like topic, author, date, or keywords. This makes it easier for RAG to access external information, thereby improving its response accuracy.

Data indexing usually happens regularly, whenever new data becomes available, and it can be done through different methods.

In general, most RAG pipelines split the documents into smaller chunks and then generate a vector (embedding) for each chunk that represents what it is about.
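As a rough illustration, here is a minimal sketch of this chunk-and-embed step using the sentence-transformers library; the model name, chunk size, and sample documents are illustrative assumptions, not part of any particular RAG stack:

```python
from sentence_transformers import SentenceTransformer

# Illustrative choices: any embedding model and chunk size could be used.
model = SentenceTransformer("all-MiniLM-L6-v2")
CHUNK_SIZE = 500  # characters per chunk

def chunk_document(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# A stand-in corpus; in practice these would be your own documents.
documents = [
    "RAG retrieves relevant context from a data source before generating a response.",
    "Embeddings are vectors that capture what a piece of text is about.",
]
chunks = [chunk for doc in documents for chunk in chunk_document(doc)]

# Each chunk becomes a vector representing its content.
embeddings = model.encode(chunks)  # shape: (num_chunks, embedding_dim)
```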

Steps 1 & 2: Input query processing

This step involves refining the user's question to better match the indexed data. The query is simplified and optimized for effective search. Input query processing is crucial as it helps RAG find the most relevant data for the question. 

When using vector indexes, the processed input query is itself embedded so that the search can be performed in the same vector space as the document chunks.
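Continuing the sketch above (and reusing the `model` defined there), query processing might look like the following; the normalization shown is a deliberately simple assumption, as real pipelines may also rewrite or expand the query:

```python
def process_query(raw_query: str) -> str:
    """Light normalization: trim and collapse whitespace before searching."""
    return " ".join(raw_query.strip().split())

query = process_query("  What is retrieval-augmented generation?  ")

# Embed the query into the same vector space as the document chunks.
query_embedding = model.encode([query])[0]
```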

Step 3: Searching and ranking 

Once your question is clear, RAG goes hunting for the best information to answer it. RAG looks through its indexed data and picks out the most relevant context. It uses different ways to search depending on how the data is organized. For vector searches, it usually computes the vector distance between the input query and the document chunks.

The search typically returns a large number of candidate results. However, not all of them are suitable for the LLM, so they need to be organized. You can compare this to how search engines like Google or Bing work: a search may return multiple pages of links, but the key is to rank these results by relevance and show the most relevant ones on the first page. RAG's search result ranking works the same way, filtering out irrelevant data from the search stage. It assigns a score to each result based on how well it matches the query and passes only the highest-scoring ones to the generation stage.
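Staying with the same sketch, a naive version of this search-and-rank step can be written with cosine similarity; note that production systems typically delegate this to a vector database (e.g. FAISS or pgvector) rather than scoring every chunk in Python:

```python
import numpy as np

def cosine_scores(query_vec: np.ndarray, chunk_vecs: np.ndarray) -> np.ndarray:
    """Score every chunk embedding (rows) against the query embedding."""
    return (chunk_vecs @ query_vec) / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )

scores = cosine_scores(query_embedding, np.asarray(embeddings))

# Rank by score and keep only the top-k chunks for the generation stage.
TOP_K = 3
top_indices = np.argsort(scores)[::-1][:TOP_K]
top_chunks = [chunks[i] for i in top_indices]
```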

Step 4: Prompt augmentation

Upon identifying the best pieces of information, RAG integrates them back into the original question to enhance the prompt. This gives the LLM additional context to better understand and respond to the query, ensuring its answer isn't based solely on pre-existing knowledge but is also tailored with up-to-date, specific information.
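In code, this augmentation step can be as simple as templating the retrieved chunks into the prompt; the wording of the template below is just one possible choice:

```python
def augment_prompt(user_query: str, context_chunks: list[str]) -> str:
    """Prepend the retrieved context to the user's question."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )

prompt = augment_prompt(query, top_chunks)
```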

Step 5: Response generation

Finally, the LLM plays a crucial role by using the augmented prompt built earlier to create a response. This is where the true power of AI is revealed. The LLM, with its advanced language skills, now has a prompt enriched with more relevant information: the augmentation. The result is not just any answer; it's an answer grounded in the specific, current data obtained earlier.
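As a final piece of the sketch, the generation call might look like this with OpenAI's Python client; the client choice and model name are assumptions, and any chat-capable LLM would serve equally well:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```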

 

Advantages of RAG: unlocking its key benefits

  1. Cost-effectiveness: Training a language model from scratch demands substantial computational resources, such as Graphics Processing Units (GPUs), the computer chips generally used to render graphics. RAG, on the other hand, reuses an existing model, and its retrieval pipeline can run on a standard PC, making it a more cost-effective solution for text generation tasks.
  2. Time efficiency: RAG reduces the time required for text generation tasks. While training a language model from scratch can take up to a month, RAG can generate responses in a matter of seconds, provided the relevant documents are already available in a database.
  3. Dynamic updating: One of the standout features of RAG is its ability to incorporate new information efficiently. By simply adding new documents to the database, RAG can update its knowledge base without the need for retraining, ensuring that the generated text remains current and relevant.
  4. Precision and reliability: Unlike some language models that may "hallucinate" or generate inaccurate information, RAG tends to be more precise and reliable. Since it operates within the constraints of the provided context, it avoids generating nonsense or irrelevant responses.

Limitations of RAG: recognizing its challenges

  1. Prompt dependency: The effectiveness of RAG depends on the specificity and clarity of the provided prompts. Users must be precise and explicit in their instructions to ensure accurate and relevant text generation. This dependency on prompts can lead to tangential responses if the prompt is not well constructed.
  2. Dependence on database quality: Just as it depends on the prompt provided, RAG also relies on the quantity and quality of the database from which it retrieves information. A database that lacks accurate information can negatively impact the relevance of the generated text.
  3. Limited creativity: Since RAG hinges on existing data, it may struggle to generate truly creative content if that's what you're aiming for, for instance, in the case of creative writing or content creation.

 

Exploring the differences: Retrieval-Augmented Generation vs. Semantic Search

Retrieval-Augmented Generation (RAG) and Semantic Search are related but distinct concepts in the realm of natural language processing (NLP) and information retrieval.

On the one hand, RAG acts as a super-smart assistant in seeking information, retrieving answers from various sources, and amalgamating them to provide comprehensive responses. It finds applications in question-answering systems, chatbots, and content-generation tasks where incorporating external knowledge enhances response quality.

Semantic search, on the other hand, interprets the meaning behind user queries rather than merely matching keywords. It employs natural language understanding techniques to grasp query intent and retrieve semantically related documents or resources. Semantic search systems improve search result relevance by analyzing meaning rather than prioritizing exact token matches.

While both systems involve retrieving information based on user queries, RAG specifically focuses on generating text responses by incorporating retrieved information, while semantic search aims to improve the relevance of search results by understanding the semantics of user queries and documents.

 

The promise of RAG in text generation

RAG represents a groundbreaking approach to text generation, offering a cost-effective, efficient, and reliable alternative to training language models from scratch. Despite its limitations, particularly regarding prompt dependency, RAG’s advantages make it a compelling choice for various NLP tasks. As the field of natural language processing continues to evolve, RAG stands out as a powerful tool with the potential to revolutionize text generation workflows. We highly recommend it as a more efficient and economical solution that is also easier to maintain and continuously update.

At Eagerworks, we integrate RAG into various projects, such as our latest in-house endeavor, Docs Hunter. Here, RAG streamlines the text generation process by leveraging existing data and databases provided by users while maintaining precision and relevance. You can read more about the functionalities of Docs Hunter in this article. Should you find yourself overwhelmed with documentation and need assistance in organizing and accessing it more efficiently, don't hesitate to reach out to us for support.

Furthermore, if you wish to stay updated on software evolution, check out our blog and boost your knowledge of the latest tech trends and updates. 

Stay updated!
