Having been immersed in the exploration of search technologies for nearly ten years, I can assert with confidence that nothing has matched the transformative impact of the recent emergence of Retrieval Augmented Generation (RAG). This framework is revolutionizing the domain of search and information retrieval through the application of vector search alongside generative AI, delivering precise responses based on reliable data.

In the course of my search initiatives, my experimentation with RAG has prompted me to contemplate its possible enhancements; I am convinced that RAG, in its current state, is insufficient to fulfill the requirements of users and necessitates advancement.

Do not be mistaken, RAG is superior and undeniably represents a significant advancement in the field of information retrieval technologies. Since the launch of GPT-2 in 2021, my adoption of RAG has significantly improved my ability to find vital information from either my personal archives or professional documents. RAG offers several benefits:

Integration of Vector Search: RAG introduces an innovative method by combining vector search capabilities with generative models. This integration enables the production of responses from large language models (LLMs) that are more refined and contextually aware.
Reduction of Hallucination: RAG markedly diminishes the propensity of LLMs to generate hallucinated text, ensuring that the output is more reliably grounded in facts.
Applicability for Personal and Professional Uses: From personal tasks like sifting through notes to more professional endeavors, RAG proves its adaptability in enhancing efficiency and improving the quality of content while depending on a trustworthy data source.

However, I am progressively identifying more drawbacks of RAG:

Limitations Inherent to Current Search Technologies: RAG is subject to the same shortcomings that plague our current retrieval-based lexical and vector search technologies.
Inefficiencies in Human Searching: Humans struggle with effectively communicating their needs to search systems, with problems such as typographical mistakes, vague queries, or limited vocabulary often leading to missed valuable information that lies beyond the most immediate search results. Although RAG offers aid, it hasn’t completely solved this problem.
Simplification of Search Processes: The dominant search model simplistically connects queries with answers, lacking the ability to understand the complex, multi-faceted nature of human questions. This basic model often fails to capture the nuances and contexts of more complex user queries, producing less relevant results.

So, what actions can we undertake to solve these problems? We require a system that not only retrieves what we request but also understands the subtleties behind our inquiries without the need for increasingly advanced LLMs. Acknowledging these challenges and motivated by the potential, I devised a more sophisticated solution: RAG-Fusion.

Why RAG-Fusion?

Bridging Gaps: It addresses the limitations present in RAG by creating multiple user queries and reranking the outcomes.
Improved Search: Employs Reciprocal Rank Fusion and tailored vector score weighting for thorough, precise findings.

RAG-Fusion aims to close the divide between what users explicitly inquire and what they implicitly seek, moving closer to discovering the transformative knowledge that often remains concealed.

Starting this voyage with RAG years back, I wish I had disclosed those early experiments. Now, it’s time to rectify that. We’re going to explore the technical details of RAG-Fusion thoroughly.

A Thorough Examination of RAG-Fusion’s Inner Workings

Technologies and Toolkit

For those eager to dive directly into code and begin experimenting with RAG-Fusion, visit the GitHub repository here.

The core trio of RAG Fusion mirrors that of RAG, utilizing the same three essential technologies:

A versatile programming language, typically Python.
A specialized vector search database, like Elasticsearch or Pinecone, directing the document retrieval process.
A powerful large language model, such as ChatGPT, generating the text.

A graphical depiction of RAG-Fusion’s operational process. Image created by the author.

Yet, RAG-Fusion sets itself apart with several additional steps — query creation and a reranking process.

RAG-Fusion’s Procedure:

Query Replication with a Modification: Convert a user’s query into similar, yet unique queries through an LLM.
Vector Search Executed: Conduct vector searches for both the original query and the newly created query variants.
Sophisticated Reranking: Compile and refine all the findings using reciprocal rank fusion.
Graceful Conclusion: Merge the top results with the new queries, directing the large language model towards an output that incorporates all queries and the reranked results list.

RAG-Fusion Code Sample. Image created by the author.

We will delve into each of these stages in greater depth.

Generation of Multiple Queries

The Rationale for Several Queries?

In conventional search mechanisms, users typically enter a single query to locate information. Although this method is direct, it has its drawbacks. A lone query might not fully encompass the user’s interest scope or could be too specific to retrieve a wide array of results. Generating multiple queries from varied viewpoints addresses this issue.

Technical Execution (Prompt Engineering)

Diagram Illustrating the Process of Generating Multiple Queries: Utilizing Prompt Engineering and Language Models to Expand Search Boundaries and Improve Result Relevance. Image created by the author.

The application of prompt engineering is essential in creating multiple queries that are not just similar to the original but also provide distinct perspectives or angles.

Here’s the procedure:

Language Model Function Invocation: This operation invokes a language model (here, chatGPT). It requires a detailed instruction, often termed a “system message,” which directs the model’s actions. For instance, in this scenario, the system message instructs the model to behave as an “AI assistant.”
Formulation of Natural Language Queries: Subsequently, the model generates several queries based on the initial query.
Variety and Extensiveness: The generated queries are strategically crafted to present various aspects of the original topic. For example, if the original query pertained to the “effects of climate change,” the new queries could explore dimensions like “economic impacts of climate change,” “climate change and health issues,” etc.

This method guarantees that the search activity encompasses a wider spectrum of information, enhancing the quality and comprehensiveness of the resultant summary.

Reciprocal Rank Fusion (RRF) is an approach for merging multiple lists of search results into a singular, consolidated ranking. Conceived jointly by the University of Waterloo (CAN) and Google, RRF, according to its creators, “delivers superior outcomes compared to any single system, and surpasses conventional” methods of reordering.

RRF employs an algorithm where k=60, as illustrated in — Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods

By amalgamating rankings from diverse queries, it enhances the likelihood that the most pertinent documents are showcased at the forefront of the ultimate list. RRF stands out because it depends not on the precise scores given by the search engines but on the comparative rankings, rendering it particularly apt for integrating results from queries with varying score scales or distributions.

RRF is commonly utilized to combine lexical and vector search outcomes. While this technique can compensate for the general lack of precision in vector search for specific terms, such as acronyms, the resultant blend often seems like a mismatch of different result sets since identical outcomes seldom emerge for the same query across both lexical and vector searches.

Envision RRF as that individual who seeks everyone’s viewpoint prior to making a decision. In this scenario, however, this trait is advantageous, not bothersome. The principle here is the more, the better — or, in this context, the more precise.

Technical Implementation

Reciprocal Rank Fusion Positional Reordering System. Illustration by the author.

The method reciprocal_rank_fusion accepts a dictionary of search outcomes, wherein each key represents a query, and the related value is a list of document IDs arranged by their relevance to that query. The RRF algorithm subsequently computes a novel score for each document based on its ranking across the various lists and organizes them to produce a final reorganized list.

Upon computing the merged scores, the method organizes the documents in an ascending order of these scores to derive the final reorganized list, which is then provided.

Generative Output

Preservation of User Intent

A hurdle in employing multiple queries is the potential weakening of the user’s initial intent. To counter this, we guide the model to assign greater emphasis to the original query in the engineering of prompts.

Technical Implementation

Subsequently, the reorganized documents alongside all queries are input into a LLM prompt to generate the generative output in a typical RAG fashion, such as requesting a response or summary.

With the integration of these methodologies and technologies, RAG Fusion presents a sophisticated, refined method for text generation. It capitalizes on the strengths of search technology and generative AI to yield outputs that are both of high quality and reliable.

Strengths and Shortcomings of RAG-Fusion

Strengths

1. Superior Source Material Quality

When you use RAG Fusion, the depth of your search isn’t merely ‘enhanced’ — it’s amplified. The reranked list of relevant documents means that you’re not just scraping the surface of information but diving into an ocean of perspectives. The structured output is easier to read and feels intuitively trustworthy, which is crucial in a world sceptical of AI-generated content.

2. Enhanced User Intent Alignment

At its core, RAG Fusion is designed to be an empathic AI that brings to light what users are striving to express but perhaps can’t articulate. Leveraging a multi-query strategy captures a multifaceted representation of the user’s informational needs, thus delivering holistic outputs and resonating with user intent.

3. Structured, Insightful Outputs

By drawing from a diverse set of sources, the model crafts well-organised and insightful answers, anticipating follow-up questions and preemptively addressing them.

4. Auto-Correcting User Queries

The system not only interprets but also refines user queries. Through the generation of multiple query variations, RAG Fusion performs implicit spelling and grammar checks, thereby enhancing search result accuracy

5. Navigating Complex Queries

Human language often falters when expressing intricate or specialised thoughts. The system acts as a linguistic catalyst, generating variations that may incorporate the jargon or terminologies required for more focused and relevant search results. It can also take longer, more complex queries and break them down into smaller, manageable chunks for the vector search.

6. Serendipity in Search

Consider the “unknown unknowns” — information you don’t know you need until you encounter it. RAG Fusion allows for this serendipitous discovery. By employing a broader query spectrum, the system engenders the likelihood of unearthing information that, while not explicitly sought, becomes a eureka moment for the user. This sets RAG Fusion apart from other traditional search models.

Challenges

1. The Risk of Being Overly Verbose

RAG-Fusion’s depth can sometimes lead to a deluge of information. Outputs might be detailed to the point of being overwhelming. Think of RAG-Fusion as that friend who over-explains things — informative, but occasionally, you might need them to get to the point.

2. Balancing the Context Window

The inclusion of multi-query input and a diversified document set can stress the language model’s context window. Picture a stage crowded with actors, making it challenging to follow the plot. For models with tight context constraints, this could lead to less coherent or even truncated outputs.

Ethical and User Experience Considerations

With great power comes great responsibility. And with RAG Fusion, the power to manipulate user queries to improve results feels like it’s crossing into some kind of moral grey zone. Balancing the improved search results with the integrity of user intent is crucial, and I’ve got some thoughts you should consider when implementing this solution:

Ethical Concerns:

User Autonomy: The manipulation of user queries can sometimes deviate from the original intent. It’s essential to consider how much control we’re ceding to AI and at what cost.
Transparency: It’s not just about better results; users should be aware if and how their queries are adjusted. This transparency is essential to maintain trust and respect user intent.

User Experience (UX) Enhancements:

Preserving Original Query: RAG Fusion prioritises the initial user query, ensuring its importance in the generative process. This acts as a safeguard against misinterpretations.
Visibility of Process: Displaying generated queries alongside final results provides users with a transparent look at the search’s scope and depth. It aids in building trust and understanding.

UX/UI Implementation Tips:

User Control: Offer users an option to toggle RAG Fusion, allowing them the choice between manual control and enhanced AI assistance.
Guidance & Clarity: A tooltip or brief explanation about RAG Fusion’s workings can help set clear user expectations.

If I had to encapsulate the value of RAG Fusion, it would be this: it moves us closer to what AI was always supposed to do — amplify human potential.

RAG Fusion is not merely an advancement; it’s a clarion call to all innovators. It beckons us to step beyond conventional frameworks and reimagine the tapestry of “search”.

To those in the realm of search, I pose a challenge: Let’s not merely create search systems; let’s architect interpreters of inquiry.

Hopefully, RAG-Fusion inspires you to take up that challenge with me.

Dive into the GitHub repo, get your hands dirty with the code, and join the revolution.

RAG with Rank Fusion