Microsoft has recently rolled out an update to GraphRAG, enhancing the ability of AI search tools built on it to generate more specific, comprehensive answers. The update aims to refine AI search capabilities by providing users with more accurate and detailed responses. It also reduces the large language model (LLM) resources required to process queries, allowing for more efficient and faster search results.
One of the key benefits of this update is the noticeable speed increase in LLM processing. The improved efficiency ensures that AI search engines can generate responses more quickly without compromising the quality of the results. The accuracy of the information provided has also been enhanced, ensuring that users receive more reliable and relevant content when using AI search tools.
Although Microsoft has not officially assigned a version number to the update, the improvements are so significant that the industry has taken to referring to it as GraphRAG 2.0. This distinction helps to highlight the vast changes and better performance compared to the original version. The enhancements are particularly evident in the way the AI handles data and processes queries, resulting in a more refined user experience.
The update also places a stronger focus on referencing source materials, adding credibility to AI-generated answers. By including more references to original sources, GraphRAG 2.0 aims to make search results more transparent and verifiable. This change aligns with the growing demand for AI-driven tools that offer not just accuracy but also accountability and transparency in how information is sourced.
The Difference Between RAG And GraphRAG
RAG (Retrieval Augmented Generation) is an advanced AI technique that merges the capabilities of a large language model (LLM) with a search index or database to provide accurate responses to user queries. By integrating a search index, the language model is grounded in relevant and up-to-date data, which significantly reduces the risk of the AI providing outdated or erroneous answers. This grounding process ensures that the AI-powered search engine offers more reliable and accurate results by incorporating real-time information. Essentially, RAG makes it possible for AI systems to pull from live, dynamic data sources, enhancing the quality and relevance of the generated answers.
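To make the idea concrete, here is a minimal sketch of baseline RAG in Python. The `embed_fn` and `llm_fn` callables are placeholders for whatever embedding model and LLM client you use; they are assumptions for illustration, not a specific product's API.

```python
# A minimal sketch of baseline RAG: retrieve the most similar documents for a
# query, then ground the LLM's answer in them. `embed_fn(text) -> vector` and
# `llm_fn(prompt) -> str` stand in for any embedding model and LLM client.
from typing import Callable, Sequence
import math

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def baseline_rag_answer(
    query: str,
    documents: list[str],
    embed_fn: Callable[[str], Sequence[float]],
    llm_fn: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Retrieve the top-k semantically similar documents and answer from them."""
    query_vec = embed_fn(query)
    ranked = sorted(documents, key=lambda d: cosine(embed_fn(d), query_vec), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm_fn(prompt)
```

In practice the document embeddings would be precomputed and stored in a vector index rather than embedded at query time, but the retrieval-then-generate shape is the same.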
Building on this foundation, GraphRAG refines the concept of RAG even further. Unlike traditional RAG, which relies on a basic search index, GraphRAG uses a knowledge graph built from the search index to deliver more structured and insightful responses. This knowledge graph maps out the relationships between various pieces of data, allowing GraphRAG to provide users with more comprehensive and contextually rich answers. These answers are presented as community reports, which summarise key points and highlight the most pertinent information, often incorporating sources and references.
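The sketch below shows the kind of structures this implies, using plain dataclasses for illustration; the field names are assumptions for the example, not the library's actual schema.

```python
# Rough illustration of the pieces GraphRAG works with: entities and
# relationships forming a knowledge graph, plus LLM-written community reports
# that summarise a cluster and point back to source material.
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str            # e.g. a person, place, or concept
    description: str

@dataclass
class Relationship:
    source: str          # entity name
    target: str          # entity name
    description: str     # how the two entities are related

@dataclass
class CommunityReport:
    community_id: int
    title: str
    summary: str                                        # LLM-written overview of the cluster
    entities: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)    # references to the indexed documents
```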
The community reports generated by GraphRAG offer a significant enhancement over previous models by providing not only detailed answers but also ensuring that users receive clear, easily digestible summaries. By organising and linking related data, GraphRAG can help users navigate complex topics more effectively, offering both depth and clarity. This capability allows for a more interactive and informative search experience, making it easier for users to access highly specific and relevant information without sifting through multiple sources.
GraphRAG’s ability to combine a knowledge graph with a search index improves the overall efficiency of AI systems in generating answers. This improvement leads to a faster, more accurate response to user queries, enabling search engines to deliver high-quality information with reduced computational resources. By reducing the possibility of hallucinated or irrelevant answers, GraphRAG enhances the user experience, ensuring that the AI-generated content remains both trustworthy and informative. This innovation marks a significant leap forward in the evolution of AI search technologies, offering users an advanced tool for obtaining reliable and well-rounded information quickly.
GraphRAG Uses A Two-Step Process:
Step 1: Indexing Engine
The first step in the GraphRAG process involves the indexing engine, which segments the search index into thematic communities that revolve around related topics. These communities are organised and connected through entities such as people, places, or concepts, as well as the relationships between these entities. This structure forms a hierarchical knowledge graph. Once these thematic communities are created, the large language model (LLM) generates a summary for each community, known as a Community Report. This hierarchical knowledge graph is an essential feature of GraphRAG, with each level of the structure providing a summarised view of the community it represents.
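A simplified sketch of this indexing step is shown below, assuming an entity graph has already been extracted from the source documents. networkx's Louvain communities are used here as a readily available stand-in for the hierarchical clustering the library itself performs, and `llm_fn` is a placeholder for any LLM client.

```python
# Sketch of the indexing step: cluster an entity graph into thematic
# communities and ask an LLM to write a report for each one.
from typing import Callable
import networkx as nx
from networkx.algorithms.community import louvain_communities

def build_community_reports(
    entity_graph: nx.Graph,
    llm_fn: Callable[[str], str],
) -> list[dict]:
    """Cluster the entity graph and summarise each cluster into a community report."""
    reports = []
    for community_id, members in enumerate(louvain_communities(entity_graph, seed=42)):
        # Collect the relationships inside this community as plain text facts.
        facts = [
            f"{u} -> {v}: {data.get('description', '')}"
            for u, v, data in entity_graph.subgraph(members).edges(data=True)
        ]
        prompt = (
            "Write a short report summarising the theme of the following "
            "entities and relationships:\n" + "\n".join(facts)
        )
        reports.append({
            "community_id": community_id,
            "entities": sorted(members),
            "summary": llm_fn(prompt),
        })
    return reports
```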
A common misconception about GraphRAG is that it relies solely on existing knowledge graphs. While it is true that knowledge graphs play a part in its functioning, the crucial distinction is that GraphRAG doesn’t just use pre-existing graphs. Instead, it creates its own knowledge graphs from unstructured data, such as web pages, during the Indexing Engine step. This transformation of raw, unstructured data into structured knowledge is what differentiates GraphRAG from traditional Retrieval Augmented Generation (RAG). RAG focuses on retrieving and summarising information, but it does not build a hierarchical graph to organise the data, which is a core advantage of GraphRAG’s method.
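The sketch below illustrates how that graph can be built from unstructured text with an LLM, producing the entity graph consumed by the clustering sketch above. The prompt wording and the "source | target | relation" output format are assumptions for the example, not GraphRAG's actual prompts.

```python
# Illustrative sketch of turning unstructured text chunks into graph edges:
# the LLM lists entity relationships, which are merged into a single graph.
from typing import Callable
import networkx as nx

def extract_entity_graph(chunks: list[str], llm_fn: Callable[[str], str]) -> nx.Graph:
    """Ask the LLM for entity relationships in each chunk and merge them into one graph."""
    graph = nx.Graph()
    for chunk in chunks:
        prompt = (
            "List the entities (people, places, concepts) mentioned in the text "
            "and how they relate, one per line as: source | target | relation.\n\n"
            + chunk
        )
        for line in llm_fn(prompt).splitlines():
            parts = [p.strip() for p in line.split("|")]
            if len(parts) == 3:
                source, target, relation = parts
                graph.add_edge(source, target, description=relation)
    return graph
```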
Step 2: Query Step
The second step in the GraphRAG process involves using the knowledge graph created in the first step to provide context to the LLM. This context allows the LLM to generate more accurate and relevant responses to queries. Microsoft highlights a limitation of traditional RAG, noting that it struggles to retrieve information based on topics because it relies solely on semantic relationships between terms, which might not always be sufficient to identify relevant answers.
GraphRAG improves upon this by using the knowledge graph to organise the data thematically. This means that GraphRAG can locate information not only based on semantic relationships but also based on thematic similarity. By structuring documents in the search index into a knowledge graph, with topics and subtopics organised into layers of increasing specificity, GraphRAG can identify answers even when the exact keywords related to the query are not present in the document. This thematic approach allows GraphRAG to outperform RAG in accurately retrieving answers, ensuring that it provides more precise and contextually relevant information to users.
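A minimal sketch of that thematic retrieval idea: instead of matching the query against raw documents, match it against the community report summaries, so a document can be found even when it never uses the query's keywords. `embed_fn` is again a placeholder for any embedding model.

```python
# Rank community reports by how close their summaries are to the query,
# rather than ranking the underlying documents directly.
from typing import Callable, Sequence

def rank_reports_by_theme(
    query: str,
    reports: list[dict],                     # each report has a "summary" field
    embed_fn: Callable[[str], Sequence[float]],
    top_k: int = 5,
) -> list[dict]:
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    query_vec = embed_fn(query)
    return sorted(
        reports,
        key=lambda r: cosine(embed_fn(r["summary"]), query_vec),
        reverse=True,
    )[:top_k]
```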
The original GraphRAG announcement explains the difference between baseline RAG and GraphRAG as follows:
“Baseline RAG struggles with queries that require the aggregation of information across the dataset to generate a comprehensive answer. For example, queries like ‘What are the top 5 themes in the data?’ tend to perform poorly because baseline RAG relies on vector search, which looks for semantically similar text content within the dataset. This method fails when there is no clear direction within the query to guide it to the correct information.”
In contrast, GraphRAG can effectively address such queries. The key difference lies in the structure of the LLM-generated knowledge graph, which gives insights into the overall structure and themes of the dataset. This enables the dataset to be organised into meaningful semantic clusters that are pre-summarised. When responding to a user query, the LLM can draw upon these clusters to summarise the relevant themes, offering more accurate and comprehensive answers. This structural approach is what sets GraphRAG apart from its predecessor, improving its ability to handle complex, aggregated queries.
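The map-reduce style answer generation this describes can be sketched as follows: each relevant community report is summarised against the question ("map"), and the partial answers are then combined into one response ("reduce"). `llm_fn` is a placeholder for any LLM client and the prompt wording is illustrative.

```python
# Sketch of global search over pre-summarised communities: map each relevant
# report to a partial answer, then reduce the partial answers into one reply.
from typing import Callable

def global_search(
    question: str,
    relevant_reports: list[dict],    # community reports, e.g. from rank_reports_by_theme
    llm_fn: Callable[[str], str],
) -> str:
    # Map: extract what each community report contributes to the question.
    partial_answers = [
        llm_fn(
            f"Question: {question}\n\nCommunity report:\n{report['summary']}\n\n"
            "What does this report say that is relevant to the question?"
        )
        for report in relevant_reports
    ]
    # Reduce: merge the partial answers into a single comprehensive response.
    reduce_prompt = (
        f"Question: {question}\n\nPartial answers from different themes:\n"
        + "\n---\n".join(partial_answers)
        + "\n\nCombine these into one comprehensive answer."
    )
    return llm_fn(reduce_prompt)
```

This is why a question like "What are the top 5 themes in the data?" becomes tractable: the themes have already been distilled into community reports, so the LLM only has to aggregate summaries rather than scan the raw dataset.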
Update To GraphRAG
To recap, GraphRAG is designed to create a knowledge graph from the search index. In this context, a “community” refers to a group of related segments or documents that are clustered based on topical similarity. A “community report” is the summary generated by the LLM for each community, providing a concise overview of the clustered information.
The original version of GraphRAG had some inefficiencies. It processed all community reports, even those that were irrelevant to the search query, including lower-level summaries that didn't contribute meaningfully to answering it. Microsoft describes this earlier approach as "static," as it lacked the dynamic filtering needed to focus only on relevant information.
The updated version of GraphRAG introduces a new feature known as “dynamic community selection.” This improvement evaluates the relevance of each community report, ensuring that irrelevant reports and their sub-communities are removed from the process. As a result, this update enhances both the efficiency and precision of the system by only considering the information that is pertinent to the search query.
Microsoft explains the process in more detail: “We introduce dynamic community selection to the global search algorithm, which utilises the knowledge graph structure of the indexed dataset. Starting from the root of the knowledge graph, we use an LLM to assess how relevant a community report is in answering the user’s question. If the report is deemed irrelevant, we simply remove it and its nodes (or sub-communities) from the search process. If it’s relevant, we continue by traversing down its child nodes and repeating the operation. Ultimately, only relevant reports are passed on to the map-reduce operation to generate the response to the user.”
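The sketch below follows the process described in that quote: start at the root communities, ask the LLM whether each report is relevant, prune irrelevant branches, and only recurse into the children of relevant ones. The tree layout and the yes/no rating prompt are assumptions for illustration.

```python
# Sketch of dynamic community selection: traverse the community hierarchy from
# the root, keep only reports the LLM rates as relevant, and never expand the
# sub-communities of an irrelevant report.
from typing import Callable

def select_relevant_reports(
    question: str,
    root_ids: list[int],
    reports: dict[int, dict],        # community_id -> {"summary": str, "children": list[int]}
    llm_fn: Callable[[str], str],
) -> list[dict]:
    relevant: list[dict] = []
    frontier = list(root_ids)
    while frontier:
        community_id = frontier.pop()
        report = reports[community_id]
        verdict = llm_fn(
            f"Question: {question}\n\nCommunity report:\n{report['summary']}\n\n"
            "Is this report relevant to the question? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            relevant.append(report)
            # Only relevant branches are explored further; irrelevant
            # sub-communities are skipped entirely.
            frontier.extend(report.get("children", []))
    return relevant
```

Because irrelevant branches are never expanded, the lower-level summaries that would not contribute to the answer are skipped entirely, and only the surviving reports are passed to the map-reduce step.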
Takeaways: Results Of Updated GraphRAG
Microsoft recently tested the updated version of GraphRAG and found that it led to a significant 77% reduction in computational cost, specifically in the token cost of processing queries with the LLM. Tokens are the fundamental units of text processed by LLMs, and this improvement means that the newer version of GraphRAG works more efficiently, using fewer resources. The updated system can also use a smaller LLM, which further reduces costs without compromising the quality of the results.
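As a back-of-the-envelope illustration of where the savings come from, the sketch below counts the prompt tokens for every community report versus only the dynamically selected ones, using the tiktoken library. The 77% figure is Microsoft's measurement, not something this snippet reproduces.

```python
# Estimate how many report tokens dynamic selection avoids sending to the LLM.
import tiktoken

def prompt_tokens(reports: list[dict], encoding_name: str = "cl100k_base") -> int:
    enc = tiktoken.get_encoding(encoding_name)
    return sum(len(enc.encode(report["summary"])) for report in reports)

def token_savings(all_reports: list[dict], selected_reports: list[dict]) -> float:
    """Fraction of report tokens skipped by sending only the selected reports."""
    total = prompt_tokens(all_reports)
    return 1 - prompt_tokens(selected_reports) / total if total else 0.0
```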
The updated version of GraphRAG has had several positive impacts on the quality of search results. One of the key improvements is that the dynamic search feature provides responses that are more specific and relevant to the query at hand. This ensures that the information delivered is more accurate and helpful for users.
In addition, the responses now include more references to source materials, which enhances the credibility of the information provided. This is a crucial factor for maintaining trust in the search results, especially when it comes to providing users with reliable and verifiable answers.
Moreover, the results generated by the updated system are more comprehensive and focused on the user’s specific query, helping to avoid overwhelming users with excessive or irrelevant information. The dynamic community selection feature in GraphRAG further improves the quality of search results by ensuring that only the most relevant and well-supported responses are provided, backed by appropriate source material.