diff --git a/examples_notebooks/global_search/index.html b/examples_notebooks/global_search/index.html
index 8f8ae714..ed39684b 100644
--- a/examples_notebooks/global_search/index.html
+++ b/examples_notebooks/global_search/index.html
@@ -2248,7 +2248,7 @@ report_df.head()
 
 ---------------------------------------------------------------------------
 AttributeError                            Traceback (most recent call last)
-/tmp/ipykernel_2010/1512985616.py in ?()
+/tmp/ipykernel_1968/1512985616.py in ?()
       2 entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
       3 report_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
       4 entity_embedding_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_EMBEDDING_TABLE}.parquet")
diff --git a/examples_notebooks/global_search_with_dynamic_community_selection/index.html b/examples_notebooks/global_search_with_dynamic_community_selection/index.html
index 4ae51a13..1d367ab2 100644
--- a/examples_notebooks/global_search_with_dynamic_community_selection/index.html
+++ b/examples_notebooks/global_search_with_dynamic_community_selection/index.html
@@ -2156,7 +2156,7 @@ report_df.head()
 
 ---------------------------------------------------------------------------
 AttributeError                            Traceback (most recent call last)
-/tmp/ipykernel_2040/2760368953.py in ?()
+/tmp/ipykernel_1999/2760368953.py in ?()
       2 entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
       3 report_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
       4 entity_embedding_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_EMBEDDING_TABLE}.parquet")
diff --git a/index/outputs/index.html b/index/outputs/index.html
index 4da511ff..d94d9992 100644
--- a/index/outputs/index.html
+++ b/index/outputs/index.html
@@ -1626,6 +1626,11 @@
 Leiden-generated cluster ID for the community. Note that these increment with depth, so they are unique through all levels of the community hierarchy. For this table, human_readable_id is a copy of the community ID rather than a plain increment.
 
 
+parent
+int
+Parent community ID.
+
+
 level
 int
 Depth of the community in the hierarchy.
@@ -1679,6 +1684,11 @@
 Short ID of the community this report applies to.
 
 
+parent
+int
+Parent community ID.
+
+
 level
 int
 Level of the community this report applies to.
diff --git a/search/search_index.json b/search/search_index.json
index 2c2ed91a..2bd75923 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config": {"lang": ["en"], "separator": "[\\s\\-]+", "pipeline": ["stopWordFilter"]}, "docs": [{"location": "", "title": "Welcome to GraphRAG", "text": "

\ud83d\udc49 Microsoft Research Blog Post \ud83d\udc49 GraphRAG Accelerator \ud83d\udc49 GraphRAG Arxiv

Figure 1: An LLM-generated knowledge graph built using GPT-4 Turbo.

GraphRAG is a structured, hierarchical approach to Retrieval Augmented Generation (RAG), as opposed to naive semantic-search approaches using plain text snippets. The GraphRAG process involves extracting a knowledge graph out of raw text, building a community hierarchy, generating summaries for these communities, and then leveraging these structures when performing RAG-based tasks.

To learn more about GraphRAG and how it can be used to enhance your LLM's ability to reason about your private data, please visit the Microsoft Research Blog Post.

"}, {"location": "#solution-accelerator", "title": "Solution Accelerator \ud83d\ude80", "text": "

To quickstart the GraphRAG system we recommend trying the Solution Accelerator package. This provides a user-friendly end-to-end experience with Azure resources.

"}, {"location": "#get-started-with-graphrag", "title": "Get Started with GraphRAG \ud83d\ude80", "text": "

To start using GraphRAG, check out the Get Started guide. For a deeper dive into the main sub-systems, please visit the docpages for the Indexer and Query packages.

"}, {"location": "#graphrag-vs-baseline-rag", "title": "GraphRAG vs Baseline RAG \ud83d\udd0d", "text": "

Retrieval-Augmented Generation (RAG) is a technique to improve LLM outputs using real-world information. This technique is an important part of most LLM-based tools and the majority of RAG approaches use vector similarity as the search technique, which we call Baseline RAG. GraphRAG uses knowledge graphs to provide substantial improvements in question-and-answer performance when reasoning about complex information. RAG techniques have shown promise in helping LLMs to reason about private datasets - data that the LLM is not trained on and has never seen before, such as an enterprise\u2019s proprietary research, business documents, or communications. Baseline RAG was created to help solve this problem, but we observe situations where baseline RAG performs very poorly. For example:

To address this, the tech community is working to develop methods that extend and enhance RAG. Microsoft Research\u2019s new approach, GraphRAG, uses LLMs to create a knowledge graph based on an input corpus. This graph, along with community summaries and graph machine learning outputs, are used to augment prompts at query time. GraphRAG shows substantial improvement in answering the two classes of questions described above, demonstrating intelligence or mastery that outperforms other approaches previously applied to private datasets.

"}, {"location": "#the-graphrag-process", "title": "The GraphRAG Process \ud83e\udd16", "text": "

GraphRAG builds upon our prior research and tooling using graph machine learning. The basic steps of the GraphRAG process are as follows:

"}, {"location": "#index", "title": "Index", "text": ""}, {"location": "#query", "title": "Query", "text": "

At query time, these structures are used to provide materials for the LLM context window when answering a question. The primary query modes are:

"}, {"location": "#prompt-tuning", "title": "Prompt Tuning", "text": "

Using GraphRAG with your data out of the box may not yield the best possible results. We strongly recommend fine-tuning your prompts following the Prompt Tuning Guide in our documentation.

"}, {"location": "blog_posts/", "title": "Microsoft Research Blog", "text": "