Update docs for 2.0+ (#1984)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled

* Update docs

* Fix prompt links
Nathan Evans 2025-06-23 13:49:47 -07:00 committed by GitHub
parent 1df89727c3
commit 27c6de846f
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
9 changed files with 24 additions and 24 deletions

```diff
@@ -1,6 +1,5 @@
 # GraphRAG
-👉 [Use the GraphRAG Accelerator solution](https://github.com/Azure-Samples/graphrag-accelerator) <br/>
 👉 [Microsoft Research Blog Post](https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/)<br/>
 👉 [Read the docs](https://microsoft.github.io/graphrag)<br/>
 👉 [GraphRAG Arxiv](https://arxiv.org/pdf/2404.16130)
@@ -28,7 +27,7 @@ To learn more about GraphRAG and how it can be used to enhance your LLM's abilit
 ## Quickstart
-To get started with the GraphRAG system we recommend trying the [Solution Accelerator](https://github.com/Azure-Samples/graphrag-accelerator) package. This provides a user-friendly end-to-end experience with Azure resources.
+To get started with the GraphRAG system we recommend trying the [command line quickstart](https://microsoft.github.io/graphrag/get_started/).
 ## Repository Guidance
```

```diff
@@ -12,6 +12,12 @@ There are five surface areas that may be impacted on any given release. They are
 > TL;DR: Always run `graphrag init --path [path] --force` between minor version bumps to ensure you have the latest config format. Run the provided migration notebook between major version bumps if you want to avoid re-indexing prior datasets. Note that this will overwrite your configuration and prompts, so backup if necessary.
+# v2
+Run the [migration notebook](./docs/examples_notebooks/index_migration_to_v2.ipynb) to convert older tables to the v2 format.
+The v2 release renamed all of our index tables to simply name the items each table contains. The previous naming was a leftover requirement of our use of DataShaper, which is no longer necessary.
 # v1
 Run the [migration notebook](./docs/examples_notebooks/index_migration_to_v1.ipynb) to convert older tables to the v1 format.
```
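The TL;DR guidance can be read as a small decision rule. The Python sketch below encodes it under stated assumptions. `upgrade_steps` is a hypothetical helper, not part of GraphRAG. A major bump is assumed to also warrant re-running `graphrag init`. A jump across several major versions is assumed to need each migration notebook in turn, although the docs only name the v1 and v2 notebooks.

```python
def upgrade_steps(old_version: str, new_version: str) -> list[str]:
    """Hypothetical helper: suggested actions when moving between GraphRAG versions.

    Versions are "MAJOR.MINOR[.PATCH]" strings; only forward upgrades are considered.
    """
    old_major, old_minor = (int(part) for part in old_version.split(".")[:2])
    new_major, new_minor = (int(part) for part in new_version.split(".")[:2])
    steps: list[str] = []
    # Any minor (or, by assumption, major) change: regenerate config and prompts.
    # This overwrites settings and prompts, so back them up first.
    if (new_major, new_minor) > (old_major, old_minor):
        steps.append("graphrag init --path [path] --force")
    # Optional, to avoid re-indexing: one migration notebook per major version
    # crossed (assumption: applied in order, e.g. v1 then v2).
    for major in range(old_major + 1, new_major + 1):
        steps.append(f"docs/examples_notebooks/index_migration_to_v{major}.ipynb")
    return steps
```

A patch-only bump such as `2.0.0` to `2.0.1` yields no steps under this rule.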

```diff
@@ -4,7 +4,7 @@ As of version 1.3, GraphRAG no longer supports a full complement of pre-built en
 The only standard environment variable we expect, and include in the default settings.yml, is `GRAPHRAG_API_KEY`. If you are already using a number of the previous GRAPHRAG_* environment variables, you can insert them with template syntax into settings.yml and they will be adopted.
-> **The environment variables below are documented as an aid for migration, but they WILL NOT be read unless you use template syntax in your settings.yml.**
+> **The environment variables below are documented as an aid for migration, but they WILL NOT be read unless you use template syntax in your settings.yml. We also WILL NOT be updating this page as the main config object changes.**
 ---
```

```diff
@@ -40,7 +40,7 @@ models:
 #### Fields
 - `api_key` **str** - The OpenAI API key to use.
-- `auth_type` **api_key|managed_identity** - Indicate how you want to authenticate requests.
+- `auth_type` **api_key|azure_managed_identity** - Indicate how you want to authenticate requests.
 - `type` **openai_chat|azure_openai_chat|openai_embedding|azure_openai_embedding|mock_chat|mock_embeddings** - The type of LLM to use.
 - `model` **str** - The model name.
 - `encoding_model` **str** - The text encoding model to use. Default is to use the encoding model aligned with the language model (i.e., it is retrieved from tiktoken if unset).
@@ -73,16 +73,18 @@ models:
 ### input
-Our pipeline can ingest .csv, .txt, or .json data from an input folder. See the [inputs page](../index/inputs.md) for more details and examples.
+Our pipeline can ingest .csv, .txt, or .json data from an input location. See the [inputs page](../index/inputs.md) for more details and examples.
 #### Fields
-- `type` **file|blob** - The input type to use. Default=`file`
+- `storage` **StorageConfig**
+- `type` **file|blob|cosmosdb** - The storage type to use. Default=`file`
+- `base_dir` **str** - The base directory to write output artifacts to, relative to the root.
+- `connection_string` **str** - (blob/cosmosdb only) The Azure Storage connection string.
+- `container_name` **str** - (blob/cosmosdb only) The Azure Storage container name.
+- `storage_account_blob_url` **str** - (blob only) The storage account blob URL to use.
+- `cosmosdb_account_blob_url` **str** - (cosmosdb only) The CosmosDB account blob URL to use.
 - `file_type` **text|csv|json** - The type of input data to load. Default is `text`
-- `base_dir` **str** - The base directory to read input from, relative to the root.
-- `connection_string` **str** - (blob only) The Azure Storage connection string.
-- `storage_account_blob_url` **str** - The storage account blob URL to use.
-- `container_name` **str** - (blob only) The Azure Storage container name.
 - `encoding` **str** - The encoding of the input file. Default is `utf-8`
 - `file_pattern` **str** - A regex to match input files. Default is `.*\.csv$`, `.*\.txt$`, or `.*\.json$` depending on the specified `file_type`, but you can customize it if needed.
 - `file_filter` **dict** - Key/value pairs to filter. Default is None.
```

```diff
@@ -6,7 +6,6 @@
 To get started with the GraphRAG system, you have a few options:
-👉 [Use the GraphRAG Accelerator solution](https://github.com/Azure-Samples/graphrag-accelerator) <br/>
 👉 [Install from pypi](https://pypi.org/project/graphrag/). <br/>
 👉 [Use it from source](developing.md)<br/>
```

```diff
@@ -1,7 +1,6 @@
 # Welcome to GraphRAG
 👉 [Microsoft Research Blog Post](https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/) <br/>
-👉 [GraphRAG Accelerator](https://github.com/Azure-Samples/graphrag-accelerator) <br/>
 👉 [GraphRAG Arxiv](https://arxiv.org/pdf/2404.16130)
 <p align="center">
@@ -16,10 +15,6 @@ approaches using plain text snippets. The GraphRAG process involves extracting a
 To learn more about GraphRAG and how it can be used to enhance your language model's ability to reason about your private data, please visit the [Microsoft Research Blog Post](https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/).
-## Solution Accelerator 🚀
-To quickstart the GraphRAG system we recommend trying the [Solution Accelerator](https://github.com/Azure-Samples/graphrag-accelerator) package. This provides a user-friendly end-to-end experience with Azure resources.
 ## Get Started with GraphRAG 🚀
 To start using GraphRAG, check out the [_Get Started_](get_started.md) guide.
```

```diff
@@ -52,7 +52,7 @@ workflows: [create_communities, create_community_reports, generate_text_embeddin
 ### FastGraphRAG
-[FastGraphRAG](./methods.md#fastgraphrag) uses text_units for the community reports instead of the entity and relationship descriptions. If your graph is sourced in such a way that it does not have descriptions, this might be a useful alternative. In this case, you would update your workflows list to include the text variant:
+[FastGraphRAG](./methods.md#fastgraphrag) uses text_units for the community reports instead of the entity and relationship descriptions. If your graph is sourced in such a way that it does not have descriptions, this might be a useful alternative. In this case, you would update your workflows list to include the text variant of the community reports workflow:
 ```yaml
 workflows: [create_communities, create_community_reports_text, generate_text_embeddings]
@@ -65,7 +65,6 @@ This method requires that your entities and relationships tables have valid link
 Putting it all together:
-- `input`: GraphRAG does require an input document set, even if you don't need us to process it. You can create an input folder and drop a dummy.txt document in there to work around this.
 - `output`: Create an output folder and put your entities and relationships (and optionally text_units) parquet files in it.
 - Update your config as noted above to only run the workflows subset you need.
 - Run `graphrag index --root <your project root>`
```

```diff
@@ -10,7 +10,7 @@ This is the method described in the original [blog post](https://www.microsoft.c
 - relationship extraction: LLM is prompted to describe the relationship between each pair of entities in each text unit.
 - entity summarization: LLM is prompted to combine the descriptions for every instance of an entity found across the text units into a single summary.
 - relationship summarization: LLM is prompted to combine the descriptions for every instance of a relationship found across the text units into a single summary.
-- claim extraction (optiona): LLM is prompted to extract and describe claims from each text unit.
+- claim extraction (optional): LLM is prompted to extract and describe claims from each text unit.
 - community report generation: entity and relationship descriptions (and optionally claims) for each community are collected and used to prompt the LLM to generate a summary report.
 `graphrag index --method standard`. This is the default method, so the method param can actual be omitted.
@@ -23,7 +23,7 @@ FastGraphRAG is a method that substitutes some of the language model reasoning f
 - relationship extraction: relationships are defined as text unit co-occurrence between entity pairs. There is no description.
 - entity summarization: not necessary.
 - relationship summarization: not necessary.
-- claim extraction (optiona): unused.
+- claim extraction (optional): unused.
 - community report generation: The direct text unit content containing each entity noun phrase is collected and used to prompt the LLM to generate a summary report.
 `graphrag index --method fast`
@@ -41,4 +41,4 @@ You can install it manually by running `python -m spacy download <model_name>`,
 ## Choosing a Method
-Standard GraphRAG provides a rich description of real-world entities and relationships, but is more expensive that FastGraphRAG. We estimate graph extraction to constitute roughly 75% of indexing cost. FastGraphRAG is therefore much cheaper, but the tradeoff is that the extracted graph is less directly relevant for use outside of GraphRAG, and the graph tends to be quite a bit noisier. If high fidelity entities and graph exploration are important to your use case, we recommend staying with traditional GraphRAG. If your use case is primarily aimed at summary questions using global search, FastGraphRAG is a reasonable and cheaper alternative.
+Standard GraphRAG provides a rich description of real-world entities and relationships, but is more expensive that FastGraphRAG. We estimate graph extraction to constitute roughly 75% of indexing cost. FastGraphRAG is therefore much cheaper, but the tradeoff is that the extracted graph is less directly relevant for use outside of GraphRAG, and the graph tends to be quite a bit noisier. If high fidelity entities and graph exploration are important to your use case, we recommend staying with traditional GraphRAG. If your use case is primarily aimed at summary questions using global search, FastGraphRAG provides high quality summarization at much less LLM cost.
```
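FastGraphRAG's relationship step defines an edge as co-occurrence of two entities in the same text unit, with no description. A minimal Python sketch of that idea follows. Two choices here are assumptions made for illustration: the input maps each text unit id to the entity noun phrases found in it, and the edge weight is the number of shared text units. GraphRAG's NLP extraction, and any weighting it applies, may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text_units: dict[str, list[str]]) -> Counter[tuple[str, str]]:
    """Count, for each unordered entity pair, how many text units mention both."""
    edges: Counter[tuple[str, str]] = Counter()
    for entities in text_units.values():
        # Deduplicate within a unit and sort so (a, b) and (b, a) share one key.
        for pair in combinations(sorted(set(entities)), 2):
            edges[pair] += 1
    return edges
```

With `{"t1": ["ALICE", "BOB"], "t2": ["BOB", "CAROL", "ALICE"], "t3": ["DAVE"]}`, the pair `("ALICE", "BOB")` gets weight 2. `DAVE` gets no edge because it never shares a text unit with another entity.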

```diff
@@ -10,7 +10,7 @@ Each of these prompts may be overridden by writing a custom prompt file in plain
 ### Entity/Relationship Extraction
-[Prompt Source](http://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/entity_extraction.py)
+[Prompt Source](http://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/extract_graph.py)
 #### Tokens
@@ -31,7 +31,7 @@ Each of these prompts may be overridden by writing a custom prompt file in plain
 ### Claim Extraction
-[Prompt Source](http://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/claim_extraction.py)
+[Prompt Source](http://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/extract_claims.py)
 #### Tokens
```