graphrag/docs/index/overview.md
Nathan Evans ae1f5e1811
Nov 2025 housekeeping (#2120)
* Remove gensim sideload

* Split CI build/type checks from unit tests

* Thorough review of docs to align with v3

* Format

* Fix version

* Fix type
2025-11-06 10:03:22 -08:00

GraphRAG Indexing 🤖

The GraphRAG indexing package is a data pipeline and transformation suite designed to extract meaningful, structured data from unstructured text using LLMs.

Indexing pipelines are configurable. They are composed of workflows, standard and custom steps, prompt templates, and input/output adapters. Our standard pipeline is designed to:

  • extract entities, relationships and claims from raw text
  • perform community detection on the extracted entities
  • generate community summaries and reports at multiple levels of granularity
  • embed text into a vector space
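
To make the idea of a pipeline composed of workflows concrete, here is a minimal toy sketch. It is not GraphRAG's API: `run_pipeline`, `extract_graph`, `detect_communities`, and the context dictionary are hypothetical names chosen for illustration, and the steps return placeholder data instead of calling an LLM.

```python
from typing import Callable

# Hypothetical names throughout: this is NOT GraphRAG's API, only an
# illustration of a pipeline as an ordered list of named workflows.
Context = dict
Workflow = Callable[[Context], Context]


def run_pipeline(workflows: list[tuple[str, Workflow]], context: Context) -> Context:
    """Run each workflow in order, passing the accumulated context along."""
    for name, workflow in workflows:
        context = workflow(context)
        # Record which workflows have finished, in execution order.
        context.setdefault("completed", []).append(name)
    return context


def extract_graph(ctx: Context) -> Context:
    # Placeholder: a real step would prompt an LLM to extract entities.
    return {**ctx, "entities": ["A", "B"]}


def detect_communities(ctx: Context) -> Context:
    # Placeholder: put every extracted entity into a single community.
    return {**ctx, "communities": [list(ctx["entities"])]}
```

Swapping, removing, or adding entries in the `workflows` list is what makes such a design configurable: each step only depends on the keys that earlier steps wrote into the context.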

The outputs of the pipeline are stored as Parquet tables by default, and embeddings are written to your configured vector store.
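
Because the outputs are ordinary Parquet files, they can be inspected with standard tooling once a run has finished. The sketch below only lists which tables exist; the helper name `list_parquet_tables` is hypothetical, and the output directory is whatever your configuration points to.

```python
from pathlib import Path


def list_parquet_tables(output_dir: Path) -> list[str]:
    """Return the table names (file stems) of the Parquet files in output_dir."""
    return sorted(p.stem for p in Path(output_dir).glob("*.parquet"))


# Example (assumes your pipeline wrote its tables to ./output):
#   for name in list_parquet_tables(Path("output")):
#       print(name)
```

Each listed file can then be loaded into a DataFrame with `pandas.read_parquet`, provided pandas and a Parquet engine such as pyarrow are installed.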

Getting Started

Requirements

See the requirements section in Get Started for details on setting up a development environment.

To configure GraphRAG, see the configuration documentation. Once you have a config file, you can run the pipeline using the CLI or the Python API.

Usage

CLI

uv run poe index --root <data_root> # default config mode
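
For scripted runs, the same command can be launched from Python with the standard library. This is a minimal sketch that simply shells out to the command shown above; the helper names `build_index_command` and `run_index` are hypothetical, and it assumes `uv` is on your PATH and that you run it from the repository root.

```python
import subprocess
from pathlib import Path


def build_index_command(data_root: Path) -> list[str]:
    """Build the argument list for the indexing command shown above."""
    return ["uv", "run", "poe", "index", "--root", str(data_root)]


def run_index(data_root: Path) -> int:
    """Run the indexer; raises CalledProcessError if it exits non-zero."""
    completed = subprocess.run(build_index_command(data_root), check=True)
    return completed.returncode


# Example (replace the path with your own data root):
#   run_index(Path("path/to/data_root"))
```

Passing the arguments as a list avoids shell quoting problems when the data root contains spaces.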

Python API

See the indexing API Python file for the recommended way to call the pipeline directly from Python code.

Further Reading