mirror of https://github.com/microsoft/graphrag.git synced 2026-01-14 00:57:23 +08:00

A modular graph-based Retrieval-Augmented Generation (RAG) system


Nathan Evans 710fdad6f0 Input factory (#2168)	2026-01-12 12:47:57 -08:00
* Update input factory to match other factories
* Move input config alongside input readers
* Move file pattern logic into InputReader
* Set encoding default
* Clean up optional column configs
* Combine structured data extraction
* Remove pandas from input loading
* Throw if empty documents
* Add json lines (jsonl) input support
* Store raw data
* Fix merge imports
* Move metadata handling entirely to chunking
* Nicer automatic title
* Typo
* Add get_property utility for nested dictionary access with dot notation
* Update structured_file_reader to use get_property utility
* Extract input module into new graphrag-input monorepo package
  - Create new graphrag-input package with input loading utilities
  - Move InputConfig, InputFileType, InputReader, TextDocument, and file readers (CSV, JSON, JSONL, Text)
  - Add get_property utility for nested dictionary access with dot notation
  - Include hashing utility for document ID generation
  - Update all imports throughout codebase to use graphrag_input
  - Add package to workspace configuration and release tasks
  - Remove old graphrag.index.input module
* Rename ChunkResult to TextChunk and add transformer support
  - Rename chunk_result.py to text_chunk.py with ChunkResult -> TextChunk
  - Add 'original' field to TextChunk to track pre-transform text
  - Add optional transform callback to chunker.chunk() method
  - Add add_metadata transformer for prepending metadata to chunks
  - Update create_chunk_results to apply transforms and populate original
  - Update sentence_chunker and token_chunker with transform support
  - Refactor create_base_text_units to use new transformer pattern
  - Rename pluck_metadata to get/collect methods on TextDocument
* Back-compat comment
* Align input config type name with other factory configs
* Add MarkItDown support
* Remove pattern default from MarkItDown reader
* Remove plugins flag (implicit disabled)
* Format
* Update verb tests
* Separate storage from input config
* Add empty objects for NaN raw_data
* Fix smoke tests
* Fix BOM in csv smoke
* Format
.github	Python update (3.13) (#2149 )	2025-12-15 15:39:38 -08:00
.semversioner	Merge branch 'main' into v3/main	2025-10-10 17:03:52 -07:00
.vscode	Restructure project as monorepo. (#2111 )	2025-11-04 09:51:56 -08:00
docs	Input factory (#2168 )	2026-01-12 12:47:57 -08:00
packages	Input factory (#2168 )	2026-01-12 12:47:57 -08:00
scripts	Restructure project as monorepo. (#2111 )	2025-11-04 09:51:56 -08:00
tests	Input factory (#2168 )	2026-01-12 12:47:57 -08:00
unified-search-app	Input factory (#2168 )	2026-01-12 12:47:57 -08:00
.gitattributes	move mkdocs-typer to devdeps (#1331 )	2024-10-30 14:49:30 -07:00
.gitignore	Restructure project as monorepo. (#2111 )	2025-11-04 09:51:56 -08:00
.vsts-ci.yml	Python update (3.13) (#2149 )	2025-12-15 15:39:38 -08:00
breaking-changes.md	V3 docs and cleanup (#2100 )	2025-10-15 13:47:19 -07:00
CHANGELOG.md	Release v2.7.0 (#2087 )	2025-10-08 21:33:34 -07:00
CODE_OF_CONDUCT.md	Initial Release	2024-07-01 15:25:30 -06:00
CODEOWNERS	Stabilize smoke tests for query community context building (#908 )	2024-08-12 13:17:40 -06:00
CONTRIBUTING.md	Switch from Poetry to uv for package management (#2008 )	2025-08-13 18:57:25 -06:00
cspell.config.yaml	Fix/notebooks (#1614 )	2025-01-13 17:41:39 -06:00
DEVELOPING.md	Housekeeping (#2086 )	2025-10-07 16:21:24 -07:00
dictionary.txt	Tokenizer (#2051 )	2025-09-22 13:55:14 -06:00
LICENSE	Initial Release	2024-07-01 15:25:30 -06:00
mkdocs.yaml	Docs/2.6.0 (#2070 )	2025-09-23 14:48:28 -07:00
pyproject.toml	Input factory (#2168 )	2026-01-12 12:47:57 -08:00
RAI_TRANSPARENCY.md	Initial Release	2024-07-01 15:25:30 -06:00
README.md	Update docs for 2.0+ (#1984 )	2025-06-23 13:49:47 -07:00
SECURITY.md	Initial Release	2024-07-01 15:25:30 -06:00
SUPPORT.md	Initial Release	2024-07-01 15:25:30 -06:00
uv.lock	Input factory (#2168 )	2026-01-12 12:47:57 -08:00

README.md

GraphRAG

👉 Microsoft Research Blog Post
👉 Read the docs
👉 GraphRAG Arxiv

Overview

The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.

To learn more about GraphRAG and how it can be used to enhance your LLM's ability to reason about your private data, please visit the Microsoft Research Blog Post.

Quickstart

To get started with the GraphRAG system, we recommend trying the command line quickstart.

Repository Guidance

This repository presents a methodology for using knowledge graph memory structures to enhance LLM outputs. Please note that the provided code serves as a demonstration and is not an officially supported Microsoft offering.

⚠️ Warning: GraphRAG indexing can be an expensive operation. Please read all of the documentation to understand the process and costs involved, and start small.

Diving Deeper

To learn about our contribution guidelines, see CONTRIBUTING.md
To start developing GraphRAG, see DEVELOPING.md
Join the conversation and provide feedback in the GitHub Discussions tab!

Prompt Tuning

Using GraphRAG with your data out of the box may not yield the best possible results. We strongly recommend fine-tuning your prompts by following the Prompt Tuning Guide in our documentation.

Versioning

Please see the breaking changes document for notes on our approach to versioning the project.

Always run graphrag init --root [path] --force between minor version bumps to ensure you have the latest config format. Note that running init with --force will overwrite your configuration and prompts, so back them up first if necessary. Run the provided migration notebook between major version bumps if you want to avoid re-indexing prior datasets.
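
Because the init command above can overwrite configuration and prompts, it may help to copy them aside first. The following is a minimal sketch, not part of GraphRAG itself: the helper names `backup_project_config` and `init_command` are hypothetical, and the item names in `CONFIG_ITEMS` (`settings.yaml`, `.env`, `prompts`) are assumptions about what your project root contains, so adjust them to match your setup.

```python
import shutil
import time
from pathlib import Path

# Assumed names of the config items in a project root; adjust to your project.
CONFIG_ITEMS = ("settings.yaml", ".env", "prompts")


def backup_project_config(root, items=CONFIG_ITEMS):
    """Copy existing config items into a timestamped folder under root.

    Returns the backup folder path, or None if none of the items exist.
    """
    root = Path(root)
    found = [root / name for name in items if (root / name).exists()]
    if not found:
        return None
    backup_dir = root / f"config-backup-{time.strftime('%Y%m%d-%H%M%S')}"
    backup_dir.mkdir()
    for src in found:
        dest = backup_dir / src.name
        if src.is_dir():
            shutil.copytree(src, dest)  # e.g. the prompts folder
        else:
            shutil.copy2(src, dest)  # single files, preserving metadata
    return backup_dir


def init_command(root):
    """Build the init command quoted above as an argument list."""
    return ["graphrag", "init", "--root", str(root), "--force"]
```

One way to use it: call `backup_project_config("./my-project")`, check that the returned folder contains what you expect, and only then pass `init_command("./my-project")` to `subprocess.run(..., check=True)`.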

Responsible AI FAQ

See RAI_TRANSPARENCY.md

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Privacy

Microsoft Privacy Statement
