graphrag/DEVELOPING.md
2025-08-13 18:57:25 -06:00

GraphRAG Development

Requirements

Name                  Installation    Purpose
Python 3.10 or 3.11   Download        The library is Python-based.
uv                    Instructions    uv is used for package management and virtualenv management in Python codebases.

Getting Started

Install Dependencies

# (optional) create virtual environment
uv venv --python 3.10
source .venv/bin/activate

# install python dependencies
uv sync --extra dev

Execute the indexing engine

uv run poe index <...args>

Execute prompt tuning

uv run poe prompt_tune <...args>

Execute Queries

uv run poe query <...args>

Repository Structure

An overview of the repository's top-level folder structure is provided below, detailing the overall design and purpose. We leverage a factory design pattern where possible, enabling a variety of implementations for each core component of graphrag.

graphrag
├── api             # library API definitions
├── cache           # cache module supporting several options
│    └─ factory.py  #  └─ main entrypoint to create a cache
├── callbacks       # a collection of commonly used callback functions
├── cli             # library CLI
│    └─ main.py     #  └─ primary CLI entrypoint
├── config          # configuration management
├── index           # indexing engine
│    └─ run/run.py  #  └─ main entrypoint to build an index
├── logger          # logger module supporting several options
│    └─ factory.py  #  └─ main entrypoint to create a logger
├── model           # data model definitions associated with the knowledge graph
├── prompt_tune     # prompt tuning module 
├── prompts         # a collection of all the system prompts used by graphrag
├── query           # query engine
├── storage         # storage module supporting several options
│    └─ factory.py  #  └─ main entrypoint to create/load a storage endpoint
├── utils           # helper functions used throughout the library
└── vector_stores   # vector store module containing a few options
     └─ factory.py  #  └─ main entrypoint to create a vector store

Where appropriate, the factories expose a registration method for users to provide their own custom implementations if desired.
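
The registration idea described above can be sketched as follows. This is a minimal illustration only, not graphrag's actual API: the names SketchCacheFactory, InMemoryCache, register, and create are all hypothetical, and the real factory modules may use different signatures.

```python
from typing import Any, Callable, ClassVar


class InMemoryCache:
    """Hypothetical cache implementation used only for this illustration."""

    def __init__(self, **kwargs: Any) -> None:
        self._data: dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value


class SketchCacheFactory:
    """Hypothetical factory: maps a type name to a callable that builds it."""

    _registry: ClassVar[dict[str, Callable[..., Any]]] = {}

    @classmethod
    def register(cls, name: str, creator: Callable[..., Any]) -> None:
        # Later registrations under the same name replace earlier ones.
        cls._registry[name] = creator

    @classmethod
    def create(cls, name: str, **kwargs: Any) -> Any:
        if name not in cls._registry:
            raise ValueError(f"Unknown cache type: {name!r}")
        return cls._registry[name](**kwargs)


# A user registers a custom implementation, then asks the factory for it by name.
SketchCacheFactory.register("memory", InMemoryCache)
cache = SketchCacheFactory.create("memory")
cache.set("greeting", "hello")
```

Keeping the lookup behind a name lets configuration choose an implementation without the calling code importing it directly.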

Versioning

We use semversioner to automate and enforce semantic versioning in the release process. Our CI/CD pipeline checks that all PRs include a JSON file generated by semversioner. When submitting a PR, please run:

uv run semversioner add-change -t patch -d "<a small sentence describing changes made>."
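
To see whether a change file is present before opening a PR, a check along these lines may help. It is a rough sketch, not the check the CI/CD pipeline actually runs. It assumes semversioner writes pending change files as JSON under .semversioner/next-release/, and the helper name pending_changes is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def pending_changes(repo_root: Path) -> list[dict]:
    """Return the parsed change files found for the next release.

    Assumes (not confirmed by this document) that pending changes live in
    <repo_root>/.semversioner/next-release/*.json.
    """
    change_dir = repo_root / ".semversioner" / "next-release"
    if not change_dir.is_dir():
        return []
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(change_dir.glob("*.json"))
    ]


# Demonstration against a throwaway directory instead of a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_result = pending_changes(root)  # no change files yet

    change_dir = root / ".semversioner" / "next-release"
    change_dir.mkdir(parents=True)
    (change_dir / "patch-example.json").write_text(
        json.dumps({"type": "patch", "description": "Fix typo."}),
        encoding="utf-8",
    )
    found = pending_changes(root)
```

In a real checkout, an empty list from pending_changes(Path(".")) would suggest the add-change command has not been run yet.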

Azurite

Some unit and smoke tests use Azurite to emulate Azure resources. This can be started by running:

./scripts/start-azurite.sh

or by simply running azurite in the terminal if already installed globally. See the Azurite documentation for more information about how to install and use Azurite.

Lifecycle Scripts

Our Python package utilizes uv to manage dependencies and poethepoet to manage custom build scripts.

Available scripts are:

  • uv run poe index - Run the Indexing CLI
  • uv run poe query - Run the Query CLI
  • uv build - Build a wheel file and other distributable artifacts.
  • uv run poe test - This will execute all tests.
  • uv run poe test_unit - This will execute unit tests.
  • uv run poe test_integration - This will execute integration tests.
  • uv run poe test_smoke - This will execute smoke tests.
  • uv run poe check - This will perform a suite of static checks across the package, including:
    • formatting
    • documentation formatting
    • linting
    • security patterns
    • type-checking
  • uv run poe fix - This will apply any available auto-fixes to the package. Usually this is just formatting fixes.
  • uv run poe fix_unsafe - This will apply any available auto-fixes to the package, including those that may be unsafe.
  • uv run poe format - Explicitly run the formatter across the package.

Troubleshooting

"RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config" when running uv sync

Make sure llvm-9 and llvm-9-dev are installed:

sudo apt-get install llvm-9 llvm-9-dev

and then in your bashrc, add

export LLVM_CONFIG=/usr/bin/llvm-config-9

"numba/_pymodule.h:6:10: fatal error: Python.h: No such file or directory" when running uv sync

Make sure you have python3.10-dev installed (or, more generally, python<version>-dev):

sudo apt-get install python3.10-dev

LLM call constantly exceeds TPM, RPM or time limits

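GRAPHRAG_LLM_THREAD_COUNT and GRAPHRAG_EMBEDDING_THREAD_COUNT are both set to 50 by default. Lowering these values reduces concurrency, which in turn lowers the tokens-per-minute (TPM) and requests-per-minute (RPM) load on the model endpoint. Please refer to the Configuration Documents.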
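
The sketch below illustrates why a lower thread count reduces pressure on rate limits. It is not graphrag's implementation: thread_count, run_bounded, and fake_llm_call are hypothetical helpers, and using asyncio.Semaphore as the limiting mechanism is an assumption made for illustration. Only the variable name and the default of 50 come from the text above.

```python
import asyncio
import os
from typing import Mapping, Optional


def thread_count(
    var_name: str, default: int = 50, env: Optional[Mapping[str, str]] = None
) -> int:
    """Read a concurrency limit from the environment, falling back to a default."""
    source = os.environ if env is None else env
    return int(source.get(var_name, default))


async def run_bounded(n_calls: int, limit: int) -> int:
    """Run n_calls fake requests with at most `limit` in flight; return the peak."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_llm_call() -> None:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` calls are active
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for network latency
            in_flight -= 1

    await asyncio.gather(*(fake_llm_call() for _ in range(n_calls)))
    return peak


# Simulate a user lowering the setting from the default of 50 to 4.
limit = thread_count(
    "GRAPHRAG_LLM_THREAD_COUNT", env={"GRAPHRAG_LLM_THREAD_COUNT": "4"}
)
peak = asyncio.run(run_bounded(n_calls=20, limit=limit))
```

With the limit at 4, no more than four simulated requests overlap at any moment, so fewer requests and tokens reach the endpoint per minute.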