graphrag/DEVELOPING.md
Nathan Evans ac8a7f5eef
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
Housekeeping (#2086)
* Add deprecation warnings for fnllm and multi-search

* Fix dangling token_encoder refs

* Fix local_search notebook

* Fix global search dynamic notebook

* Fix global search notebook

* Fix drift notebook

* Switch example notebooks to use LiteLLM config

* Properly annotate dev deps as a group

* Semver

* Remove --extra dev

* Remove llm_model variable

* Ignore ruff ASYNC240

* Add note about expected broken notebook in docs

* Fix custom vector store notebook

* Push tokenizer throughout
2025-10-07 16:21:24 -07:00

118 lines
5.1 KiB
Markdown

# GraphRAG Development
# Requirements
| Name | Installation | Purpose |
| ------------------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------- |
| Python 3.10 or 3.11 | [Download](https://www.python.org/downloads/) | The library is Python-based. |
| uv | [Instructions](https://docs.astral.sh/uv/) | uv is used for package management and virtualenv management in Python codebases |
# Getting Started
## Install Dependencies
```shell
# install python dependencies
uv sync
```
## Execute the indexing engine
```shell
uv run poe index <...args>
```
## Execute prompt tuning
```shell
uv run poe prompt_tune <...args>
```
## Execute Queries
```shell
uv run poe query <...args>
```
## Repository Structure
An overview of the repository's top-level folder structure is provided below, detailing the overall design and purpose.
We leverage a factory design pattern where possible, enabling a variety of implementations for each core component of graphrag.
```shell
graphrag
├── api # library API definitions
├── cache # cache module supporting several options
│   └─ factory.py # └─ main entrypoint to create a cache
├── callbacks # a collection of commonly used callback functions
├── cli # library CLI
│   └─ main.py # └─ primary CLI entrypoint
├── config # configuration management
├── index # indexing engine
| └─ run/run.py # main entrypoint to build an index
├── logger # logger module supporting several options
│   └─ factory.py # └─ main entrypoint to create a logger
├── model # data model definitions associated with the knowledge graph
├── prompt_tune # prompt tuning module
├── prompts # a collection of all the system prompts used by graphrag
├── query # query engine
├── storage # storage module supporting several options
│   └─ factory.py # └─ main entrypoint to create/load a storage endpoint
├── utils # helper functions used throughout the library
└── vector_stores # vector store module containing a few options
└─ factory.py # └─ main entrypoint to create a vector store
```
Where appropriate, the factories expose a registration method for users to provide their own custom implementations if desired.
## Versioning
We use [semversioner](https://github.com/raulgomis/semversioner) to automate and enforce semantic versioning in the release process. Our CI/CD pipeline checks that all PR's include a json file generated by semversioner. When submitting a PR, please run:
```shell
uv run semversioner add-change -t patch -d "<a small sentence describing changes made>."
```
# Azurite
Some unit and smoke tests use Azurite to emulate Azure resources. This can be started by running:
```sh
./scripts/start-azurite.sh
```
or by simply running `azurite` in the terminal if already installed globally. See the [Azurite documentation](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite) for more information about how to install and use Azurite.
# Lifecycle Scripts
Our Python package utilizes uv to manage dependencies and [poethepoet](https://pypi.org/project/poethepoet/) to manage custom build scripts.
Available scripts are:
- `uv run poe index` - Run the Indexing CLI
- `uv run poe query` - Run the Query CLI
- `uv build` - This invokes `uv build`, which will build a wheel file and other distributable artifacts.
- `uv run poe test` - This will execute all tests.
- `uv run poe test_unit` - This will execute unit tests.
- `uv run poe test_integration` - This will execute integration tests.
- `uv run poe test_smoke` - This will execute smoke tests.
- `uv run poe check` - This will perform a suite of static checks across the package, including:
- formatting
- documentation formatting
- linting
- security patterns
- type-checking
- `uv run poe fix` - This will apply any available auto-fixes to the package. Usually this is just formatting fixes.
- `uv run poe fix_unsafe` - This will apply any available auto-fixes to the package, including those that may be unsafe.
- `uv run poe format` - Explicitly run the formatter across the package.
## Troubleshooting
### "RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config" when running uv sync
Make sure llvm-9 and llvm-9-dev are installed:
`sudo apt-get install llvm-9 llvm-9-dev`
and then in your bashrc, add
`export LLVM_CONFIG=/usr/bin/llvm-config-9`
### "numba/\_pymodule.h:6:10: fatal error: Python.h: No such file or directory" when running uv sync
Make sure you have python3.10-dev installed or more generally `python<version>-dev`
`sudo apt-get install python3.10-dev`