# GraphRAG Development

# Requirements

| Name                | Installation                                  | Purpose                                                                          |
| ------------------- | --------------------------------------------- | -------------------------------------------------------------------------------- |
| Python 3.10 or 3.11 | [Download](https://www.python.org/downloads/) | The library is Python-based.                                                     |
| uv                  | [Instructions](https://docs.astral.sh/uv/)    | uv is used for package management and virtualenv management in Python codebases. |

# Getting Started

## Install Dependencies
```shell
# (optional) create virtual environment
uv venv --python 3.10
source .venv/bin/activate

# install python dependencies
uv sync --extra dev
```

## Execute the indexing engine
```shell
uv run poe index <...args>
```

## Execute prompt tuning
```shell
uv run poe prompt_tune <...args>
```

## Execute Queries
```shell
uv run poe query <...args>
```

## Repository Structure
An overview of the repository's top-level folder structure is provided below, detailing the overall design and purpose.
We leverage a factory design pattern where possible, enabling a variety of implementations for each core component of graphrag.

```shell
graphrag
├── api             # library API definitions
├── cache           # cache module supporting several options
│   └─ factory.py   #   └─ main entrypoint to create a cache
├── callbacks       # a collection of commonly used callback functions
├── cli             # library CLI
│   └─ main.py      #   └─ primary CLI entrypoint
├── config          # configuration management
├── index           # indexing engine
│   └─ run/run.py   #   └─ main entrypoint to build an index
├── logger          # logger module supporting several options
│   └─ factory.py   #   └─ main entrypoint to create a logger
├── model           # data model definitions associated with the knowledge graph
├── prompt_tune     # prompt tuning module
├── prompts         # a collection of all the system prompts used by graphrag
├── query           # query engine
├── storage         # storage module supporting several options
│   └─ factory.py   #   └─ main entrypoint to create/load a storage endpoint
├── utils           # helper functions used throughout the library
└── vector_stores   # vector store module containing a few options
    └─ factory.py   #   └─ main entrypoint to create a vector store
```
Where appropriate, the factories expose a registration method for users to provide their own custom implementations if desired.
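
As a rough illustration of the pattern, and not the library's actual API, a factory with a registration method might look like the sketch below. The names `CacheFactory`, `register`, `create`, and `InMemoryCache` are hypothetical and chosen only for this example; consult the relevant `factory.py` for the real interface.

```python
from typing import Any, Callable, ClassVar


class InMemoryCache:
    """Hypothetical cache implementation used only for this example."""

    def __init__(self, **kwargs: Any) -> None:
        self.options = kwargs
        self._data: dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data.get(key)


class CacheFactory:
    """Hypothetical factory: maps a type name to a callable that builds a cache."""

    _registry: ClassVar[dict[str, Callable[..., Any]]] = {}

    @classmethod
    def register(cls, name: str, creator: Callable[..., Any]) -> None:
        """Register a custom implementation under a name."""
        cls._registry[name] = creator

    @classmethod
    def create(cls, name: str, **kwargs: Any) -> Any:
        """Build the implementation registered under `name`."""
        if name not in cls._registry:
            msg = f"Unknown cache type: {name}"
            raise ValueError(msg)
        return cls._registry[name](**kwargs)


# Register the custom implementation, then create it by name.
CacheFactory.register("memory", InMemoryCache)
cache = CacheFactory.create("memory", namespace="demo")
cache.set("answer", 42)
```

Keeping the registry keyed by a plain string means a configuration file can select an implementation by name without importing it directly.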

## Versioning

We use [semversioner](https://github.com/raulgomis/semversioner) to automate and enforce semantic versioning in the release process. Our CI/CD pipeline checks that all PRs include a JSON file generated by semversioner. When submitting a PR, please run:
```shell
uv run semversioner add-change -t patch -d "<a small sentence describing changes made>."
```
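
To confirm locally that a change file exists before opening a PR, a small check along the following lines may help. It assumes semversioner's default layout, where unreleased changes are stored as JSON files under `.semversioner/next-release/`; that path and the helper name `pending_changes` are assumptions for this sketch, not something the CI pipeline is documented to use.

```python
import json
from pathlib import Path


def pending_changes(repo_root: str = ".") -> list[dict]:
    """Return the parsed change files waiting for the next release."""
    # Assumed default location of semversioner's unreleased change files.
    change_dir = Path(repo_root) / ".semversioner" / "next-release"
    if not change_dir.is_dir():
        return []
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(change_dir.glob("*.json"))
    ]


if __name__ == "__main__":
    changes = pending_changes()
    if not changes:
        print("No change file found; run `uv run semversioner add-change` first.")
    for change in changes:
        print(f"{change.get('type')}: {change.get('description')}")
```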

# Azurite

Some unit and smoke tests use Azurite to emulate Azure resources. Azurite can be started by running:

```sh
./scripts/start-azurite.sh
```

or by simply running `azurite` in the terminal if already installed globally. See the [Azurite documentation](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite) for more information about how to install and use Azurite.

# Lifecycle Scripts

Our Python package uses uv to manage dependencies and [poethepoet](https://pypi.org/project/poethepoet/) to manage custom build scripts.

Available scripts are:
- `uv run poe index` - Run the Indexing CLI
- `uv run poe query` - Run the Query CLI
- `uv build` - This will build a wheel file and other distributable artifacts.
- `uv run poe test` - This will execute all tests.
- `uv run poe test_unit` - This will execute unit tests.
- `uv run poe test_integration` - This will execute integration tests.
- `uv run poe test_smoke` - This will execute smoke tests.
- `uv run poe check` - This will perform a suite of static checks across the package, including:
  - formatting
  - documentation formatting
  - linting
  - security patterns
  - type-checking
- `uv run poe fix` - This will apply any available auto-fixes to the package. Usually this is just formatting fixes.
- `uv run poe fix_unsafe` - This will apply any available auto-fixes to the package, including those that may be unsafe.
- `uv run poe format` - Explicitly run the formatter across the package.

## Troubleshooting

### "RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config" when running uv sync

Make sure llvm-9 and llvm-9-dev are installed:

`sudo apt-get install llvm-9 llvm-9-dev`

Then add the following to your `~/.bashrc`:

`export LLVM_CONFIG=/usr/bin/llvm-config-9`

### "numba/\_pymodule.h:6:10: fatal error: Python.h: No such file or directory" when running uv sync

Make sure you have `python3.10-dev` installed or, more generally, `python<version>-dev`:

`sudo apt-get install python3.10-dev`
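
To check whether the headers are already present for the interpreter you are using, a quick sketch with the standard-library `sysconfig` module is shown below. The helper name `python_header_path` is hypothetical, and the reported location can differ between distributions, so treat the result as a hint rather than a definitive diagnosis.

```python
import sysconfig
from pathlib import Path


def python_header_path() -> Path:
    """Return the path where this interpreter expects Python.h to live."""
    return Path(sysconfig.get_paths()["include"]) / "Python.h"


header = python_header_path()
if header.exists():
    print(f"Found {header}")
else:
    print(f"Missing {header}; install the matching python<version>-dev package")
```

Run it with the same interpreter that `uv sync` uses, for example via `uv run python`, so the check reflects the right Python version.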

### LLM call constantly exceeds TPM, RPM or time limits

`GRAPHRAG_LLM_THREAD_COUNT` and `GRAPHRAG_EMBEDDING_THREAD_COUNT` are both set to 50 by default. You can modify these values to reduce concurrency. Please refer to the [Configuration Documents](https://microsoft.github.io/graphrag/config/overview/).
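
To get a feel for why lowering these values helps, a back-of-the-envelope estimate is sketched below. The function name and the example numbers (tokens per request, seconds per request) are illustrative assumptions, not measurements or graphrag defaults; only the thread count of 50 comes from the text above.

```python
def estimate_load(
    threads: int, tokens_per_request: int, seconds_per_request: float
) -> tuple[float, float]:
    """Estimate (requests per minute, tokens per minute) when every thread stays busy."""
    rpm = threads * 60.0 / seconds_per_request
    tpm = rpm * tokens_per_request
    return rpm, tpm


# Illustrative numbers only: 2,000 tokens and 10 seconds per request.
for threads in (50, 10):
    rpm, tpm = estimate_load(threads, tokens_per_request=2000, seconds_per_request=10.0)
    print(f"{threads} threads -> ~{rpm:.0f} RPM, ~{tpm:.0f} TPM")
```

Comparing the estimate against your deployment's TPM and RPM quotas gives a starting point for choosing a lower thread count.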