Unified Search

Unified demo for GraphRAG search comparisons.

⚠️ This app is maintained for demo/experimental purposes and is not supported. Issue filings on the GraphRAG repo may not be addressed.

Requirements:

  • Python 3.11
  • uv

This sample app is not published to PyPI, so you'll need to clone the GraphRAG repo and run from this folder.

We recommend always using a virtual environment:

  • uv venv --python 3.11
  • source .venv/bin/activate

Run index

Use GraphRAG to index your dataset before running Unified Search. We recommend starting with the Getting Started guide.

Datasets

Unified Search supports multiple GraphRAG indexes by using a directory listing file. Create a listing.json file in the root folder where all your datasets are stored (locally or in blob storage), with the following format (one entry per dataset):

[{
    "key": "<key_to_identify_dataset_1>",
    "path": "<path_to_dataset_1>",
    "name": "<name_to_identify_dataset_1>",
    "description": "<description_for_dataset_1>",
    "community_level": "<integer for community level you want to filter>"
},{
    "key": "<key_to_identify_dataset_2>",
    "path": "<path_to_dataset_2>",
    "name": "<name_to_identify_dataset_2>",
    "description": "<description_for_dataset_2>",
    "community_level": "<integer for community level you want to filter>"
}]

For example, if you have a folder of GraphRAG indexes called "projects" and inside that you ran the Getting Started instructions, your listing.json in the projects folder could look like:

[{
    "key": "christmas-demo",
    "path": "christmas",
    "name": "A Christmas Carol",
    "description": "Getting Started index of the novel A Christmas Carol",
    "community_level": 2
}]
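The listing format above can be loaded and sanity-checked with a short Python sketch (the function name and validation logic here are illustrative, not part of the app):

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"key", "path", "name", "description", "community_level"}

def load_listing(projects_folder: str) -> list[dict]:
    """Load listing.json from the projects folder and check each entry."""
    listing_path = Path(projects_folder) / "listing.json"
    datasets = json.loads(listing_path.read_text())
    for entry in datasets:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"dataset {entry.get('key', '?')} is missing {missing}")
    return datasets
```

This can help catch a misspelled key before the app fails at startup.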

Data Source Configuration

The expected layout of the projects folder is as follows:

  • projects_folder
    • listing.json
    • dataset_1
      • settings.yaml
      • .env (optional if you declare your environment variables elsewhere)
      • output
      • prompts
    • dataset_2
      • settings.yaml
      • .env (optional if you declare your environment variables elsewhere)
      • output
      • prompts
    • ...

Note: Any other folders inside each dataset folder are ignored and will not affect the app. Also, only the datasets declared inside listing.json will be used by Unified Search.
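To confirm a projects folder matches this layout, a quick sketch (the helper name is hypothetical) could check each declared dataset for its settings.yaml and output folder:

```python
import json
from pathlib import Path

def check_layout(projects_folder: str) -> list[str]:
    """Return a list of problems found in the projects folder layout."""
    root = Path(projects_folder)
    problems = []
    datasets = json.loads((root / "listing.json").read_text())
    for entry in datasets:
        dataset_dir = root / entry["path"]
        for required in ("settings.yaml", "output"):
            if not (dataset_dir / required).exists():
                problems.append(f"{entry['key']}: missing {required}")
    return problems
```

An empty result means every dataset declared in listing.json has the files the app expects.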

Storing your datasets

You can host Unified Search datasets locally or in a blob.

1. Local data folder

  1. Create a local folder with all your data and config as described above
  2. Tell the app where your folder is using an absolute path with the following environment variable:
  • DATA_ROOT = <data_folder_absolute_path>

2. Azure Blob Storage

  1. Create a blob storage account with a "data" container and upload all your data and config as described above
  2. Run az login and select an account that has read permissions on that storage account
  3. Tell the app which blob account to use with the following environment variable:
  • BLOB_ACCOUNT_NAME = <blob_storage_name>
  4. (optional) Your blob account needs a container where your projects live. This defaults to data as mentioned in step one, but if you want to use something else you can set:
  • BLOB_CONTAINER_NAME = <blob_container_with_projects>
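The environment variables above can be resolved like this (a sketch; only the variable names and the "data" default come from this README, so the app's actual loading code may differ):

```python
import os

def resolve_data_source() -> dict:
    """Decide between a local folder and Azure Blob Storage from env vars."""
    data_root = os.environ.get("DATA_ROOT")
    if data_root:
        return {"type": "local", "root": data_root}
    account = os.environ.get("BLOB_ACCOUNT_NAME")
    if account:
        return {
            "type": "blob",
            "account": account,
            # Container defaults to "data" when BLOB_CONTAINER_NAME is unset.
            "container": os.environ.get("BLOB_CONTAINER_NAME", "data"),
        }
    raise RuntimeError("Set DATA_ROOT or BLOB_ACCOUNT_NAME")
```

Setting DATA_ROOT takes precedence here; leave it unset to fall through to blob storage.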

Run the app

Install all the dependencies: uv sync

Run the project using streamlit: uv run poe start

How to use it

Initial page

Configuration panel (left panel)

When you run the app you will see two main panels. The left panel provides several configuration options and can be collapsed:

  1. Datasets: all the datasets you defined inside the listing.json file are shown, in order, in the dropdown.
  2. Number of suggested questions: choose how many suggested questions to generate.
  3. Search options: choose which search methods to use in the app. At least one search must be enabled to use the app.

Searches panel (right panel)

In the right panel you have several functionalities:

  1. At the top you can see general information about the chosen dataset (name and description).
  2. Below the dataset information is a button labeled "Suggest some questions", which analyzes the dataset using global search and generates the most important questions (as many as set in the configuration panel). Click the checkbox to the left of a generated question to select it.
  3. A textbox labeled "Ask a question to compare the results" where you can type the question you want to send.
  4. Two tabs, Search and Community Explorer:
    1. Search: all the search results are displayed with their citations.
    2. Community Explorer: this tab is divided into two sections, Community Reports List and Selected Report.
Suggest some questions clicked

Selected question clicked

Community Explorer tab