graphrag/packages/graphrag-storage
Nathan Evans bffa400c89
Python update (3.13) (#2149)
* Update to python 3.14 as default, with range down to 3.10

* Fix enum value in query cli

* Update pyarrow

* Update py version for storage package

* Remove 3.10

* add fastuuid

* Update Python support to 3.11-3.14 with stricter dependency constraints

- Set minimum Python version to 3.11 (removed 3.10 support)
- Added support for Python 3.14
- Updated CI workflows: single-version jobs use 3.14, matrix jobs use 3.11 and 3.14
- Fixed license format to use SPDX-compatible format for Python 3.14
- Updated pyarrow to >=22.0.0 for Python 3.14 wheel support
- Added explicit fastuuid~=0.14 and blis~=1.3 for Python 3.14 compatibility
- Replaced all loose version constraints (>=) with compatible release (~=) for better lock file control
- Applied stricter versioning to all packages: graphrag, graphrag-common, graphrag-storage, unified-search-app

* update uv lock

* Pin blis to ~=1.3.3 to ensure Python 3.14 wheel availability

* Update uv lock

* Update numpy to >=2.0.0 for Python 3.14 Windows compatibility

Numpy 1.25.x has access violation issues on Python 3.14 Windows.
Numpy 2.x has proper Python 3.14 support including Windows wheels.

* update uv lock

* Update pandas to >=2.3.0 for numpy 2.x compatibility

Pandas 2.2.x was compiled against numpy 1.x and causes ABI
incompatibility errors with numpy 2.x. Pandas 2.3.0+ supports
numpy 2.x properly.

* update uv.lock

* Add scipy>=1.15.0 for numpy 2.x compatibility

Scipy versions < 1.15.0 have C extensions built against numpy 1.x
and are incompatible with numpy 2.x, causing dtype size errors.

* update uv lock

* Update Python support to 3.11-3.13 with compatible dependencies

- Set Python version range to 3.11-3.13 (removed 3.14 support)
- Updated CI workflows: single-version jobs use 3.13, matrix jobs use 3.11 and 3.13
- Dependencies optimized for Python 3.13 compatibility:
  - pyarrow~=22.0 (has Python 3.13 wheels)
  - numpy~=1.26
  - pandas~=2.2
  - blis~=1.0
  - fastuuid~=0.13
- Applied stricter version constraints using ~= operator throughout
- Updated uv.lock with resolved dependencies

* Update numpy to 2.1+ and pandas to 2.3+ for Python 3.13 Windows compatibility

Numpy 1.26.x causes access violations on Python 3.13 Windows.
Numpy 2.1+ has proper Python 3.13 support with Windows wheels.
Pandas 2.3+ is required for numpy 2.x compatibility.

* update vsts.yml python version
2025-12-15 15:39:38 -08:00
graphrag_storage Add graphrag-storage. (#2127) 2025-12-15 09:32:19 -08:00
pyproject.toml Python update (3.13) (#2149) 2025-12-15 15:39:38 -08:00
README.md Add graphrag-storage. (#2127) 2025-12-15 09:32:19 -08:00

GraphRAG Storage

Basic

import asyncio
from graphrag_storage import StorageConfig, create_storage, StorageType

async def run():
    # Build a file-backed storage rooted at the "output" directory.
    storage = create_storage(
        StorageConfig(
            type=StorageType.File,
            base_dir="output",
        )
    )

    await storage.set("my_key", "value")
    print(await storage.get("my_key"))

if __name__ == "__main__":
    asyncio.run(run())

Custom Storage

import asyncio
from typing import Any
from graphrag_storage import Storage, StorageConfig, create_storage, register_storage

class MyStorage(Storage):
    def __init__(self, some_setting: str, optional_setting: str = "default setting", **kwargs: Any):
        # Validate settings and initialize
        ...

    # Implement the rest of the interface
    ...

# Register the custom provider under the key that StorageConfig.type will reference.
register_storage("MyStorage", MyStorage)

async def run():
    storage = create_storage(
        StorageConfig(
            type="MyStorage",
            some_setting="My Setting",
        )
    )
    # Or use the factory directly to instantiate with a dict instead of using
    # StorageConfig + create_storage
    # from graphrag_storage.storage_factory import storage_factory
    # storage = storage_factory.create(strategy="MyStorage", init_args={"some_setting": "My Setting"})

    await storage.set("my_key", "value")
    print(await storage.get("my_key"))

if __name__ == "__main__":
    asyncio.run(run())
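
As a rough illustration of what the elided parts of a custom provider might look like, here is a minimal, self-contained sketch of a dict-backed key-value store. It deliberately does not subclass Storage, because this README only shows the get and set calls and the real base class probably requires more methods. The class name DictBackedStorage, the validation rule, and the choice to return None for a missing key are all assumptions made for this example.

```python
import asyncio
from typing import Any


class DictBackedStorage:
    """Hypothetical stand-in; a real provider would subclass graphrag_storage.Storage."""

    def __init__(
        self,
        some_setting: str,
        optional_setting: str = "default setting",
        **kwargs: Any,
    ) -> None:
        # Validate settings up front so a bad config fails at construction time.
        if not some_setting:
            raise ValueError("some_setting must be a non-empty string")
        self.some_setting = some_setting
        self.optional_setting = optional_setting
        self._items: dict[str, Any] = {}

    async def set(self, key: str, value: Any) -> None:
        # Store the value under the given key, overwriting any previous value.
        self._items[key] = value

    async def get(self, key: str) -> Any:
        # Returning None for a missing key is an assumption of this sketch.
        return self._items.get(key)


async def demo() -> Any:
    storage = DictBackedStorage(some_setting="My Setting")
    await storage.set("my_key", "value")
    return await storage.get("my_key")


result = asyncio.run(demo())
```

The extra **kwargs parameter mirrors the constructor signature shown above, so that settings the provider does not use can be passed without raising a TypeError.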

Details

By default, create_storage comes with the following storage providers registered, which correspond to the entries in the StorageType enum.

  • FileStorage
  • AzureBlobStorage
  • AzureCosmosStorage
  • MemoryStorage

Preregistration happens lazily: for example, FileStorage is only imported and registered when you first request it, e.g., with create_storage(StorageConfig(type=StorageType.File, ...)). There is no need to manually import and register built-in storage providers when using create_storage.
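
The on-demand registration described above can be sketched with a small self-contained factory. This is not the package's actual implementation: StorageFactory, _load_builtin, the "memory" key, and the toy MemoryStorage class are hypothetical names used only to show the pattern of registering a built-in provider the first time it is requested.

```python
from typing import Any, Callable, Optional


class StorageFactory:
    """Hypothetical stand-in for the real factory: maps a strategy name to a constructor."""

    def __init__(self) -> None:
        self._registry: dict[str, Callable[..., Any]] = {}

    def register(self, strategy: str, ctor: Callable[..., Any]) -> None:
        self._registry[strategy] = ctor

    def __contains__(self, strategy: str) -> bool:
        return strategy in self._registry

    def create(self, strategy: str, init_args: Optional[dict[str, Any]] = None) -> Any:
        if strategy not in self._registry:
            raise ValueError(f"Storage strategy {strategy!r} is not registered")
        # init_args is an untyped dict that is unpacked into the constructor.
        return self._registry[strategy](**(init_args or {}))


class MemoryStorage:
    """Toy provider used only for this illustration."""

    def __init__(self, **kwargs: Any) -> None:
        self.settings = kwargs


def _load_builtin(strategy: str) -> Callable[..., Any]:
    # A real package would import the provider module here, so an unused
    # provider (and its third-party dependencies) is never imported.
    builtins = {"memory": MemoryStorage}
    return builtins[strategy]


factory = StorageFactory()


def create_storage(strategy: str, **init_args: Any) -> Any:
    # Register a built-in provider only the first time it is requested.
    if strategy not in factory:
        factory.register(strategy, _load_builtin(strategy))
    return factory.create(strategy, init_args)
```

In this sketch the factory starts empty, and calling create_storage("memory", base_dir="output") both registers the provider and returns an instance. A provider registered beforehand under the same key is left in place, which is one plausible way for custom registrations to coexist with built-ins.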

If you want a clean factory with no preregistered storage providers, import storage_factory directly and bypass create_storage. The downside is that storage_factory.create takes a dict of init args instead of the strongly typed StorageConfig used with create_storage.

from graphrag_storage.storage_factory import storage_factory
from graphrag_storage.file_storage import FileStorage

# storage_factory has no preregistered providers, so you must register any
# providers you plan to use.
# You may also register a custom implementation; see the Custom Storage example above.
storage_factory.register("my_storage_key", FileStorage)

storage = storage_factory.create(strategy="my_storage_key", init_args={"base_dir": "...", "other_settings": "..."})

...