graphrag/tests/unit/indexing/cache/test_file_pipeline_cache.py
Copilot 2030f94eb4
Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns and add custom vector store documentation (#2006)
* Initial plan

* Refactor VectorStoreFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix linting issues in VectorStoreFactory refactoring

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove backward compatibility support from VectorStoreFactory and StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Run ruff check --fix and ruff format, add semversioner file

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* Fix pytest errors in storage factory tests by updating PipelineStorage interface implementation

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* update storage factory design

* Refactor CacheFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* revert copilot changes

* fix copilot changes

* update comments

* Fix failing pytest compatibility for factory tests

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update class instantiation issue

* ruff fixes

* fix pytest

* add default value

* ruff formatting changes

* ruff fixes

* revert minor changes

* cleanup cache factory

* Update CacheFactory tests to match consistent factory pattern

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update pytest thresholds

* adjust threshold levels

* Add custom vector store implementation notebook

Create comprehensive notebook demonstrating how to implement and register custom vector stores with GraphRAG as a plug-and-play framework. Includes:

- Complete implementation of SimpleInMemoryVectorStore
- Registration with VectorStoreFactory
- Testing and validation examples
- Configuration examples for GraphRAG settings
- Advanced features and best practices
- Production considerations checklist

The notebook provides a complete walkthrough for developers to understand and implement their own vector store backends.

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* remove sample notebook for now

* update tests

* fix cache pytests

* add pandas-stub to dev dependencies

* disable warning check for well known key

* skip tests when running on ubuntu

* add documentation for custom vector store implementations

* ignore ruff findings in notebooks

* fix merge breakages

* speedup CLI import statements

* remove unnecessary import statements in init file

* Add str type option on storage/cache type

* Fix store name

* Add LoggerFactory

* Fix up logging setup across CLI/API

* Add LoggerFactory test

* Fix err message

* Semver

* Remove enums from factory methods

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-08-28 13:53:07 -07:00
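The commit above moves the cache, storage, and vector store factories onto one registration pattern keyed by plain strings ("Add str type option on storage/cache type", "Remove enums from factory methods"). The sketch below illustrates that general pattern only. `BackendFactory`, `CacheBackend`, `InMemoryBackend`, `register`, `create`, `get_types`, and `is_supported` are hypothetical names chosen for illustration; they are not GraphRAG's actual factory API.

```python
from collections.abc import Callable
from typing import Any, ClassVar


class CacheBackend:
    """Hypothetical base type for illustration; not GraphRAG's PipelineCache."""


class InMemoryBackend(CacheBackend):
    """Hypothetical backend that just records the options it was built with."""

    def __init__(self, **kwargs: Any) -> None:
        self.options = kwargs


class BackendFactory:
    """Hypothetical registry that maps plain string keys to creator callables."""

    _registry: ClassVar[dict[str, Callable[..., CacheBackend]]] = {}

    @classmethod
    def register(cls, backend_type: str, creator: Callable[..., CacheBackend]) -> None:
        # Registering under a string (not an enum member) lets callers add
        # new backend types without editing the factory.
        cls._registry[backend_type] = creator

    @classmethod
    def create(cls, backend_type: str, **kwargs: Any) -> CacheBackend:
        try:
            creator = cls._registry[backend_type]
        except KeyError:
            msg = f"Unknown backend type: {backend_type!r}"
            raise ValueError(msg) from None
        return creator(**kwargs)

    @classmethod
    def get_types(cls) -> list[str]:
        return sorted(cls._registry)

    @classmethod
    def is_supported(cls, backend_type: str) -> bool:
        return backend_type in cls._registry


# Register once (e.g. at import time), then select by a string from config.
BackendFactory.register("memory", InMemoryBackend)
backend = BackendFactory.create("memory", ttl=60)
```

Because the registry stores callables rather than enum members, a third-party backend can be plugged in with a single `register` call and chosen by name from configuration.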


# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License
import os
import unittest

from graphrag.cache.json_pipeline_cache import JsonPipelineCache
from graphrag.storage.file_pipeline_storage import (
    FilePipelineStorage,
)

TEMP_DIR = "./.tmp"


def create_cache():
    storage = FilePipelineStorage(base_dir=os.path.join(os.getcwd(), ".tmp"))
    return JsonPipelineCache(storage)


class TestFilePipelineCache(unittest.IsolatedAsyncioTestCase):
    def setUp(self):
        self.cache = create_cache()

    async def asyncTearDown(self):
        # Clean up on the test case's own event loop instead of starting a
        # separate loop with asyncio.run().
        await self.cache.clear()

    async def test_cache_clear(self):
        # Create a cache directory
        if not os.path.exists(TEMP_DIR):
            os.mkdir(TEMP_DIR)
        with open(f"{TEMP_DIR}/test1", "w") as f:
            f.write("This is test1 file.")
        with open(f"{TEMP_DIR}/test2", "w") as f:
            f.write("This is test2 file.")

        # this invokes cache.clear()
        await self.cache.clear()

        # Check if the cache directory is empty
        files = os.listdir(TEMP_DIR)
        assert len(files) == 0

    async def test_child_cache(self):
        await self.cache.set("test1", "test1")
        assert os.path.exists(f"{TEMP_DIR}/test1")

        child = self.cache.child("test")
        assert os.path.exists(f"{TEMP_DIR}/test")

        await child.set("test2", "test2")
        assert os.path.exists(f"{TEMP_DIR}/test/test2")

        await self.cache.set("test1", "test1")
        await self.cache.delete("test1")
        assert not os.path.exists(f"{TEMP_DIR}/test1")

    async def test_cache_has(self):
        test1 = "this is a test file"
        await self.cache.set("test1", test1)

        assert await self.cache.has("test1")
        assert not await self.cache.has("NON_EXISTENT")
        assert await self.cache.get("NON_EXISTENT") is None

    async def test_get_set(self):
        test1 = "this is a test file"
        test2 = "\\n test"
        test3 = "\\\\\\"
        await self.cache.set("test1", test1)
        await self.cache.set("test2", test2)
        await self.cache.set("test3", test3)
        assert await self.cache.get("test1") == test1
        assert await self.cache.get("test2") == test2
        assert await self.cache.get("test3") == test3
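These tests pin down a small contract: `set`/`get` round-trip string values (including ones containing backslashes), `has` and `get` report missing keys, `delete` removes the backing file, `child` maps to a sub-directory, and `clear` empties the cache directory. The stdlib-only sketch below shows one way a directory-backed cache could satisfy the same contract. `DirectoryJsonCache` and its `{"result": ...}` JSON envelope are hypothetical choices for illustration; this is not how `JsonPipelineCache` or `FilePipelineStorage` are actually implemented.

```python
import asyncio
import json
import shutil
import tempfile
from pathlib import Path
from typing import Any


class DirectoryJsonCache:
    """Hypothetical file-backed cache: one JSON file per key under ``root``."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.root / key

    async def set(self, key: str, value: Any) -> None:
        # The {"result": ...} envelope is this sketch's own assumption.
        self._path(key).write_text(json.dumps({"result": value}), encoding="utf-8")

    async def get(self, key: str) -> Any:
        path = self._path(key)
        if not path.is_file():
            return None
        return json.loads(path.read_text(encoding="utf-8"))["result"]

    async def has(self, key: str) -> bool:
        return self._path(key).is_file()

    async def delete(self, key: str) -> None:
        self._path(key).unlink(missing_ok=True)

    async def clear(self) -> None:
        # Remove every file and sub-directory, but keep the root itself.
        for entry in self.root.iterdir():
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    def child(self, name: str) -> "DirectoryJsonCache":
        # A child cache is a namespace backed by a sub-directory of root.
        return DirectoryJsonCache(self.root / name)


async def demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        cache = DirectoryJsonCache(Path(tmp))
        await cache.set("test3", "\\\\\\")
        child = cache.child("test")
        await child.set("test2", "\\n test")
        print(await cache.get("test3"), await child.get("test2"))
        await cache.clear()
        print(list(Path(tmp).iterdir()))


if __name__ == "__main__":
    asyncio.run(demo())
```

Storing values as JSON keeps escaping out of the file layer: `json.dumps` escapes backslashes on write and `json.loads` restores them on read, which is the round-trip property `test_get_set` checks.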