graphrag/tests/integration/vector_stores/test_factory.py
Copilot 2030f94eb4
Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns and add custom vector store documentation (#2006)
* Initial plan

* Refactor VectorStoreFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix linting issues in VectorStoreFactory refactoring

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove backward compatibility support from VectorStoreFactory and StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Run ruff check --fix and ruff format, add semversioner file

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* Fix pytest errors in storage factory tests by updating PipelineStorage interface implementation

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* update storage factory design

* Refactor CacheFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* revert copilot changes

* fix copilot changes

* update comments

* Fix failing pytest compatibility for factory tests

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update class instantiation issue

* ruff fixes

* fix pytest

* add default value

* ruff formatting changes

* ruff fixes

* revert minor changes

* cleanup cache factory

* Update CacheFactory tests to match consistent factory pattern

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update pytest thresholds

* adjust threshold levels

* Add custom vector store implementation notebook

Create comprehensive notebook demonstrating how to implement and register custom vector stores with GraphRAG as a plug-and-play framework. Includes:

- Complete implementation of SimpleInMemoryVectorStore
- Registration with VectorStoreFactory
- Testing and validation examples
- Configuration examples for GraphRAG settings
- Advanced features and best practices
- Production considerations checklist

The notebook provides a complete walkthrough for developers to understand and implement their own vector store backends.

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* remove sample notebook for now

* update tests

* fix cache pytests

* add pandas-stub to dev dependencies

* disable warning check for well known key

* skip tests when running on ubuntu

* add documentation for custom vector store implementations

* ignore ruff findings in notebooks

* fix merge breakages

* speedup CLI import statements

* remove unnecessary import statements in init file

* Add str type option on storage/cache type

* Fix store name

* Add LoggerFactory

* Fix up logging setup across CLI/API

* Add LoggerFactory test

* Fix err message

* Semver

* Remove enums from factory methods

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-08-28 13:53:07 -07:00


# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""VectorStoreFactory Tests.

These tests cover the VectorStoreFactory class and the creation of each natively supported vector store type.
"""

import pytest
from graphrag.config.enums import VectorStoreType
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore
from graphrag.vector_stores.base import BaseVectorStore
from graphrag.vector_stores.cosmosdb import CosmosDBVectorStore
from graphrag.vector_stores.factory import VectorStoreFactory
from graphrag.vector_stores.lancedb import LanceDBVectorStore


def test_create_lancedb_vector_store():
    kwargs = {
        "collection_name": "test_collection",
        "db_uri": "/tmp/lancedb",
    }
    vector_store = VectorStoreFactory.create_vector_store(
        VectorStoreType.LanceDB.value, kwargs
    )
    assert isinstance(vector_store, LanceDBVectorStore)
    assert vector_store.collection_name == "test_collection"


@pytest.mark.skip(reason="Azure AI Search requires credentials and setup")
def test_create_azure_ai_search_vector_store():
    kwargs = {
        "collection_name": "test_collection",
        "url": "https://test.search.windows.net",
        "api_key": "test_key",
    }
    vector_store = VectorStoreFactory.create_vector_store(
        VectorStoreType.AzureAISearch.value, kwargs
    )
    assert isinstance(vector_store, AzureAISearchVectorStore)


@pytest.mark.skip(reason="CosmosDB requires credentials and setup")
def test_create_cosmosdb_vector_store():
    kwargs = {
        "collection_name": "test_collection",
        "connection_string": "AccountEndpoint=https://test.documents.azure.com:443/;AccountKey=test_key==",
        "database_name": "test_db",
    }
    vector_store = VectorStoreFactory.create_vector_store(
        VectorStoreType.CosmosDB.value, kwargs
    )
    assert isinstance(vector_store, CosmosDBVectorStore)


def test_register_and_create_custom_vector_store():
    """Test registering and creating a custom vector store type."""
    from unittest.mock import MagicMock

    # Create a mock that satisfies the BaseVectorStore interface
    custom_vector_store_class = MagicMock(spec=BaseVectorStore)
    # Make the mock return a mock instance when instantiated
    instance = MagicMock()
    instance.initialized = True
    custom_vector_store_class.return_value = instance

    VectorStoreFactory.register(
        "custom", lambda **kwargs: custom_vector_store_class(**kwargs)
    )
    vector_store = VectorStoreFactory.create_vector_store("custom", {})

    assert custom_vector_store_class.called
    assert vector_store is instance
    # Access the attribute we set on our mock
    assert vector_store.initialized is True  # type: ignore # Attribute only exists on our mock

    # Check if it's in the list of registered vector store types
    assert "custom" in VectorStoreFactory.get_vector_store_types()
    assert VectorStoreFactory.is_supported_type("custom")


def test_get_vector_store_types():
    vector_store_types = VectorStoreFactory.get_vector_store_types()
    # Check that built-in types are registered
    assert VectorStoreType.LanceDB.value in vector_store_types
    assert VectorStoreType.AzureAISearch.value in vector_store_types
    assert VectorStoreType.CosmosDB.value in vector_store_types


def test_create_unknown_vector_store():
    with pytest.raises(ValueError, match="Unknown vector store type: unknown"):
        VectorStoreFactory.create_vector_store("unknown", {})


def test_is_supported_type():
    # Test built-in types
    assert VectorStoreFactory.is_supported_type(VectorStoreType.LanceDB.value)
    assert VectorStoreFactory.is_supported_type(VectorStoreType.AzureAISearch.value)
    assert VectorStoreFactory.is_supported_type(VectorStoreType.CosmosDB.value)

    # Test unknown type
    assert not VectorStoreFactory.is_supported_type("unknown")


def test_register_class_directly_works():
    """Test that registering a class directly works (VectorStoreFactory allows this)."""
    # BaseVectorStore is already imported at module level.
    from graphrag.vector_stores.base import VectorStoreDocument

    class CustomVectorStore(BaseVectorStore):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)

        def connect(self, **kwargs):
            pass

        def load_documents(self, documents, overwrite=True):
            pass

        def similarity_search_by_vector(self, query_embedding, k=10, **kwargs):
            return []

        def similarity_search_by_text(self, text, text_embedder, k=10, **kwargs):
            return []

        def filter_by_id(self, include_ids):
            return {}

        def search_by_id(self, id):
            return VectorStoreDocument(id=id, text="test", vector=None)

    # VectorStoreFactory allows registering classes directly (no TypeError)
    VectorStoreFactory.register("custom_class", CustomVectorStore)

    # Verify it was registered
    assert "custom_class" in VectorStoreFactory.get_vector_store_types()
    assert VectorStoreFactory.is_supported_type("custom_class")

    # Test creating an instance
    vector_store = VectorStoreFactory.create_vector_store(
        "custom_class", {"collection_name": "test"}
    )
    assert isinstance(vector_store, CustomVectorStore)
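

The tests above exercise a registry-style factory: creators are registered under a string key, looked up at creation time, and an unknown key raises `ValueError`. The standalone sketch below illustrates that pattern only. `SketchFactory` and `SketchVectorStore` are hypothetical names invented for this example, and this is not graphrag's actual `VectorStoreFactory` implementation; only the behavior visible in the tests (class or lambda creators, kwargs passed as a dict, the error message text) is mirrored.

```python
from typing import Any, Callable, ClassVar


class SketchVectorStore:
    """Hypothetical stand-in for a vector store; not a graphrag class."""

    def __init__(self, collection_name: str = "default", **kwargs: Any) -> None:
        self.collection_name = collection_name


class SketchFactory:
    """Hypothetical registry-style factory; not graphrag's implementation."""

    # Maps a type name to any callable that builds a store from keyword arguments.
    _registry: ClassVar[dict[str, Callable[..., Any]]] = {}

    @classmethod
    def register(cls, store_type: str, creator: Callable[..., Any]) -> None:
        # A class and a lambda are both callables, so either can be registered.
        cls._registry[store_type] = creator

    @classmethod
    def create(cls, store_type: str, kwargs: dict[str, Any]) -> Any:
        if store_type not in cls._registry:
            msg = f"Unknown vector store type: {store_type}"
            raise ValueError(msg)
        return cls._registry[store_type](**kwargs)

    @classmethod
    def is_supported_type(cls, store_type: str) -> bool:
        return store_type in cls._registry


# Register once with the class itself and once with a wrapping lambda.
SketchFactory.register("sketch_class", SketchVectorStore)
SketchFactory.register("sketch_lambda", lambda **kw: SketchVectorStore(**kw))

store = SketchFactory.create("sketch_class", {"collection_name": "test"})
```

Because the registry in this sketch is class-level state, a registration made in one test stays visible to later tests in the same process, which is worth keeping in mind when choosing key names such as "custom" in a shared test module.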