Commit Graph

6 Commits

Author SHA1 Message Date
Nathan Evans
978e79875e
Remove file filtering (#2050)
* Remove document filtering

* Semver

* Fix integ tests

* Fix file find tuple

* Fix another dangling find tuple
2025-09-09 15:36:25 -07:00
Copilot
2030f94eb4
Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns and add custom vector store documentation (#2006)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Initial plan

* Refactor VectorStoreFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix linting issues in VectorStoreFactory refactoring

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove backward compatibility support from VectorStoreFactory and StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Run ruff check --fix and ruff format, add semversioner file

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* Fix pytest errors in storage factory tests by updating PipelineStorage interface implementation

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* update storage factory design

* Refactor CacheFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* revert copilot changes

* fix copilot changes

* update comments

* Fix failing pytest compatibility for factory tests

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update class instantiation issue

* ruff fixes

* fix pytest

* add default value

* ruff formatting changes

* ruff fixes

* revert minor changes

* cleanup cache factory

* Update CacheFactory tests to match consistent factory pattern

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update pytest thresholds

* adjust threshold levels

* Add custom vector store implementation notebook

Create comprehensive notebook demonstrating how to implement and register custom vector stores with GraphRAG as a plug-and-play framework. Includes:

- Complete implementation of SimpleInMemoryVectorStore
- Registration with VectorStoreFactory
- Testing and validation examples
- Configuration examples for GraphRAG settings
- Advanced features and best practices
- Production considerations checklist

The notebook provides a complete walkthrough for developers to understand and implement their own vector store backends.

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* remove sample notebook for now

* update tests

* fix cache pytests

* add pandas-stub to dev dependencies

* disable warning check for well known key

* skip tests when running on ubuntu

* add documentation for custom vector store implementations

* ignore ruff findings in notebooks

* fix merge breakages

* speedup CLI import statements

* remove unnecessary import statements in init file

* Add str type option on storage/cache type

* Fix store name

* Add LoggerFactory

* Fix up logging setup across CLI/API

* Add LoggerFactory test

* Fix err message

* Semver

* Remove enums from factory methods

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-08-28 13:53:07 -07:00
Copilot
13bf315a35
Refactor StorageFactory class to use registration functionality (#1944)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Initial plan for issue

* Refactored StorageFactory to use a registration-based approach

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Added semversioner change record

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix Python CI test failures and improve code quality

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2025-07-10 12:08:44 -06:00
Nathan Evans
1df89727c3
Pipeline registration (#1940)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Move covariate run conditional

* All pipeline registration

* Fix method name construction

* Rename context storage -> output_storage

* Rename OutputConfig as generic StorageConfig

* Reuse Storage model under InputConfig

* Move input storage creation out of document loading

* Move document loading into workflows

* Semver

* Fix smoke test config for new workflows

* Fix unit tests

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-06-12 16:14:39 -07:00
Derek Worthen
c644338bae
Refactor config (#1593)
* Refactor config

- Add new ModelConfig to represent LLM settings
    - Combines LLMParameters, ParallelizationParameters, encoding_model, and async_mode
- Add top level models config that is a list of available LLM ModelConfigs
- Remove LLMConfig inheritance and delete LLMConfig
    - Replace the inheritance with a model_id reference to the ModelConfig listed in the top level models config
- Remove all fallbacks and hydration logic from create_graphrag_config
    - This removes the automatic env variable overrides
- Support env variables within config files using Templating
    - This requires "$" to be escaped with extra "$" so ".*\\.txt$" becomes ".*\\.txt$$"
- Update init content to initialize new config file with the ModelConfig structure

* Use dict of ModelConfig instead of list

* Add model validations and unit tests

* Fix ruff checks

* Add semversioner change

* Fix unit tests

* validate root_dir in pydantic model

* Rename ModelConfig to LanguageModelConfig

* Rename ModelConfigMissingError to LanguageModelConfigMissingError

* Add validationg for unexpected API keys

* Allow skipping pydantic validation for testing/mocking purposes.

* Add default lm configs to verb tests

* smoke test

* remove config from flows to fix llm arg mapping

* Fix embedding llm arg mapping

* Remove timestamp from smoke test outputs

* Remove unused "subworkflows" smoke test properties

* Add models to smoke test configs

* Update smoke test output path

* Send logs to logs folder

* Fix output path

* Fix csv test file pattern

* Update placeholder

* Format

* Instantiate default model configs

* Fix unit tests for config defaults

* Fix migration notebook

* Remove create_pipeline_config

* Remove several unused config models

* Remove indexing embedding and input configs

* Move embeddings function to config

* Remove skip_workflows

* Remove skip embeddings in favor of explicit naming

* fix unit test spelling mistake

* self.models[model_id] is already a language model. Remove redundant casting.

* update validation errors to instruct users to rerun graphrag init

* instantiate LanguageModelConfigs with validation

* skip validation in unit tests

* update verb tests to use default model settings instead of skipping validation

* test using llm settings

* cleanup verb tests

* remove unsafe default model config

* remove the ability to skip pydantic validation

* remove None union types when default values are set

* move vector_store from embeddings to top level of config and delete resolve_paths

* update vector store settings

* fix vector store and smoke tests

* fix serializing vector_store settings

* fix vector_store usage

* fix vector_store type

* support cli overrides for loading graphrag config

* rename storage to output

* Add --force flag to init

* Remove run_id and resume, fix Drift config assignment

* Ruff

---------

Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-21 17:52:06 -06:00
Josh Bradley
cbb8f8788e
Fix storage class instantiation (#1582) 2025-01-03 17:39:44 -05:00