Derek Worthen
e0cce31f54
Graphrag config ( #2119 )
* Add load_config to graphrag-common package.
2025-11-10 07:57:03 -08:00
Nathan Evans
6033e4ffa2
Storage fixes and cleanup ( #2118 )
* Fix pipeline recursion
* Remove base_dir from storage.find
* Remove max_count from storage.find
* Remove prefix on storage integ test
* Add base_dir in creation_date test
* Wrap base_dir in Path
* Use constants for input/update directories
2025-11-05 13:06:09 -08:00
Derek Worthen
619269243d
Restructure project as monorepo. ( #2111 )
* Restructure project as monorepo.
2025-11-04 09:51:56 -08:00
Nathan Evans
1bb9fa8e13
Unified factory ( #2105 )
* Simplify Factory interface
* Migrate CacheFactory to standard base class
* Migrate LoggerFactory to standard base class
* Migrate StorageFactory to standard base class
* Migrate VectorStoreFactory to standard base class
* Update vector store example notebook
* Delete notebook outputs
* Move default providers into factories
* Move retry/limit tests into integ
* Split language model factories
* Set smoke test tpm/rpm
* Fix factory integ tests
* Add method to smoke test, switch text to 'fast'
* Fix text smoke config for fast workflow
* Add new workflows to text smoke test
* Convert input readers to a proper factory
* Remove covariates from fast smoke test
* Update docs for input factory
* Bump smoke runtime
* Even longer runtime
* min-csv timeout
* Remove unnecessary lambdas
2025-10-20 12:05:27 -07:00
gaudyb
0436405962
Remove document overwrite ( #2101 )
* remove document overwrite from vector store configuration
* remove document overwrite and refactor load documents method
* fix test
* fix test
* fix test
---------
Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>
2025-10-16 07:56:54 -06:00
gaudyb
79ad9b96f3
reduce schema fields ( #2089 )
* reduce schema fields
* fix launch.json
---------
Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>
2025-10-09 13:41:31 -06:00
Nathan Evans
2b5284ca1b
Merge branch 'main' into v3/main
2025-10-07 16:24:15 -07:00
Nathan Evans
ac8a7f5eef
Housekeeping ( #2086 )
* Add deprecation warnings for fnllm and multi-search
* Fix dangling token_encoder refs
* Fix local_search notebook
* Fix global search dynamic notebook
* Fix global search notebook
* Fix drift notebook
* Switch example notebooks to use LiteLLM config
* Properly annotate dev deps as a group
* Semver
* Remove --extra dev
* Remove llm_model variable
* Ignore ruff ASYNC240
* Add note about expected broken notebook in docs
* Fix custom vector store notebook
* Push tokenizer throughout
2025-10-07 16:21:24 -07:00
gaudyb
d7773bd15c
Clean vector store ( #2077 )
* clean vector store code
* fix
* fix launch.json
---------
Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>
2025-09-25 21:17:10 -06:00
Nathan Evans
b73053010e
Merge branch 'main' into v3/main
2025-09-23 11:07:42 -07:00
gaudyb
82cd3b7df2
Custom vector store schema implementation ( #2062 )
* progress on vector customization
* fix for lancedb vectors
* cosmosdb implementation
* uv run poe format
* clean test for vector store
* semversioner update
* test_factory.py integration test fixes
* fixes for cosmosdb test
* integration test fix for lancedb
* uv fix for format
* test fixes
* fixes for tests
* fix cosmosdb bug
* print statement
* test
* test
* fix cosmosdb bug
* test validation
* validation cosmosdb
* validate cosmosdb
* fix cosmosdb
* fix small feedback from PR
---------
Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>
2025-09-19 10:11:34 -07:00
Nathan Evans
978e79875e
Remove file filtering ( #2050 )
* Remove document filtering
* Semver
* Fix integ tests
* Fix file find tuple
* Fix another dangling find tuple
2025-09-09 15:36:25 -07:00
Copilot
2030f94eb4
Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns and add custom vector store documentation ( #2006 )
* Initial plan
* Refactor VectorStoreFactory to use registration functionality like StorageFactory
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix linting issues in VectorStoreFactory refactoring
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove backward compatibility support from VectorStoreFactory and StorageFactory
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Run ruff check --fix and ruff format, add semversioner file
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff formatting fixes
* Fix pytest errors in storage factory tests by updating PipelineStorage interface implementation
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff formatting fixes
* update storage factory design
* Refactor CacheFactory to use registration functionality like StorageFactory
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* revert copilot changes
* fix copilot changes
* update comments
* Fix failing pytest compatibility for factory tests
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* update class instantiation issue
* ruff fixes
* fix pytest
* add default value
* ruff formatting changes
* ruff fixes
* revert minor changes
* cleanup cache factory
* Update CacheFactory tests to match consistent factory pattern
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* update pytest thresholds
* adjust threshold levels
* Add custom vector store implementation notebook
Create comprehensive notebook demonstrating how to implement and register custom vector stores with GraphRAG as a plug-and-play framework. Includes:
- Complete implementation of SimpleInMemoryVectorStore
- Registration with VectorStoreFactory
- Testing and validation examples
- Configuration examples for GraphRAG settings
- Advanced features and best practices
- Production considerations checklist
The notebook provides a complete walkthrough for developers to understand and implement their own vector store backends.
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* remove sample notebook for now
* update tests
* fix cache pytests
* add pandas-stub to dev dependencies
* disable warning check for well known key
* skip tests when running on ubuntu
* add documentation for custom vector store implementations
* ignore ruff findings in notebooks
* fix merge breakages
* speedup CLI import statements
* remove unnecessary import statements in init file
* Add str type option on storage/cache type
* Fix store name
* Add LoggerFactory
* Fix up logging setup across CLI/API
* Add LoggerFactory test
* Fix err message
* Semver
* Remove enums from factory methods
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-08-28 13:53:07 -07:00
Copilot
13bf315a35
Refactor StorageFactory class to use registration functionality ( #1944 )
* Initial plan for issue
* Refactored StorageFactory to use a registration-based approach
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Added semversioner change record
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix Python CI test failures and improve code quality
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff formatting fixes
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2025-07-10 12:08:44 -06:00
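Note on the registration-based approach named in the commit above: a minimal sketch of the general pattern, not the project's actual code. `StorageFactorySketch`, `InMemoryStorage`, the `"memory"` key, and the `base_dir` option are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, ClassVar


class InMemoryStorage:
    """Hypothetical storage backend used only for this illustration."""

    def __init__(self, **kwargs: Any) -> None:
        # Keep whatever options the caller passed so they can be inspected.
        self.options = kwargs


class StorageFactorySketch:
    """Illustrative registry; an assumption, not the real StorageFactory."""

    # Maps a storage type name to a callable that builds the backend.
    _registry: ClassVar[dict[str, Callable[..., Any]]] = {}

    @classmethod
    def register(cls, storage_type: str, creator: Callable[..., Any]) -> None:
        """Associate a type name with a creator callable."""
        cls._registry[storage_type] = creator

    @classmethod
    def create(cls, storage_type: str, **kwargs: Any) -> Any:
        """Build a backend by name, failing loudly for unknown names."""
        if storage_type not in cls._registry:
            msg = f"Unknown storage type: {storage_type}"
            raise ValueError(msg)
        return cls._registry[storage_type](**kwargs)


# Callers register a backend once, then create instances by name.
StorageFactorySketch.register("memory", InMemoryStorage)
storage = StorageFactorySketch.create("memory", base_dir="output")
```

With a registry like this, a new backend can be plugged in by calling `register` rather than by editing a conditional inside the factory.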
Copilot
e84df28e64
Improve internal logging functionality by using Python's standard logging module ( #1956 )
* Initial plan for issue
* Implement standard logging module and integrate with existing loggers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Add test cases and improve documentation for standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Apply ruff formatting and add semversioner file for logging improvements
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove custom logger classes and refactor to use standard logging only
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Apply ruff formatting to resolve CI/CD test failures
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Add semversioner file and fix linting issues
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff fixes
* fix spelling error
* Remove StandardProgressLogger and refactor to use standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove LoggerFactory and custom loggers, refactor to use standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix pyright error: use logger.info() instead of calling logger as function in cosmosdb_pipeline_storage.py
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff fixes
* Remove deprecated logger files that were marked as deprecated placeholders
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Replace custom get_logger with standard Python logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix linting issues found by ruff check --fix
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff check fixes
* add word to dictionary
* Fix type checker error in ModelManager.__new__ method
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Refactor multiple logging.getLogger() calls to use single logger per file
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove progress_logger parameter from build_index() and logger parameter from generate_indexing_prompts()
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove logger parameter from run_pipeline and standardize logger naming
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Replace logger parameter with log_level parameter in CLI commands
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix import ordering in notebook files to pass poetry poe check
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove --logger parameter from smoke test command
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix Windows CI/CD issue with log file cleanup in tests
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Add StreamHandler to root logger in __main__.py for CLI logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Only add StreamHandler if root logger doesn't have existing StreamHandler
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix import ordering in notebook files to pass ruff checks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Replace logging.StreamHandler with colorlog.StreamHandler for colorized log output
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Regenerate poetry.lock file after adding colorlog dependency
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix import ordering in notebook files to pass ruff checks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* move printing of dataframes to debug level
* remove colorlog for now
* Refactor workflow callbacks to inherit from logging.Handler
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix linting issues in workflow callback handlers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix pyright type errors in blob and file workflow callbacks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Refactor pipeline logging to use pure logging.Handler subclasses
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Rename workflow callback classes to workflow logger classes and move to logger directory
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* update dictionary
* apply ruff fixes
* fix function name
* simplify logger code
* update
* Remove error, warning, and log methods from WorkflowCallbacks and replace with standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff fixes
* Fix pyright errors by removing WorkflowCallbacks from strategy type signatures
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove ConsoleWorkflowLogger and apply consistent formatter to all handlers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* Refactor pipeline_logger.py to use standard FileHandler and remove FileWorkflowLogger
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove conditional azure import checks from blob_workflow_logger.py
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix pyright type checking errors in mock_provider.py and utils.py
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Run ruff check --fix to fix import ordering in notebooks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Merge configure_logging and create_pipeline_logger into init_loggers function
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove configure_logging and create_pipeline_logger functions, replace all usage with init_loggers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* cleanup unused code
* Update init_loggers to accept GraphRagConfig instead of ReportingConfig
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff check fixes
* Fix test failures by providing valid GraphRagConfig with required model configurations
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* remove logging_workflow_callback
* cleanup logging messages
* Add logging to track progress of pandas DataFrame apply operation in create_base_text_units
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* cleanup logger logic throughout codebase
* update
* more cleanup of old loggers
* small logger cleanup
* final code cleanup and added loggers to query
* add verbose logging to query
* minor code cleanup
* Fix broken unit tests for chunk_text and standard_logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* Fix test_chunk_text by mocking progress_ticker function instead of ProgressTicker class
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* remove unnecessary logger
* remove rich and fix type annotation
* revert test formatting changes made by copilot
* promote graphrag logs to root logger
* add correct semversioner file
* revert change to file
* revert formatting changes that have no effect
* fix changes after merge with main
* revert unnecessary copilot changes
* remove whitespace
* cleanup docstring
* simplify some logic with less code
* update poetry lock file
* ruff fixes
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2025-07-09 18:29:03 -06:00
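Note on the "Only add StreamHandler if root logger doesn't have existing StreamHandler" step in the commit above: a minimal sketch using only Python's standard `logging` module. The helper name `ensure_stream_handler`, the logger name `sketch.cli`, and the format string are assumptions for illustration. The sketch attaches to a named logger rather than the root logger so it can run in isolation.

```python
import logging


def ensure_stream_handler(logger: logging.Logger) -> bool:
    """Attach a StreamHandler unless one is already present.

    Returns True when a handler was added, False otherwise.
    """
    for handler in logger.handlers:
        # FileHandler subclasses StreamHandler, so exclude it explicitly.
        if isinstance(handler, logging.StreamHandler) and not isinstance(
            handler, logging.FileHandler
        ):
            return False
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return True


# Hypothetical logger name; calling the helper twice adds only one handler.
logger = logging.getLogger("sketch.cli")
first = ensure_stream_handler(logger)
second = ensure_stream_handler(logger)
```

The guard keeps repeated initialization from attaching duplicate handlers, which would otherwise print each log record more than once.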
Nathan Evans
1df89727c3
Pipeline registration ( #1940 )
* Move covariate run conditional
* All pipeline registration
* Fix method name construction
* Rename context storage -> output_storage
* Rename OutputConfig as generic StorageConfig
* Reuse Storage model under InputConfig
* Move input storage creation out of document loading
* Move document loading into workflows
* Semver
* Fix smoke test config for new workflows
* Fix unit tests
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-06-12 16:14:39 -07:00
Alonso Guevara
7fba9522d4
Task/raw model answer ( #1947 )
* Add full_response to llm provider output
* Semver
* Small leftover cleanup
* Add pyi to suppress Pyright errors. full_content is optional
* Format
* Add missing stubs
2025-05-22 08:22:44 -06:00
Nathan Evans
ad4cdd685f
Support OpenAI reasoning models ( #1841 )
* Update tiktoken
* Add max_completion_tokens to model config
* Update/remove outdated comments
* Remove max_tokens from report generation
* Remove max_tokens from entity summarization
* Remove logit_bias from graph extraction
* Remove logit_bias from claim extraction
* Swap params if reasoning model
* Add reasoning model support to basic search
* Add reasoning model support for local and global search
* Support reasoning models with dynamic community selection
* Support reasoning models in DRIFT search
* Remove unused num_threads entry
* Semver
* Update openai
* Add reasoning_effort param
2025-04-22 14:15:26 -07:00
KennyZhang1
61769dd47e
Vector Store Integration Tests ( #1856 )
* Add vector store id reference to embeddings config.
* generated initial vector store pytests
* cleaned up cosmosdb vector store test
* fixed class name typo and debugged cosmosdb vector store test
* reset emulator connection string
* remove unnecessary comments
* removed extra comments from azure ai search test
* ruff
* semversioner
* fix cicd issues
* bypass diskANN policy for test env
* handle floating point imprecisions
---------
Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
2025-04-01 11:05:04 -04:00
Alonso Guevara
53950f8442
Fix/model provider key injection check ( #1799 )
...
* Check available models for type validation
* Semver
* Fix ruff and pyright
* Apply feedback
2025-03-11 17:48:30 -06:00
Gabriel Nieves-Ponce
e39d869bed
Added support for verbose logging and csv-metadata to the prompt tune… ( #1789 )
...
* Added support for verbose logging and csv-metadata to the prompt tune client.
* Updated community report summarization file name and prompt template
* updated semversioner
* ran ruff linter
* Ran poe format
* Fix Ruff complaints
* Fix a new ruff complaint :P
* Pyright
* Fix tests
---------
Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-03-11 14:55:02 -06:00
Alonso Guevara
e0d233fe10
Feat/llm provider query ( #1735 )
...
* Add ModelProvider to Query package.
* Spellcheck + others
* Semver
* Fix tests
* Format
* Fix Pyright
* Fix tests
* Fix for smoke tests
2025-02-24 18:35:51 -06:00
Alonso Guevara
7bdeaee94a
Create Language Model Providers and Registry methods. Remove fnllm coupling ( #1724 )
...
* Base structure
* Add fnllm providers and Mock LLM
* Remove fnllm coupling, introduce llm providers
* Ruff + Tests fix
* Spellcheck
* Semver
* Format
* Default MockChat params
* Fix more tests
* Fix embedding smoke test
* Fix embeddings smoke test
* Fix MockEmbeddingLLM
* Rename LLM to model. Package organization
* Fix prompt tuning
* Oops
* Oops II
2025-02-20 08:56:20 -06:00
Dayenne Souza
b94290ec2b
add option to add metadata into text chunks ( #1681 )
...
* add new options
* add metadata json into input document
* remove doc change
* add metadata column into text loader
* prepend_metadata
* run fix
* fix tests and patch
* fix test
* add warning for metadata tokens > config size
* fix typo and run fix
* fix test_integration
* fix test
* run check
* rename and fix chunking
* fix
* fix
* fix test verbs
* fix
* fix tests
* fix chunking
* fix index
* fix cosmos test
* fix vars
* fix after PR
* fix
2025-02-12 09:38:03 -08:00
Derek Worthen
c644338bae
Refactor config ( #1593 )
...
* Refactor config
  - Add new ModelConfig to represent LLM settings
    - Combines LLMParameters, ParallelizationParameters, encoding_model, and async_mode
  - Add top level models config that is a list of available LLM ModelConfigs
  - Remove LLMConfig inheritance and delete LLMConfig
    - Replace the inheritance with a model_id reference to the ModelConfig listed in the top level models config
  - Remove all fallbacks and hydration logic from create_graphrag_config
    - This removes the automatic env variable overrides
  - Support env variables within config files using Templating
    - This requires "$" to be escaped with extra "$" so ".*\\.txt$" becomes ".*\\.txt$$"
  - Update init content to initialize new config file with the ModelConfig structure
* Use dict of ModelConfig instead of list
* Add model validations and unit tests
* Fix ruff checks
* Add semversioner change
* Fix unit tests
* validate root_dir in pydantic model
* Rename ModelConfig to LanguageModelConfig
* Rename ModelConfigMissingError to LanguageModelConfigMissingError
* Add validation for unexpected API keys
* Allow skipping pydantic validation for testing/mocking purposes.
* Add default lm configs to verb tests
* smoke test
* remove config from flows to fix llm arg mapping
* Fix embedding llm arg mapping
* Remove timestamp from smoke test outputs
* Remove unused "subworkflows" smoke test properties
* Add models to smoke test configs
* Update smoke test output path
* Send logs to logs folder
* Fix output path
* Fix csv test file pattern
* Update placeholder
* Format
* Instantiate default model configs
* Fix unit tests for config defaults
* Fix migration notebook
* Remove create_pipeline_config
* Remove several unused config models
* Remove indexing embedding and input configs
* Move embeddings function to config
* Remove skip_workflows
* Remove skip embeddings in favor of explicit naming
* fix unit test spelling mistake
* self.models[model_id] is already a language model. Remove redundant casting.
* update validation errors to instruct users to rerun graphrag init
* instantiate LanguageModelConfigs with validation
* skip validation in unit tests
* update verb tests to use default model settings instead of skipping validation
* test using llm settings
* cleanup verb tests
* remove unsafe default model config
* remove the ability to skip pydantic validation
* remove None union types when default values are set
* move vector_store from embeddings to top level of config and delete resolve_paths
* update vector store settings
* fix vector store and smoke tests
* fix serializing vector_store settings
* fix vector_store usage
* fix vector_store type
* support cli overrides for loading graphrag config
* rename storage to output
* Add --force flag to init
* Remove run_id and resume, fix Drift config assignment
* Ruff
---------
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-21 17:52:06 -06:00
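Note on the templating change in "Refactor config" ( #1593 ) above: the commit says env variables are supported within config files via templating, and that a literal "$" must be written as "$$". The following is a minimal sketch of that escaping rule, assuming the templating behaves like Python's standard string.Template (the commit message does not name the mechanism). The config keys, the EXAMPLE_API_KEY variable, and the render_config and is_valid_template helpers are hypothetical and exist only for illustration.

```python
from string import Template

# Hypothetical config text. "${EXAMPLE_API_KEY}" is a placeholder to fill in;
# the trailing "$$" in the regex is an escaped literal "$".
CONFIG_TEXT = r"""
models:
  default_chat_model:
    api_key: ${EXAMPLE_API_KEY}
input:
  file_pattern: ".*\\.txt$$"
"""


def render_config(text: str, env: dict[str, str]) -> str:
    """Substitute ${NAME} placeholders from env; "$$" collapses to "$"."""
    return Template(text).substitute(env)


def is_valid_template(text: str, env: dict[str, str]) -> bool:
    """Return False when the text contains a "$" that is not a valid placeholder."""
    try:
        Template(text).substitute(env)
    except (KeyError, ValueError):
        return False
    return True


rendered = render_config(CONFIG_TEXT, {"EXAMPLE_API_KEY": "secret-123"})
print(rendered)

# An unescaped "$" at the end of the regex is rejected under this assumption.
unescaped_ok = is_valid_template(r'file_pattern: ".*\\.txt$"', {})
```

Under this assumption, the rendered text carries the regex with a single "$" again, while the unescaped form fails to render, which is why the commit asks for the extra "$".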
Josh Bradley
cbb8f8788e
Fix storage class instantiation ( #1582 )
2025-01-03 17:39:44 -05:00
Nathan Evans
a2647da473
Simplify flow config ( #1554 )
...
* Flatten compute_communities config
* Remove cluster strategy type
* Flatten create_base_text_units config
* Move cluster seed to config default, leave as None in functions
* Remove "prechunked" logic
* Remove hard-coded encoding model
* Remove unused variables
* Strongly type embed_config
* Simplify layout_graph config
* Semver
* Fix integration test
* Fix config unit tests: ignore new config defaults
* Remove pipeline integ test
2024-12-27 16:38:36 -08:00
KennyZhang1
8368b12532
Add Cosmos DB storage/cache option ( #1431 )
...
* added cosmosdb constructor and database methods
* added rest of abstract method headers
* added cosmos db container methods
* implemented has and delete methods
* finished implementing abstract class methods
* integrated class into storage factory
* integrated cosmosdb class into cache factory
* added support for new config file fields
* replaced primary key cosmosdb initialization with connection strings
* modified cosmosdb setter to require json
* Fix non-default emitters
* Format
* Ruff
* ruff
* first successful run of cosmosdb indexing
* removed extraneous container_name setting
* require base_dir to be typed as str
* reverted merged changes from closed branch
* removed nested try statement
* readded initial non-parquet emitter fix
* added basic support for parquet emitter using internal conversions
* merged with main and resolved conflicts
* fixed more merge conflicts
* added cosmosdb functionality to query pipeline
* tested query for cosmosdb
* collapsed cosmosdb schema to use minimal containers and databases
* simplified create_database and create_container functions
* ruff fixes and semversioner
* spellcheck and ci fixes
* updated pyproject toml and lock file
* apply fixes after merge from main
* add temporary comments
* refactor cache factory
* refactored storage factory
* minor formatting
* update dictionary
* fix spellcheck typo
* fix default value
* fix pydantic model defaults
* update pydantic models
* fix init_content
* cleanup how factory passes parameters to file storage
* remove unnecessary output file type
* update pydantic model
* cleanup code
* implemented clear method
* fix merge from main
* add test stub for cosmosdb
* regenerate lock file
* modified set method to collapse parquet rows
* modified get method to collapse parquet rows
* updated has and delete methods and docstrings to adhere to new schema
* added prefix helper function
* replaced delimiter for prefixed id
* verified empty tests are passing
* fix merges from main
* add find test
* update cicd step name
* tested querying for new schema
* resolved errors from merge conflicts
* refactored set method to handle cache in new schema
* refactored get method to handle cache in new schema
* force unique ids to be written to cosmos for nodes
* found bug with has and delete methods
* modified has and delete to work with cache in new schema
* fix the merge from main
* minor typo fixes
* update lock file
* spellcheck fix
* fix init function signature
* minor formatting updates
* remove https protocol
* change localhost to 127.0.0.1 address
* update pytest to use bacj engine
* verified cache tests
* improved speed of has function
* resolved pytest error with find function
* added test for child method
* make container_name variable private as _container_name
* minor variable name fix
* cleanup cosmos pytest and make the cosmosdb storage class operations more efficient
* update cicd to use different cosmosdb emulator
* test with http protocol
* added pytest for clear()
* add longer timeout for cosmosdb emulator startup
* revert http connection back to https
* add comments to cicd code for future dev usage
* set container and database clients to None upon deletion
* ruff changes
* add comments to cicd code
* removed unneeded None statements and ruff fixes
* more ruff fixes
* Update test_run.py
* remove unnecessary call to delete container
* ruff format updates
* Reverted test_run.py
* fix ruff formatter errors
* cleanup variable names to be more consistent
* remove extra semversioner file
* revert pydantic model changes
* revert pydantic model change
* revert pydantic model change
* re-enable inline formatting rule
* update documentation in dev guide
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-19 13:43:21 -06:00
Nathan Evans
1d68af308b
Community workflow ( #1495 )
...
* Create separate communities workflow
* Add test for new workflow
* Rename workflows
* Collapse subflows into parents
* Rename flows, reuse variables
* Semver
* Fix integration test
* Fix smoke tests
* Fix megapipeline format
* Rename missed files
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-11 15:41:16 -06:00
Alonso Guevara
1c3b0f34c3
Chore/lib updates ( #1477 )
...
* Update dependencies and fix issues
* Format
* Semver
* Fix Pyright
* Pyright
* More Pyright
* Pyright
2024-12-06 14:08:24 -06:00
Nathan Evans
d17dfd01f9
Graph collapse ( #1464 )
...
* Refactor graph creation
* Semver
* Spellcheck
* Update integ pipeline
* Fix cast
* Improve pandas chaining
* Cleaner apply
* Use list comprehensions
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-05 11:57:26 -06:00
Josh Bradley
dad2176b3c
Miscellaneous code cleanup procedures ( #1452 )
2024-11-27 13:27:43 -05:00
Nathan Evans
634e3ed62a
Transient entity graph ( #1349 )
...
* Make base_entity_graph transient
* Add transient snapshots
* Semver
* Fix unit test
* Fix smoke tests
2024-11-04 17:23:29 -08:00
Nathan Evans
ce5b1207e0
Collapse graph documents workflows ( #1284 )
...
* Copy base documents logic into final documents
* Delete create_base_documents
* Combine graph creation under create_base_entity_graph
* Delete collapsed workflows
* Migrate most graph internals to nx.Graph
* Fix None edge case
* Semver
* Remove comment typo
* Fix smoke tests
2024-10-15 13:58:58 -06:00
Nathan Evans
61b3d6d56a
Migrate helper verbs ( #1248 )
...
* Remove genid
* Move snapshot_rows
* Move snapshot
* Delete spread_json
* Delete unzip
* Delete zip
* Move unpack_graph
* Move compute_edge_combined_degree
* Delete create_graph
* Delete concat
* Delete text replace
* Delete text_translate
* Move text_split
* Inline aggregate override
* Move cluster_graph
* Move merge_graphs
* Semver
* Move text_chunk
* Move layout_graph and fix some __init__s
* Move extract_covariates
* Rename text_split -> split_text
* Move extract_entities
* Move summarize_descriptions
* Rename text_chunk -> chunk_text
* Move community report creation
* Remove verb-level packing operators
* Streamline some naming
* Streamline param name/order
* Move mock LLM data to tests
* Fixed missed rename
* Update some strategy refs
* Rename run_gi
* Inject mock responses into integ test config
2024-10-09 13:46:44 -07:00
Nathan Evans
f5b4d2fea5
Ci streamline ( #988 )
...
* Remove excess vars from gh-pages build
* Delete redundant javascript ci
* Pull apart testing CI
* Clean up integration tests build
* Move storage tests to integration CI
* Take py 3.10 out of smoke tests matrix
* Use minimum supported python version for most tests
* Re-run main CI on any test change
* Add Josh and Kenny to author list
* Update auto-resolve perms
2024-08-21 15:16:15 -06:00
Alonso Guevara
81b81cf60b
Initial Release
2024-07-01 15:25:30 -06:00