Nathan Evans
ac8a7f5eef
Housekeeping ( #2086 )
...
* Add deprecation warnings for fnllm and multi-search
* Fix dangling token_encoder refs
* Fix local_search notebook
* Fix global search dynamic notebook
* Fix global search notebook
* Fix drift notebook
* Switch example notebooks to use LiteLLM config
* Properly annotate dev deps as a group
* Semver
* Remove --extra dev
* Remove llm_model variable
* Ignore ruff ASYNC240
* Add note about expected broken notebook in docs
* Fix custom vector store notebook
* Push tokenizer throughout
2025-10-07 16:21:24 -07:00
Nathan Evans
2bd3922d8d
Litellm auth fix ( #2083 )
...
* Fix scope for Azure auth with LiteLLM
* Change internal language on max_attempts to max_retries
* Rework model config connectivity validation
* Semver
* Switch smoke tests to LiteLLM
* Take out temporary retry_strategy = none since it is not fnllm compatible
* Bump smoke test timeout
* Bump smoke timeout further
* Tune smoke params
* Update smoke test bounds
* Remove covariates from min-csv smoke
* Smoke: adjust communities, remove drift
* Remove secrets where they aren't necessary
* Clean out old env var references
2025-10-06 10:54:21 -07:00
Nathan Evans
7f996cf584
Docs/2.6.0 ( #2070 )
...
* Add basic search to overview
* Add info on input documents DataFrame
* Add info on factories to docs
* Add consumption warning and switch to "christmas" for folder name
* Add logger to factories list
* Add litellm docs. (#2058 )
* Fix version for input docs
* Spelling
---------
Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
2025-09-23 14:48:28 -07:00
Nathan Evans
075cadd59a
Clarify managed auth setup in Azure documentation ( #2064 )
...
Updated instructions for using managed auth on Azure.
2025-09-18 14:58:09 -07:00
Chenghua Duan
a398cc38bb
Update command to use no-discover-entity-types ( #2038 )
...
"no-entity-types" is an incorrect configuration parameter.
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-09-09 16:46:06 -06:00
Nathan Evans
1cb20b66f5
Input docs API parameter ( #2034 )
...
* Add optional input_documents to index API
* Semver
* Add input dataframe example notebook
* Format
* Fix docs and notebook
2025-09-02 16:15:50 -07:00
Copilot
2030f94eb4
Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns and add custom vector store documentation ( #2006 )
...
* Initial plan
* Refactor VectorStoreFactory to use registration functionality like StorageFactory
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix linting issues in VectorStoreFactory refactoring
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove backward compatibility support from VectorStoreFactory and StorageFactory
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Run ruff check --fix and ruff format, add semversioner file
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff formatting fixes
* Fix pytest errors in storage factory tests by updating PipelineStorage interface implementation
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff formatting fixes
* update storage factory design
* Refactor CacheFactory to use registration functionality like StorageFactory
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* revert copilot changes
* fix copilot changes
* update comments
* Fix failing pytest compatibility for factory tests
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* update class instantiation issue
* ruff fixes
* fix pytest
* add default value
* ruff formatting changes
* ruff fixes
* revert minor changes
* cleanup cache factory
* Update CacheFactory tests to match consistent factory pattern
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* update pytest thresholds
* adjust threshold levels
* Add custom vector store implementation notebook
Create comprehensive notebook demonstrating how to implement and register custom vector stores with GraphRAG as a plug-and-play framework. Includes:
- Complete implementation of SimpleInMemoryVectorStore
- Registration with VectorStoreFactory
- Testing and validation examples
- Configuration examples for GraphRAG settings
- Advanced features and best practices
- Production considerations checklist
The notebook provides a complete walkthrough for developers to understand and implement their own vector store backends.
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* remove sample notebook for now
* update tests
* fix cache pytests
* add pandas-stub to dev dependencies
* disable warning check for well known key
* skip tests when running on ubuntu
* add documentation for custom vector store implementations
* ignore ruff findings in notebooks
* fix merge breakages
* speedup CLI import statements
* remove unnecessary import statements in init file
* Add str type option on storage/cache type
* Fix store name
* Add LoggerFactory
* Fix up logging setup across CLI/API
* Add LoggerFactory test
* Fix err message
* Semver
* Remove enums from factory methods
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-08-28 13:53:07 -07:00
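The "consistent registration pattern" described in the commit above (#2006) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GraphRAG's actual implementation: `SimpleFactory`, `register`, `create`, and `InMemoryStore` are hypothetical names invented for this example. String keys stand in for enums, in the spirit of the "Add str type option on storage/cache type" and "Remove enums from factory methods" items.

```python
from typing import Any, Callable, ClassVar


class SimpleFactory:
    """Hypothetical registry-based factory (illustrative only, not GraphRAG's API)."""

    # Maps a plain string key to a callable that builds an instance.
    _registry: ClassVar[dict[str, Callable[..., Any]]] = {}

    @classmethod
    def register(cls, name: str, creator: Callable[..., Any]) -> None:
        """Register a creator (a class or a function) under a string key."""
        cls._registry[name] = creator

    @classmethod
    def create(cls, name: str, **kwargs: Any) -> Any:
        """Build an instance of the registered type, forwarding keyword arguments."""
        if name not in cls._registry:
            msg = f"Unknown type: {name!r}. Registered: {sorted(cls._registry)}"
            raise ValueError(msg)
        return cls._registry[name](**kwargs)


class InMemoryStore:
    """Hypothetical custom backend that a user might plug in."""

    def __init__(self, collection_name: str = "default") -> None:
        self.collection_name = collection_name
        self.items: dict[str, list[float]] = {}


# Register the custom implementation, then create it by name.
SimpleFactory.register("in_memory", InMemoryStore)
store = SimpleFactory.create("in_memory", collection_name="docs")
```

Keeping the same `register`/`create` shape across cache, storage, vector store, and logger factories means a custom backend only has to be registered once under a string key before it can be selected by name.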
Copilot
7c28c70d5c
Switch from Poetry to uv for package management ( #2008 )
...
* Initial plan
* Switch from Poetry to uv for package management
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Clean up build artifacts and update gitignore
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* remove build artifacts
* remove hardcoded version string
* fix calls to pip in cicd
* Update gh-pages.yml workflow to use uv instead of Poetry
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff formatting fixes
* update cicd workflow with latest uv action
* fix command to retrieve package version
* update development instructions
* remove Poetry references
* Replace deprecated azuright action with npm-based Azurite installation
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* skip api version check for azurite
* add semversioner file
* update more changes from switching to UV
* Migrate unified-search-app from Poetry to uv package management
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* minor typo update
* minor Dockerfile update
* update cicd thresholds
* update pytest thresholds
* ruff fixes
* ruff fixes
* remove legacy npm settings that no longer apply
* Update Unified Search App Readme
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-08-13 18:57:25 -06:00
Copilot
e84df28e64
Improve internal logging functionality by using Python's standard logging module ( #1956 )
...
* Initial plan for issue
* Implement standard logging module and integrate with existing loggers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Add test cases and improve documentation for standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Apply ruff formatting and add semversioner file for logging improvements
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove custom logger classes and refactor to use standard logging only
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Apply ruff formatting to resolve CI/CD test failures
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Add semversioner file and fix linting issues
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff fixes
* fix spelling error
* Remove StandardProgressLogger and refactor to use standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove LoggerFactory and custom loggers, refactor to use standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix pyright error: use logger.info() instead of calling logger as function in cosmosdb_pipeline_storage.py
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff fixes
* Remove deprecated logger files that were marked as deprecated placeholders
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Replace custom get_logger with standard Python logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix linting issues found by ruff check --fix
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff check fixes
* add word to dictionary
* Fix type checker error in ModelManager.__new__ method
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Refactor multiple logging.getLogger() calls to use single logger per file
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove progress_logger parameter from build_index() and logger parameter from generate_indexing_prompts()
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove logger parameter from run_pipeline and standardize logger naming
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Replace logger parameter with log_level parameter in CLI commands
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix import ordering in notebook files to pass poetry poe check
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove --logger parameter from smoke test command
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix Windows CI/CD issue with log file cleanup in tests
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Add StreamHandler to root logger in __main__.py for CLI logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Only add StreamHandler if root logger doesn't have existing StreamHandler
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix import ordering in notebook files to pass ruff checks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Replace logging.StreamHandler with colorlog.StreamHandler for colorized log output
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Regenerate poetry.lock file after adding colorlog dependency
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix import ordering in notebook files to pass ruff checks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* move printing of dataframes to debug level
* remove colorlog for now
* Refactor workflow callbacks to inherit from logging.Handler
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix linting issues in workflow callback handlers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix pyright type errors in blob and file workflow callbacks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Refactor pipeline logging to use pure logging.Handler subclasses
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Rename workflow callback classes to workflow logger classes and move to logger directory
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* update dictionary
* apply ruff fixes
* fix function name
* simplify logger code
* update
* Remove error, warning, and log methods from WorkflowCallbacks and replace with standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff fixes
* Fix pyright errors by removing WorkflowCallbacks from strategy type signatures
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove ConsoleWorkflowLogger and apply consistent formatter to all handlers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* Refactor pipeline_logger.py to use standard FileHandler and remove FileWorkflowLogger
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove conditional azure import checks from blob_workflow_logger.py
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix pyright type checking errors in mock_provider.py and utils.py
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Run ruff check --fix to fix import ordering in notebooks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Merge configure_logging and create_pipeline_logger into init_loggers function
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove configure_logging and create_pipeline_logger functions, replace all usage with init_loggers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* cleanup unused code
* Update init_loggers to accept GraphRagConfig instead of ReportingConfig
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff check fixes
* Fix test failures by providing valid GraphRagConfig with required model configurations
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* remove logging_workflow_callback
* cleanup logging messages
* Add logging to track progress of pandas DataFrame apply operation in create_base_text_units
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* cleanup logger logic throughout codebase
* update
* more cleanup of old loggers
* small logger cleanup
* final code cleanup and added loggers to query
* add verbose logging to query
* minor code cleanup
* Fix broken unit tests for chunk_text and standard_logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* Fix test_chunk_text by mocking progress_ticker function instead of ProgressTicker class
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* remove unnecessary logger
* remove rich and fix type annotation
* revert test formatting changes made by copilot
* promote graphrag logs to root logger
* add correct semversioner file
* revert change to file
* revert formatting changes that have no effect
* fix changes after merge with main
* revert unnecessary copilot changes
* remove whitespace
* cleanup docstring
* simplify some logic with less code
* update poetry lock file
* ruff fixes
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2025-07-09 18:29:03 -06:00
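Several items in the commit above (#1956) describe standard-library logging idioms: a single logger per file, adding a `StreamHandler` to the root logger only when one is not already present, and callbacks that inherit from `logging.Handler`. The sketch below illustrates those idioms using only the `logging` module. `ensure_stream_handler` and `ListHandler` are hypothetical names for this example and are not taken from the GraphRAG codebase.

```python
import logging
import sys

# One module-level logger per file, named after the module.
logger = logging.getLogger(__name__)


def ensure_stream_handler(level: int = logging.INFO) -> logging.Logger:
    """Attach a StreamHandler to the root logger only if none exists (hypothetical helper)."""
    root = logging.getLogger()
    root.setLevel(level)
    # Note: logging.FileHandler subclasses StreamHandler, so it also counts here.
    has_stream = any(isinstance(h, logging.StreamHandler) for h in root.handlers)
    if not has_stream:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        root.addHandler(handler)
    return root


class ListHandler(logging.Handler):
    """Hypothetical Handler subclass that collects formatted records in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        # format() falls back to the default "%(message)s" formatter if none is set.
        self.messages.append(self.format(record))


root = ensure_stream_handler()
ensure_stream_handler()  # A second call does not add a duplicate handler.

collector = ListHandler()
root.addHandler(collector)
logger.info("indexing started")  # Propagates from the module logger to the root handlers.
```

Because records propagate from module loggers up to the root logger, any handler attached at the root sees messages from every file without a logger object having to be passed around as a parameter.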
Nathan Evans
27c6de846f
Update docs for 2.0+ ( #1984 )
...
* Update docs
* Fix prompt links
2025-06-23 13:49:47 -07:00
Nathan Evans
36948b8d2e
Various minor updates ( #1932 )
...
* Add text unit ids to Community model
* Add graph utilities
* Turn off LCC for clustering by default
* Simplify embeddings config/flow
* Semver
2025-05-16 14:48:53 -07:00
Nathan Evans
25bbae8642
Docs: Add models page ( #1842 )
...
* Add models page
* Update config docs for new params
* Spelling
* Add comment on CoT with o-series
* Add notes about managed identity
* Update the viz guide
* Spruce up the getting started wording
* Capitalization
* Add BYOG page
* More BYOG edits
* Update dictionary
* Change example model name
2025-04-28 17:36:08 -07:00
Nathan Evans
56e0fad218
NLP graph parity ( #1888 )
...
* Update stopwords config
* Minor edits
* Update PMI
* Format
* Perf improvements
* Semver
* Remove edge collection apply
* Remove source/target apply
* Add edge weight to graph snapshot
* Revert breaking optimizations
* Add perf fixes back in
* Format/types
* Update defaults
* Fix source/target ordering
* Fix test
2025-04-25 17:09:06 -06:00
Nathan Evans
3b1e70c06b
Update config docs (2.1.0) ( #1818 )
...
* Align docs with config
* Semver
* Spelling
* Format
* Spelling
2025-03-18 12:39:30 -07:00
Nathan Evans
ddc6541ab6
Add docs page about input formats ( #1784 )
...
* Add docs page about input formats
* Add json example
* Spelling
2025-03-11 17:37:46 -07:00
Nathan Evans
321d479ab6
Update notebooks for 2.0 ( #1785 )
...
* Update API overview
* Fix global search example
* Fix local search example
* Fix global dynamic example
* Fix drift example
* Update multi-index example
* Semver
2025-03-11 17:23:49 -07:00
Nathan Evans
bcb74789f1
Next release docs ( #1627 )
...
* Wording updates
* Update yaml config and add notes to "deprecated" env
* Add basic search section
* Update versioning docs
* Minor edits for clarity
* Update init command
* Update init to add --force in docs
* Add NLP extraction params
* Move vector_store to root
* Add workflows to config
* Add FastGraphRAG docs
* add metadata column changes
* Added documentation for multi index search.
* Minor fixes.
* Add config and table renames
* Update migration notebook and comments to specify v1
* Add frequency to entity table docs
* add new chunking options for metadata
* Update output docs
* Minor edits and cleanup
* Add model ids to search configs
* Spruce up migration notebook
* Lint/format multi-index notebook
* SpaCy model note
* Update SpaCy footnote
* Updated multi_index_search.ipynb to remove ruff errors.
* add spacy to dictionary
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Dayenne Souza <ddesouza@microsoft.com>
Co-authored-by: dorbaker <dorbaker@microsoft.com>
2025-03-03 14:46:00 -08:00
Nathan Evans
61a309b182
Incremental model alignment ( #1766 )
...
* Use shared schema lists for all final columns
* Semver
2025-02-25 13:14:42 -06:00
Nathan Evans
981fd31963
Community children ( #1704 )
...
* Add children to the community tables
* Replace NaN children with empty list
* Replace subcommunity logic with built-in parent/child fields
* Remove restore_community_hierarchy
* Add children and frequency to migration notebook
* Format
* Semver
* Add children to reports
* Update tests
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-13 17:03:51 -08:00
Josh Bradley
b8b949f3bb
Cleanup query api - remove code duplication ( #1690 )
...
* consolidate query api functions and remove code duplication
* refactor and remove more code duplication
* Add semversioner file
* fix basic search
* fix drift search and update base class function names
* update example notebooks
2025-02-13 16:31:08 -05:00
Nathan Evans
c02ab0984a
Streamline workflows ( #1674 )
...
* Remove create_final_nodes
* Rename final entity output to "entities"
* Remove duplicate code from graph extraction
* Rename create_final_relationships output to "relationships"
* Rename create_final_communities output to "communities"
* Combine compute_communities and create_final_communities
* Rename create_final_covariates output to "covariates"
* Rename create_final_community_reports output to "community_reports"
* Rename create_final_text_units output to "text_units"
* Rename create_final_documents output to "documents"
* Remove transient snapshots config
* Move create_final_entities to finalize_entities operation
* Move create_final_relationships flow to finalize_relationships operation
* Reuse some community report functions
* Collapse most of graph and text unit-based report generation
* Unify schemas files
* Move community reports extractor
* Move NLP report prompt to prompts folder
* Fix a few pandas warnings
* Rename embeddings config to embed_text
* Rename claim_extraction config to extract_claims
* Remove nltk from standard graph extraction
* Fix verb tests
* Fix extract graph config naming
* Fix moved file reference
* Create v1-to-v2 migration notebook
* Semver
* Fix smoke test artifact count
* Raise tpm/rpm on smoke tests
* Update drift settings for smoke tests
* Reuse project directory var in api notebook
* Format
* Format
2025-02-07 11:11:03 -08:00
JunHo Kim (김준호)
30f36316af
Fix typo in table formatting in env_vars documentation ( #1632 )
...
Corrected a missing backtick in a note within the `GRAPHRAG_API_KEY` description. This ensures proper code formatting and improves readability in the documentation. No content was altered aside from formatting adjustments.
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-02-04 13:14:58 -08:00
Shamik
053bf60162
Update auto_prompt_tuning.md ( #1659 )
...
Updated the auto prompt tuning doc with `--selection-method` instead of only `--method` as per the latest API.
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-27 13:33:25 -06:00
Derek Worthen
c644338bae
Refactor config ( #1593 )
...
* Refactor config
- Add new ModelConfig to represent LLM settings
- Combines LLMParameters, ParallelizationParameters, encoding_model, and async_mode
- Add top level models config that is a list of available LLM ModelConfigs
- Remove LLMConfig inheritance and delete LLMConfig
- Replace the inheritance with a model_id reference to the ModelConfig listed in the top level models config
- Remove all fallbacks and hydration logic from create_graphrag_config
- This removes the automatic env variable overrides
- Support env variables within config files using Templating
- This requires "$" to be escaped with extra "$" so ".*\\.txt$" becomes ".*\\.txt$$"
- Update init content to initialize new config file with the ModelConfig structure
* Use dict of ModelConfig instead of list
* Add model validations and unit tests
* Fix ruff checks
* Add semversioner change
* Fix unit tests
* validate root_dir in pydantic model
* Rename ModelConfig to LanguageModelConfig
* Rename ModelConfigMissingError to LanguageModelConfigMissingError
* Add validation for unexpected API keys
* Allow skipping pydantic validation for testing/mocking purposes.
* Add default lm configs to verb tests
* smoke test
* remove config from flows to fix llm arg mapping
* Fix embedding llm arg mapping
* Remove timestamp from smoke test outputs
* Remove unused "subworkflows" smoke test properties
* Add models to smoke test configs
* Update smoke test output path
* Send logs to logs folder
* Fix output path
* Fix csv test file pattern
* Update placeholder
* Format
* Instantiate default model configs
* Fix unit tests for config defaults
* Fix migration notebook
* Remove create_pipeline_config
* Remove several unused config models
* Remove indexing embedding and input configs
* Move embeddings function to config
* Remove skip_workflows
* Remove skip embeddings in favor of explicit naming
* fix unit test spelling mistake
* self.models[model_id] is already a language model. Remove redundant casting.
* update validation errors to instruct users to rerun graphrag init
* instantiate LanguageModelConfigs with validation
* skip validation in unit tests
* update verb tests to use default model settings instead of skipping validation
* test using llm settings
* cleanup verb tests
* remove unsafe default model config
* remove the ability to skip pydantic validation
* remove None union types when default values are set
* move vector_store from embeddings to top level of config and delete resolve_paths
* update vector store settings
* fix vector store and smoke tests
* fix serializing vector_store settings
* fix vector_store usage
* fix vector_store type
* support cli overrides for loading graphrag config
* rename storage to output
* Add --force flag to init
* Remove run_id and resume, fix Drift config assignment
* Ruff
---------
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-21 17:52:06 -06:00
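The "$" escaping rule from the Refactor config commit can be illustrated with Python's `string.Template`, which follows the same convention: `${NAME}` is substituted and `$$` renders as a literal `$`. Whether GraphRAG's templating is built on `string.Template` is an assumption here. The `render_config` helper, the config keys, and the `demo-key` value are all hypothetical.

```python
from string import Template

# Hypothetical config text. The regex ends in "$$" so that one literal "$"
# survives templating, as the commit message describes.
CONFIG_TEXT = r'''
api_key: ${GRAPHRAG_API_KEY}
file_pattern: ".*\\.txt$$"
'''


def render_config(text: str, env: dict) -> str:
    """Substitute ${NAME} placeholders from env; "$$" becomes "$"."""
    return Template(text).substitute(env)


rendered = render_config(CONFIG_TEXT, {"GRAPHRAG_API_KEY": "demo-key"})
print(rendered)

# Leaving the "$" unescaped is rejected as an invalid placeholder.
try:
    render_config('file_pattern: ".*\\.txt$"', {})
except ValueError as err:
    print("unescaped $ rejected:", err)
```

Under this scheme a pattern that worked before the change (`".*\\.txt$"`) needs the doubled form once env-variable templating is applied to the file.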
Alonso Guevara
e21a38f2ab
Fix/notebooks ( #1614 )
...
* Add new inputs and missing vector store for retrieving vectors
* Format
* Semver
* Remove .Identifier files
* Fix spellcheck
* Remove unnecessary input file for notebooks
2025-01-13 17:41:39 -06:00
Nathan Evans
0e7d22bfb0
Jan documentation updates ( #1612 )
...
* Update workflow docs
* Docs cleanup
2025-01-10 11:36:27 -08:00
Nathan Evans
7ec9ef0261
Refactor callbacks ( #1583 )
...
* Unify Workflow and Verb callbacks interfaces
* Semver
* Fix storage class instantiation (#1582 )
---------
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2025-01-06 10:58:59 -08:00
Nathan Evans
a35cb12741
Remove datashaper strip code ( #1581 )
...
Remove datashaper
2025-01-03 13:59:26 -08:00
Alonso Guevara
2abd6c5f5c
Update blog posts ( #1571 )
2024-12-30 17:16:08 -06:00
Josh Bradley
983664397b
Update doc site with api overview notebook ( #1509 )
...
update doc site
2024-12-12 16:08:24 -05:00
Josh Bradley
823342188d
Cleanup factory methods ( #1482 )
...
* cleanup factory methods to have similar design pattern across codebase
* add semversioner file
* cleanup logging factory
* update developer guide
* add comment
* typo fix
* cleanup reporter terminology
* rename reporter to logger
* fix comments
* update comment
* instantiate factory classes correctly and update index api callback parameter
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-10 16:11:11 -06:00
Alonso Guevara
04405803db
Add Parent to communities in data model ( #1491 )
...
* Add Parent to communities in data model
* Semver
* Pyright
* Update docs
* Use leiden cluster parent id
* Format
2024-12-10 14:38:11 -06:00
Nathan Evans
61816e076f
Migration notebook ( #1492 )
...
* Add migration notebook
* Update migration instructions
* Semver
* Rename item in relationships table
* Remove indexing vector store shim
* Remove query shims
* Remove columns from migrated data
* Format
* Add community parents
2024-12-10 14:23:26 -06:00
Josh Bradley
b00142260d
Update index API + a notebook that provides a general API overview ( #1454 )
...
* update index api to accept callbacks
* fix hardcoded folder name that was creating an empty folder
* add API notebook
* add semversioner file
* filename change
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-05 15:34:21 -06:00
Nathan Evans
d17dfd01f9
Graph collapse ( #1464 )
...
* Refactor graph creation
* Semver
* Spellcheck
* Update integ pipeline
* Fix cast
* Improve pandas chaining
* Cleaner apply
* Use list comprehensions
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-05 11:57:26 -06:00
Josh Bradley
dad2176b3c
Miscellaneous code cleanup procedures ( #1452 )
2024-11-27 13:27:43 -05:00
Nathan Evans
0b2120ca45
Docs and notebooks update ( #1451 )
...
* Fix local question gen and example notebook
* Update global search notebook
* Add lazy blog post
* Update breaking changes doc for migration notes
* Simplify Getting Started page
* Semver
* Spellcheck
* Fix types
* Add comments on cache-free migration
* Update wording
* Spelling
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-11-27 09:56:48 -08:00
Josh Bradley
22a57d14c7
Improve CLI speed with lazy imports ( #1319 )
2024-11-15 19:41:10 -05:00
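The commit message does not say how the imports were made lazy. One common approach, sketched below as an assumption, is to move a heavy import from module level into the function that needs it, so that starting the CLI (for example to print help) does not pay the import cost. The `summarize` command and the use of `statistics` as a stand-in for a heavy dependency are hypothetical.

```python
def summarize(values: list) -> float:
    """Hypothetical CLI command that needs a 'heavy' dependency."""
    # Deferred import: the module is loaded the first time this command
    # runs, not when the CLI module itself is imported.
    import statistics

    return statistics.mean(values)


def show_help() -> str:
    """Cheap command: it never triggers the deferred import above."""
    return "usage: tool [summarize|help]"


print(show_help())
print(summarize([1, 2, 3]))
```

Python caches loaded modules, so repeated calls to `summarize` only pay the import cost once.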
Nathan Evans
9b4f24ebce
First cut at config cleanup ( #1411 )
...
* First cut at config cleanup
* Reorder top nav
* Add query prompts to tuning page
* Remove dynamic notebook from nav
* Add more thorough yml config descriptions in docs
* Further clean out the config
* Semver
* Add new blog post
* Emphasize yaml
* Clarify output
* Fix unit test
* Fix bullet nesting
2024-11-15 14:33:26 -08:00
Nathan Evans
425dbc60e3
Docs update ( #1408 )
...
* Fix footer contrast
* Fix broken links
* Remove a few unneeded examples
* Point python API example to the whole folder
* Convert schema bullets to tables
2024-11-14 21:26:29 -06:00
JunHo Kim (김준호)
ec9cdcce4d
fix typo. Correct the wording "global search" to "drift search" in drift search documentation ( #1383 )
...
Updated the wording of the example scenario from "global search" to "drift search" to accurately reflect the topic. This improves clarity and ensures the documentation accurately describes its content.
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-11-14 16:55:44 -06:00
Nathan Evans
51912b2e03
Move prompts ( #1404 )
...
* Move indexing prompts to root
* Move query prompts to root
* Export query prompts during init
* Extract general knowledge prompt
* Load query prompts from disk
* Semver
* Fix unit tests
2024-11-14 10:45:37 -08:00
Nathan Evans
c8c354e357
Artifact cleanup ( #1341 )
...
* Add source documents for verb tests
* Remove entity_type erroneous column
* Add new test data
* Remove source/target degree columns
* Remove top_level_node_id
* Remove chunk column configs
* Rename "chunk" to "text"
* Rename "chunk" to "text" in base
* Re-map document input to use base text units
* Revert base text units as final documents dep
* Update test data
* Split/rename node source_id
* Drop node size (dup of degree)
* Drop document_ids from covariates
* Remove unused document_ids from models
* Remove n_tokens from covariate table
* Fix missed document_ids delete
* Wire base text units to final documents
* Rename relationship rank as combined_degree
* Add rank as first-class property to Relationship
* Remove split_text operation
* Fix relationships test parquet
* Update test parquets
* Add entity ids to community table
* Remove stored graph embedding columns
* Format
* Semver
* Fix JSON typo
* Spelling
* Rename lancedb
* Sort lancedb
* Fix unit test
* Fix test to account for changing period
* Update tests for separate embeddings
* Format
* Better assertion printing
* Fix unit test for windows
* Rename document.raw_content -> document.text
* Remove read_documents function
* Remove unused document summary from model
* Remove unused imports
* Format
* Add new snapshots to default init
* Use util to construct embeddings collection name
* Align inc index model with branch changes
* Update data and tests for int ids
* Clean up embedding locs
* Switch entity "name" to "title" for consistency
* Fix short_id -> human_readable_id defaults
* Format
* Rework community IDs
* Fix community size compute
* Fix unit tests
* Fix report read
* Pare down nodes table output
* Fix unit test
* Fix merge
* Fix community loading
* Format
* Fix community id report extraction
* Update tests
* Consistent short IDs and ordering
* Update ordering and tests
* Update incremental for new nodes model
* Guard document columns loc
* Match column ordering
* Fix document guard
* Update smoke tests
* Fill NA on community extract
* Logging for smoke test debug
* Add parquet schema details doc
* Fix community hierarchy guard
* Use better empty hierarchy guard
* Back-compat shims
* Semver
* Fix warning
* Format
* Remove default fallback
* Reuse key
2024-11-13 15:11:19 -08:00
Alonso Guevara
e53422366d
Implement dynamic community selection for global search ( #1396 )
...
* update gitignore
* add dynamic community selection to updated main branch
* update SearchResult to record output_tokens.
* update search result
* dynamic search working
* format
* add llm_calls_categories and prompt_tokens and output_tokens cate
* update
* formatting
* log drift search output and prompt tokens separately
* update global_search.ipynb. update operate dulce dataset and add create_final_communities. update dynamic community selection init
* add .ipynb back to cspell.config.yaml
* format
* add notebook example on dynamic search
* rearrange
* update gitignore
* format code
* code format
* code format
* fix default variable
---------
Co-authored-by: Bryan Li <bryanlimy@gmail.com>
2024-11-11 16:45:07 -08:00
Josh Bradley
a8ccded83c
Fix file path issue in the viz guide ( #1372 )
...
* Fix a file paths issue in the viz guide.
* fix formatting
2024-11-06 14:42:07 -08:00
Alonso Guevara
2047c1561c
Fix styling and misalignment on drift docs ( #1373 )
2024-11-06 16:29:53 -06:00
Josh Bradley
9762f33c1a
Add visualization guide ( #1340 )
2024-11-06 14:06:50 -05:00
Alonso Guevara
1557ce34f9
Fix init defaults for vector store and img in drift docs ( #1357 )
...
* Fix init defaults for vector store and img in drift docs
* Add more doc
* Spellcheck
* Remove example
2024-11-05 14:14:17 -06:00
Alonso Guevara
d9f985ae52
Drift Search CLI, API, Docs and Example Notebook ( #1348 )
...
* Drift CLI and backwards compat
* Adding DRIFT Cli, Docs and example notebook
* Update tests and fix ruff
* Format
* Small cleanup
* Fix smoke tests
* Update notebook
* Oopsie fix
* Delete duplicate img
2024-11-05 12:05:19 -06:00
Nathan Evans
634e3ed62a
Transient entity graph ( #1349 )
...
* Make base_entity_graph transient
* Add transient snapshots
* Semver
* Fix unit test
* Fix smoke tests
2024-11-04 17:23:29 -08:00