Commit Graph

77 Commits

Author SHA1 Message Date
Nathan Evans
eb0dfe376b
Remove strategy dicts (#2090)
* Remove "strategy" from community reports config/workflow

* Remove extraction strategy from extract_graph

* Remove summarization strategy from extract_graph

* Remove strategy from claim extraction

* Strongly type prompt templates

* Remove strategy from embed_text

* Push hydrated params into community report workflows

* Push hyrdated params into extract covariates

* Push hydrated params into extract graph NLP

* Push hydrated params into extract graph

* Push hydrated params into text embeddings

* Remove a few more low-level defaults

* Semver

* Remove configurable prompt delimiters

* Update smoke tests
2025-10-10 12:15:23 -07:00
gaudyb
79ad9b96f3
reduce schema fields (#2089)
Some checks are pending
Python CI / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python CI / python-ci (ubuntu-latest, 3.11) (push) Waiting to run
Python CI / python-ci (windows-latest, 3.10) (push) Waiting to run
Python CI / python-ci (windows-latest, 3.11) (push) Waiting to run
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Waiting to run
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Waiting to run
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Waiting to run
* reduce schema fields

* fix launch.json

---------

Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>
2025-10-09 13:41:31 -06:00
Nathan Evans
2b5284ca1b Merge branch 'main' into v3/main
Some checks failed
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
2025-10-07 16:24:15 -07:00
Nathan Evans
ac8a7f5eef
Housekeeping (#2086)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Add deprecation warnings for fnllm and multi-search

* Fix dangling token_encoder refs

* Fix local_search notebook

* Fix global search dynamic notebook

* Fix global search notebook

* Fix drift notebook

* Switch example notebooks to use LiteLLM config

* Properly annotate dev deps as a group

* Semver

* Remove --extra dev

* Remove llm_model variable

* Ignore ruff ASYNC240

* Add note about expected broken notebook in docs

* Fix custom vector store notebook

* Push tokenizer throughout
2025-10-07 16:21:24 -07:00
Nathan Evans
4364d678dd Merge branch 'main' into v3/main
Some checks failed
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
2025-10-06 12:41:12 -07:00
Nathan Evans
6c86b0a7bb
Init config cleanup (#2084)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Spruce up init_config output, including LiteLLM default

* Remove deployment_name requirement for Azure

* Semver

* Add model_provider

* Add default model_provider

* Remove OBE test

* Update minimal config for tests

* Add model_provider to verb tests
2025-10-06 12:06:41 -07:00
Nathan Evans
2bd3922d8d
Litellm auth fix (#2083)
* Fix scope for Azure auth with LiteLLM

* Change internal language on max_attempts to max_retries

* Rework model config connectivity validation

* Semver

* Swtich smoke tests to LiteLLM

* Take out temporary retry_strategy = none since it is not fnllm compatible

* Bump smoke test timeout

* Bump smoke timeout further

* Tune smoke params

* Update smoke test bounds

* Remove covariates from min-csv smoke

* Smoke: adjust communities, remove drift

* Remove secrets where they aren't necessary

* Clean out old env var references
2025-10-06 10:54:21 -07:00
Nathan Evans
b73053010e Merge branch 'main' into v3/main
Some checks are pending
Python CI / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python CI / python-ci (ubuntu-latest, 3.11) (push) Waiting to run
Python CI / python-ci (windows-latest, 3.10) (push) Waiting to run
Python CI / python-ci (windows-latest, 3.11) (push) Waiting to run
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Waiting to run
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Waiting to run
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Waiting to run
2025-09-23 11:07:42 -07:00
Derek Worthen
2b70e4a4f3
Tokenizer (#2051)
* Add LiteLLM chat and embedding model providers.

* Fix code review findings.

* Add litellm.

* Fix formatting.

* Update dictionary.

* Update litellm.

* Fix embedding.

* Remove manual use of tiktoken and replace with
Tokenizer interface. Adds support for encoding
and decoding the models supported by litellm.

* Update litellm.

* Configure litellm to drop unsupported params.

* Cleanup semversioner release notes.

* Add num_tokens util to Tokenizer interface.

* Update litellm service factories.

* Cleanup litellm chat/embedding model argument assignment.

* Update chat and embedding type field for litellm use and future migration away from fnllm.

* Flatten litellm service organization.

* Update litellm.

* Update litellm factory validation.

* Flatten litellm rate limit service organization.

* Update rate limiter - disable with None/null instead of 0.

* Fix usage of get_tokenizer.

* Update litellm service registrations.

* Add jitter to exponential retry.

* Update validation.

* Update validation.

* Add litellm request logging layer.

* Update cache key.

* Update defaults.

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-09-22 13:55:14 -06:00
gaudyb
82cd3b7df2
Custom vector store schema implementation (#2062)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* progress on vector customization

* fix for lancedb vectors

* cosmosdb implementation

* uv run poe format

* clean test for vector store

* semversioner update

* test_factory.py integration test fixes

* fixes for cosmosdb test

* integration test fix for lancedb

* uv fix for format

* test fixes

* fixes for tests

* fix cosmosdb bug

* print statement

* test

* test

* fix cosmosdb bug

* test validation

* validation cosmosdb

* validate cosmosdb

* fix cosmosdb

* fix small feedback from PR

---------

Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>
2025-09-19 10:11:34 -07:00
Nathan Evans
04d9f585c8
Re-implement hierarchical Leiden (#2049)
Some checks failed
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
* Use graspologic-native hierarchical leiden

* Re-implement largest_connected_component

* Copy in modularity

* Use graspologic-native directly in pyproject

* Remove directed graph tests (we don't use this)

* Semver

* Remove graspologic dep
2025-09-09 16:26:07 -07:00
Nathan Evans
97704ab933
Remove text unit grouping (#2052)
* Remove text unit group_by_columns

* Semver

* Fix default token split test

* Fix models in config test samples

* Fix token length in context sort test

* Fix document sort
2025-09-09 16:04:04 -07:00
Nathan Evans
978e79875e
Remove file filtering (#2050)
* Remove document filtering

* Semver

* Fix integ tests

* Fix file find tuple

* Fix another dangling find tuple
2025-09-09 15:36:25 -07:00
Nathan Evans
429e1b1f9e
Remove graph embedding and UMAP (#2048)
* Remove umap/layout operation

* Remove graph embedding

* Bump unified-search to GR 2.5.0

* Remove graph vis from unified-search
2025-09-09 15:35:43 -07:00
Copilot
2030f94eb4
Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns and add custom vector store documentation (#2006)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Initial plan

* Refactor VectorStoreFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix linting issues in VectorStoreFactory refactoring

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove backward compatibility support from VectorStoreFactory and StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Run ruff check --fix and ruff format, add semversioner file

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* Fix pytest errors in storage factory tests by updating PipelineStorage interface implementation

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* update storage factory design

* Refactor CacheFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* revert copilot changes

* fix copilot changes

* update comments

* Fix failing pytest compatibility for factory tests

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update class instantiation issue

* ruff fixes

* fix pytest

* add default value

* ruff formatting changes

* ruff fixes

* revert minor changes

* cleanup cache factory

* Update CacheFactory tests to match consistent factory pattern

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update pytest thresholds

* adjust threshold levels

* Add custom vector store implementation notebook

Create comprehensive notebook demonstrating how to implement and register custom vector stores with GraphRAG as a plug-and-play framework. Includes:

- Complete implementation of SimpleInMemoryVectorStore
- Registration with VectorStoreFactory
- Testing and validation examples
- Configuration examples for GraphRAG settings
- Advanced features and best practices
- Production considerations checklist

The notebook provides a complete walkthrough for developers to understand and implement their own vector store backends.

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* remove sample notebook for now

* update tests

* fix cache pytests

* add pandas-stub to dev dependencies

* disable warning check for well known key

* skip tests when running on ubuntu

* add documentation for custom vector store implementations

* ignore ruff findings in notebooks

* fix merge breakages

* speedup CLI import statements

* remove unnecessary import statements in init file

* Add str type option on storage/cache type

* Fix store name

* Add LoggerFactory

* Fix up logging setup across CLI/API

* Add LoggerFactory test

* Fix err message

* Semver

* Remove enums from factory methods

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-08-28 13:53:07 -07:00
Nathan Evans
77fb7d9d7d
Logging improvements (#2030)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Turn down blob/cosmos exception reporting to match file storage

* Restore indexing-engine.log

* Restore some basic console logging and progress for index CLI

* Semver

* Ignore small ruff complaints

* Fix CLI console printing
2025-08-25 14:56:43 -07:00
Copilot
e84df28e64
Improve internal logging functionality by using Python's standard logging module (#1956)
Some checks are pending
gh-pages / build (push) Waiting to run
Python CI / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python CI / python-ci (ubuntu-latest, 3.11) (push) Waiting to run
Python CI / python-ci (windows-latest, 3.10) (push) Waiting to run
Python CI / python-ci (windows-latest, 3.11) (push) Waiting to run
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Waiting to run
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Waiting to run
Python Publish (pypi) / Upload release to PyPI (push) Waiting to run
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Waiting to run
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Waiting to run
Spellcheck / spellcheck (push) Waiting to run
* Initial plan for issue

* Implement standard logging module and integrate with existing loggers

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Add test cases and improve documentation for standard logging

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Apply ruff formatting and add semversioner file for logging improvements

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove custom logger classes and refactor to use standard logging only

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Apply ruff formatting to resolve CI/CD test failures

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Add semversioner file and fix linting issues

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff fixes

* fix spelling error

* Remove StandardProgressLogger and refactor to use standard logging

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove LoggerFactory and custom loggers, refactor to use standard logging

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix pyright error: use logger.info() instead of calling logger as function in cosmosdb_pipeline_storage.py

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff fixes

* Remove deprecated logger files that were marked as deprecated placeholders

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Replace custom get_logger with standard Python logging

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix linting issues found by ruff check --fix

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* apply ruff check fixes

* add word to dictionary

* Fix type checker error in ModelManager.__new__ method

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Refactor multiple logging.getLogger() calls to use single logger per file

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove progress_logger parameter from build_index() and logger parameter from generate_indexing_prompts()

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove logger parameter from run_pipeline and standardize logger naming

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Replace logger parameter with log_level parameter in CLI commands

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix import ordering in notebook files to pass poetry poe check

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove --logger parameter from smoke test command

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix Windows CI/CD issue with log file cleanup in tests

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Add StreamHandler to root logger in __main__.py for CLI logging

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Only add StreamHandler if root logger doesn't have existing StreamHandler

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix import ordering in notebook files to pass ruff checks

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Replace logging.StreamHandler with colorlog.StreamHandler for colorized log output

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Regenerate poetry.lock file after adding colorlog dependency

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix import ordering in notebook files to pass ruff checks

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* move printing of dataframes to debug level

* remove colorlog for now

* Refactor workflow callbacks to inherit from logging.Handler

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix linting issues in workflow callback handlers

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix pyright type errors in blob and file workflow callbacks

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Refactor pipeline logging to use pure logging.Handler subclasses

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Rename workflow callback classes to workflow logger classes and move to logger directory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update dictionary

* apply ruff fixes

* fix function name

* simplify logger code

* update

* Remove error, warning, and log methods from WorkflowCallbacks and replace with standard logging

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff fixes

* Fix pyright errors by removing WorkflowCallbacks from strategy type signatures

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove ConsoleWorkflowLogger and apply consistent formatter to all handlers

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* apply ruff fixes

* Refactor pipeline_logger.py to use standard FileHandler and remove FileWorkflowLogger

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove conditional azure import checks from blob_workflow_logger.py

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix pyright type checking errors in mock_provider.py and utils.py

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Run ruff check --fix to fix import ordering in notebooks

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Merge configure_logging and create_pipeline_logger into init_loggers function

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove configure_logging and create_pipeline_logger functions, replace all usage with init_loggers

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* apply ruff fixes

* cleanup unused code

* Update init_loggers to accept GraphRagConfig instead of ReportingConfig

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* apply ruff check fixes

* Fix test failures by providing valid GraphRagConfig with required model configurations

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* apply ruff fixes

* remove logging_workflow_callback

* cleanup logging messages

* Add logging to track progress of pandas DataFrame apply operation in create_base_text_units

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* cleanup logger logic throughout codebase

* update

* more cleanup of old loggers

* small logger cleanup

* final code cleanup and added loggers to query

* add verbose logging to query

* minor code cleanup

* Fix broken unit tests for chunk_text and standard_logging

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* apply ruff fixes

* Fix test_chunk_text by mocking progress_ticker function instead of ProgressTicker class

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* remove unnecessary logger

* remove rich and fix type annotation

* revert test formatting changes my by copilot

* promote graphrag logs to root logger

* add correct semversioner file

* revert change to file

* revert formatting changes that have no effect

* fix changes after merge with main

* revert unnecessary copilot changes

* remove whitespace

* cleanup docstring

* simplify some logic with less code

* update poetry lock file

* ruff fixes

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2025-07-09 18:29:03 -06:00
Nathan Evans
1df89727c3
Pipeline registration (#1940)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Move covariate run conditional

* All pipeline registration

* Fix method name construction

* Rename context storage -> output_storage

* Rename OutputConfig as generic StorageConfig

* Reuse Storage model under InputConfig

* Move input storage creation out of document loading

* Move document loading into workflows

* Semver

* Fix smoke test config for new workflows

* Fix unit tests

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-06-12 16:14:39 -07:00
Nathan Evans
36948b8d2e
Various minor updates (#1932)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Add text unit ids to Community model

* Add graph utilities

* Turn off LCC for clustering by default

* Simplify embeddings config/flow

* Semver
2025-05-16 14:48:53 -07:00
Nathan Evans
e2a448170a
Fix/minor query fixes (#1893)
* fixed token count for drift search

* basic search fixes

* updated basic search prompt

* fixed text splitting logic

* Lint/format

* Semver

* Fix text splitting tests

---------

Co-authored-by: ha2trinh <trinhha@microsoft.com>
2025-04-25 14:12:18 -07:00
Nathan Evans
ad4cdd685f
Support OpenAI reasoning models (#1841)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
* Update tiktoken

* Add max_completion_tokens to model config

* Update/remove outdated comments

* Remove max_tokens from report generation

* Remove max_tokens from entity summarization

* Remove logit_bias from graph extraction

* Remove logit_bias from claim extraction

* Swap params if reasoning model

* Add reasoning model support to basic search

* Add reasoning model support for local and global search

* Support reasoning models with dynamic community selection

* Support reasoning models in DRIFT search

* Remove unused num_threads entry

* Semver

* Update openai

* Add reasoning_effort param
2025-04-22 14:15:26 -07:00
Nathan Evans
66c2cfb3ce
Support JSON input files (#1777)
* Add csv loader tests

* Add test loader tests

* Add json input support

* Remove temp path constraint

* Reuse loader cose

* Semver

* Set file pattern automatically based on type, if empty

* Remove pattern from smoke test config

* Spelling

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-03-10 14:04:07 -07:00
Nathan Evans
ede6a74546
Pipeline callbacks (#1729)
* Add pipeline_start and pipeline_end callbacks

* Collapse redundant callback/logger logic

* Remove redundant reporting config classes

* Remove a few out-of-date type ignores

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-25 15:07:51 -08:00
Alonso Guevara
e0d233fe10
Feat/llm provider query (#1735)
* Add ModelProvider to Query package.

* Spellcheck + others

* Semver

* Fix tests

* Format

* Fix Pyright

* Fix tests

* Fix for smoke tests
2025-02-24 18:35:51 -06:00
Derek Worthen
54885b8ab1
Refactor config defaults (#1723)
* Refactor config defaults

- Implement type-safe, hierarchical dataclass for config
defaults instead of namespaced constants.
- Allow for instantiating config directly from defaults data structure.

* fix vector_store db_uri default

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-20 13:01:29 -06:00
Alonso Guevara
7bdeaee94a
Create Language Model Providers and Registry methods. Remove fnllm coupling (#1724)
* Base structure

* Add fnllm providers and Mock LLM

* Remove fnllm coupling, introduce llm providers

* Ruff + Tests fix

* Spellcheck

* Semver

* Format

* Default MockChat params

* Fix more tests

* Fix embedding smoke test

* Fix embeddings smoke test

* Fix MockEmbeddingLLM

* Rename LLM to model. Package organization

* Fix prompt tuning

* Oops

* Oops II
2025-02-20 08:56:20 -06:00
Nathan Evans
35b639399b
Incremental flow rework (#1696)
* Rework update output structure

* Semver

* Fix unit test

* Update frequency in incremental

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-13 18:22:32 -06:00
Josh Bradley
f14cda2b6d
Improve default llm retry logic to be more optimized (#1701) 2025-02-13 16:56:37 -05:00
Nathan Evans
fe461417b5
Export NLP community reports prompt (#1697)
* Properly export the NLP community reports prompt

* Semver

* Fix verb tests
2025-02-12 10:41:39 -08:00
Dayenne Souza
b94290ec2b
add option to add metadata into text chunks (#1681)
* add new options

* add metadata json into input document

* remove doc change

* add metadata column into text loader

* prepend_metadata

* run fix

* fix tests and patch

* fix test

* add watrning for metadata tokens > config size

* fix typo and run fix

* fix test_integration

* fix test

* run check

* rename and fix chunking

* fix

* fix

* fiz test verbs

* fix

* fix tests

* fix chunking

* fix index

* fix cosmos test

* fix vars

* fix after PR

* fix
2025-02-12 09:38:03 -08:00
Nathan Evans
c02ab0984a
Streamline workflows (#1674)
* Remove create_final_nodes

* Rename final entity output to "entities"

* Remove duplicate code from graph extraction

* Rename create_final_relationships output to "relationships"

* Rename create_final_communities output to "communities"

* Combine compute_communities and create_final_communities

* Rename create_final_covariates output to "covariates"

* Rename create_final_community_reports output to "community_reports"

* Rename create_final_text_units output to "text_units"

* Rename create_final_documents output to "documents"

* Remove transient snapshots config

* Move create_final_entities to finalize_entities operation

* Move create_final_relationships flow to finalize_relationships operation

* Reuse some community report functions

* Collapse most of graph and text unit-based report generation

* Unify schemas files

* Move community reports extractor

* Move NLP report prompt to prompts folder

* Fix a few pandas warnings

* Rename embeddings config to embed_text

* Rename claim_extraction config to extract_claims

* Remove nltk from standard graph extraction

* Fix verb tests

* Fix extract graph config naming

* Fix moved file reference

* Create v1-to-v2 migration notebook

* Semver

* Fix smoke test artifact count

* Raise tpm/rpm on smoke tests

* Update drift settings for smoke tests

* Reuse project directory var in api notebook

* Format

* Format
2025-02-07 11:11:03 -08:00
KennyZhang1
83cc2daf91
Multi-index query CLI support (#1675)
* Add vector store id reference to embeddings config.

* changed structure of output config section

* added cli integration for multi index global

* added cli integration for multi index local

* added cli integration for multi index drift and basic

* finished local testing of multi-index cli

* ruff fixes

* partially refactored test code to align with new output section

* more test changes for new output structure

* semversioner

* refactored to align with new multi index config proposal

* locally tested new multi-index output proposal

* cleaned up tests to align with new structure

---------

Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
2025-02-07 12:56:48 -05:00
Dayenne Souza
ad5b5120ec
remove unused columns and rename document_attribute_columns (#1672)
* remove unused columns and change property document_attribute_columns to metadata

* format file

* fix 'metadata' column on output

* run check

* fix test on nltk

* remove docs changes
2025-02-03 14:37:06 -03:00
Derek Worthen
94bd2bb816
Require explicit azure auth settings when using AOI. (#1665)
* Require explicit azure auth settings when using AOI.

- Must set LanguageModel.azure_auth_type to either
"api_key" or "managed_identity" when using AOI.

* Fix smoke tests

* Use general auth_type property instead of azure_auth_type

* Remove unused error type

* Update validation

* Update validation comment
2025-01-29 12:28:47 -08:00
Derek Worthen
eeee84e9d9
Add vector store id reference to embeddings config. (#1662) 2025-01-28 10:46:41 -08:00
KennyZhang1
1bbce33f42
Multi-index querying for API layer (#1644)
* added multi-global-query function header

* ported over code for merging dataframes

* added connection to global streaming api function

* added function header for update context helper

* implemented and incorperated update_context function

* Updated to make sure 'parent' column in final_communities gets incremented for multi index.

* first cut at multi_local_seach function

* several minor changes and fixes

* Updated multi index local search.

* Cleaned up code.

* fixed lambda function ruff errors

* fixed more ruff errors

* moved query api helpers to util file

* moved index api helpers to util file

* merged in code left out of conflict

* changed GraphRagConfig object to support lists of vector stores

* Updated with fixes for multi_local_search.

* Minor updates.

* Minor updates.

* Updates for ruff check.

* Minor updates.

* removed redundant vector_store_configs arg

* ruff formatting changes

* semversioner

* Minor fix.

* spellcheck fixes

* ruff

* test fix for cicd errors

* another test fix

* added explicit typing for ci tests

* added dict type check for vector_store during indexing

* more ruff fixes

* moved type check

* Removed streaming. Added multi drift and basic searches.

* Formatting changes.

* Updates for pyright.

* Update for ruff.

* Ruff formatted.

* first cut at fixing vector store typing errors

* got multi local search working with new config

* ruff and test fixes

* added fix for embeddings type error

* renamed multi index api functions

* ruff

* convert config model to dict[VectorStoreConfig]

* modified tests to support new vector_store model

* ruff fixes

* changed some test setups to match new model

* changed ci/cd settings files to match new structure

* Fix stderror check

* fixed bug in vector_store_config validation

* ruff

* add database_name field to vectorstoreconfig

* removed print statements

* small refactoring for PR comments

* modified default config in test

* modified vector store config unit test

---------

Co-authored-by: dorbaker <dorbaker@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-27 17:26:38 -05:00
Derek Worthen
c644338bae
Refactor config (#1593)
* Refactor config

- Add new ModelConfig to represent LLM settings
    - Combines LLMParameters, ParallelizationParameters, encoding_model, and async_mode
- Add top level models config that is a list of available LLM ModelConfigs
- Remove LLMConfig inheritance and delete LLMConfig
    - Replace the inheritance with a model_id reference to the ModelConfig listed in the top level models config
- Remove all fallbacks and hydration logic from create_graphrag_config
    - This removes the automatic env variable overrides
- Support env variables within config files using Templating
    - This requires "$" to be escaped with extra "$" so ".*\\.txt$" becomes ".*\\.txt$$"
- Update init content to initialize new config file with the ModelConfig structure

* Use dict of ModelConfig instead of list

* Add model validations and unit tests

* Fix ruff checks

* Add semversioner change

* Fix unit tests

* validate root_dir in pydantic model

* Rename ModelConfig to LanguageModelConfig

* Rename ModelConfigMissingError to LanguageModelConfigMissingError

* Add validationg for unexpected API keys

* Allow skipping pydantic validation for testing/mocking purposes.

* Add default lm configs to verb tests

* smoke test

* remove config from flows to fix llm arg mapping

* Fix embedding llm arg mapping

* Remove timestamp from smoke test outputs

* Remove unused "subworkflows" smoke test properties

* Add models to smoke test configs

* Update smoke test output path

* Send logs to logs folder

* Fix output path

* Fix csv test file pattern

* Update placeholder

* Format

* Instantiate default model configs

* Fix unit tests for config defaults

* Fix migration notebook

* Remove create_pipeline_config

* Remove several unused config models

* Remove indexing embedding and input configs

* Move embeddings function to config

* Remove skip_workflows

* Remove skip embeddings in favor of explicit naming

* fix unit test spelling mistake

* self.models[model_id] is already a language model. Remove redundant casting.

* update validation errors to instruct users to rerun graphrag init

* instantiate LanguageModelConfigs with validation

* skip validation in unit tests

* update verb tests to use default model settings instead of skipping validation

* test using llm settings

* cleanup verb tests

* remove unsafe default model config

* remove the ability to skip pydantic validation

* remove None union types when default values are set

* move vector_store from embeddings to top level of config and delete resolve_paths

* update vector store settings

* fix vector store and smoke tests

* fix serializing vector_store settings

* fix vector_store usage

* fix vector_store type

* support cli overrides for loading graphrag config

* rename storage to output

* Add --force flag to init

* Remove run_id and resume, fix Drift config assignment

* Ruff

---------

Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-21 17:52:06 -06:00
Dayenne Souza
2f2cfa7b70
Test and unify text splitter functionality (#1547)
* add text_splitting unit test

* change folder test text splitting

* fix chunk fn

* test new function

* run formatter

* run spell check

* run semver

* remove tiktoken mocked from tests

* change progress ticker

* fix ruff check
2025-01-13 18:42:44 -03:00
Nathan Evans
a35cb12741
Remove datashaper strip code (#1581)
Remove datashaper
2025-01-03 13:59:26 -08:00
Derek Worthen
80367be018
Remove config input models (#1570)
* Remove config input models

* remove unit tests related to config input models

* add semversioner change

* Merge branch 'main' into config-remove-input-models
2025-01-02 15:25:10 -08:00
gaudyb
185f513ca7
Basic search implementation (#1563)
* basic search implementation

* basic streaming functionality

* format check

* check fix

* release change

* Chore/gleanings any encoding (#1569)

* Make claims and entities independent of encoding

* Semver

* Change semver release type

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-02 13:49:11 -06:00
Nathan Evans
a2647da473
Simplify flow config (#1554)
* Flatten compute_communities config

* Remove cluster strategy type

* Flatten create_base_text_units config

* Move cluster seed to config default, leave as None in functions

* Remove "prechunked" logic

* Remove hard-coded encoding model

* Remove unused variables

* Strongly type embed_config

* Simplify layout_graph config

* Semver

* Fix integration test

* Fix config unit tests: ignore new config defaults

* Remove pipeline integ test
2024-12-27 16:38:36 -08:00
KennyZhang1
8368b12532
Add Cosmos DB storage/cache option (#1431)
* added cosmosdb constructor and database methods

* added rest of abstract method headers

* added cosmos db container methods

* implemented has and delete methods

* finished implementing abstract class methods

* integrated class into storage factory

* integrated cosmosdb class into cache factory

* added support for new config file fields

* replaced primary key cosmosdb initialization with connection strings

* modified cosmosdb setter to require json

* Fix non-default emitters

* Format

* Ruff

* ruff

* first successful run of cosmosdb indexing

* removed extraneous container_name setting

* require base_dir to be typed as str

* reverted merged changed from closed branch

* removed nested try statement

* readded initial non-parquet emitter fix

* added basic support for parquet emitter using internal conversions

* merged with main and resolved conflicts

* fixed more merge conflicts

* added cosmosdb functionality to query pipeline

* tested query for cosmosdb

* collapsed cosmosdb schema to use minimal containers and databases

* simplified create_database and create_container functions

* ruff fixes and semversioner

* spellcheck and ci fixes

* updated pyproject toml and lock file

* apply fixes after merge from main

* add temporary comments

* refactor cache factory

* refactored storage factory

* minor formatting

* update dictionary

* fix spellcheck typo

* fix default value

* fix pydantic model defaults

* update pydantic models

* fix init_content

* cleanup how factory passes parameters to file storage

* remove unnecessary output file type

* update pydantic model

* cleanup code

* implemented clear method

* fix merge from main

* add test stub for cosmosdb

* regenerate lock file

* modified set method to collapse parquet rows

* modified get method to collapse parquet rows

* updated has and delete methods and docstrings to adhere to new schema

* added prefix helper function

* replaced delimiter for prefixed id

* verified empty tests are passing

* fix merges from main

* add find test

* update cicd step name

* tested querying for new schema

* resolved errors from merge conflicts

* refactored set method to handle cache in new schema

* refactored get method to handle cache in new schema

* force unique ids to be written to cosmos for nodes

* found bug with has and delete methods

* modified has and delete to work with cache in new schema

* fix the merge from main

* minor typo fixes

* update lock file

* spellcheck fix

* fix init function signature

* minor formatting updates

* remove https protocol

* change localhost to 127.0.0.1 address

* update pytest to use bacj engine

* verified cache tests

* improved speed of has function

* resolved pytest error with find function

* added test for child method

* make container_name variable private as _container_name

* minor variable name fix

* cleanup cosmos pytest and make the cosmosdb storage class operations more efficient

* update cicd to use different cosmosdb emulator

* test with http protocol

* added pytest for clear()

* add longer timeout for cosmosdb emulator startup

* revert http connection back to https

* add comments to cicd code for future dev usage

* set to container and database clients to none upon deletion

* ruff changes

* add comments to cicd code

* removed unneeded None statements and ruff fixes

* more ruff fixes

* Update test_run.py

* remove unnecessary call to delete container

* ruff format updates

* Reverted test_run.py

* fix ruff formatter errors

* cleanup variable names to be more consistent

* remove extra semversioner file

* revert pydantic model changes

* revert pydantic model change

* revert pydantic model change

* re-enable inline formatting rule

* update documentation in dev guide

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-19 13:43:21 -06:00
Nathan Evans
c1c09bab80
Flow cleanup (#1510)
* Move snapshots out of flows into verbs

* Move degree compute out of extract_graph

* Move entity/relationship df merging into extract

* Move "title" to extraction source

* Move text_unit_ids agg closer to extraction

* Move data definition

* Update test data

* Semver

* Update smoke tests

* Fix empty degree field and update smoke tests and verb data

* Move extractors (#1516)

* Consolidate graph embedding and umap

* Consolidate claim extraction

* Consolidate graph extractor

* Move graph utils

* Move summarizers

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Fix syntax typo

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 18:07:44 -08:00
Nathan Evans
d0543d1fd6
Move extractors (#1516)
* Consolidate graph embedding and umap

* Consolidate claim extraction

* Consolidate graph extractor

* Move graph utils

* Move summarizers

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 16:21:41 -08:00
Alonso Guevara
1c3b0f34c3
Chore/lib updates (#1477)
* Update dependencies and fix issues

* Format

* Semver

* Fix Pyright

* Pyright

* More Pyright

* Pyright
2024-12-06 14:08:24 -06:00
Chris Trevino
5ff2d3c76d
Remove graphrag.llm, replace with fnllm (#1315)
* add fnllm; remove llm folder

* remove llm unit tests

* update imports

* update imports

* formatting

* enable autosave

* update mockllm

* update community reports extractor

* move most llm usage to fnllm

* update type issues

* fix unit tests

* type updates

* update dictionary

* semver

* update llm construction, get integration tests working

* load from llmparameters model

* move ruff settings to ruff.toml

* add gitattributes file

* ignore ruff.toml spelling

* update .gitattributes

* update gitignore

* update config construction

* update prompt var usage

* add cache adapter

* use cache adapter in embeddings calls

* update embedding strategy

* add fnllm

* add pytest-dotenv

* fix some verb tests

* get verbtests running

* update ruff.toml for vscode

* enable ruff native server in vscode

* update artifact inspecting code

* remove local-test update

* use string.replace instead of string.format in community reprots etxractor

* bump timeout

* revert ruff.toml, vscode settings for another pr

* revert cspell config

* revert gitignore

* remove json-repair, update fnllm

* use fnllm generic type interfaces

* update load_llm to use target models

* consolidate chat parameters

* add 'extra_attributes' prop to community report response

* formatting

* update fnllm

* formatting

* formatting

* Add defaults to some llm params to avoid null on params hash

* Formatting

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-05 18:07:47 -06:00
Nathan Evans
d17dfd01f9
Graph collapse (#1464)
* Refactor graph creation

* Semver

* Spellcheck

* Update integ pipeline

* Fix cast

* Improve pandas chaining

* Cleaner apply

* Use list comprehensions

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-05 11:57:26 -06:00
Josh Bradley
dad2176b3c
Miscellaneous code cleanup procedures (#1452) 2024-11-27 13:27:43 -05:00
Josh Bradley
22a57d14c7
Improve CLI speed with lazy imports (#1319) 2024-11-15 19:41:10 -05:00