Commit Graph

333 Commits

Author SHA1 Message Date
Dayenne Souza
eff3588592 fix 2025-01-30 22:55:11 -03:00
Dayenne Souza
3dcdd3ce53 Merge remote-tracking branch 'origin/main' into feat/metadata 2025-01-30 17:12:14 -03:00
Dayenne Souza
574736c825 add metadata to text 2025-01-30 17:12:01 -03:00
Nathan Evans
907d271f4e
Fix recursive report generation (#1669) 2025-01-30 11:03:25 -08:00
Dayenne Souza
ad7144eb63 add metadata into token count 2025-01-29 23:27:52 -03:00
Nathan Evans
53b06aa2ac
Add generate_text_embeddings to FGR (#1667) 2025-01-29 14:31:48 -08:00
Derek Worthen
94bd2bb816
Require explicit azure auth settings when using AOI. (#1665)
* Require explicit azure auth settings when using AOI.

- Must set LanguageModel.azure_auth_type to either
"api_key" or "managed_identity" when using AOI.

* Fix smoke tests

* Use general auth_type property instead of azure_auth_type

* Remove unused error type

* Update validation

* Update validation comment
2025-01-29 12:28:47 -08:00
Dayenne Souza
e0183509c9 add semver 2025-01-29 14:00:07 -03:00
Dayenne Souza
598ec9f434 Merge remote-tracking branch 'origin/main' into feat/metadata 2025-01-29 13:58:58 -03:00
Dayenne Souza
7f13c911a1 add metadata_columns 2025-01-29 13:58:44 -03:00
Nathan Evans
d31750f44d
NLP graph extraction (#1652)
* Add NLP extraction workflow

* Add text unit community summarization

* Add CLI flag for indexing method

* Regenerate poetry.lock

* Fix claims loading

* Merge fixes

* Add workflow overrides to config

* Semver

* Add graph pruning config

* Remove degree re-compute from pruning

* Switch to percentile for edge weight pruning

* Add NLP extraction config

* Add new NLP extractor options

* Add FGR workflows to util method

* Use a generator factory for workflows

* Update pruning defaults

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-28 12:27:03 -08:00
Derek Worthen
eeee84e9d9
Add vector store id reference to embeddings config. (#1662) 2025-01-28 10:46:41 -08:00
KennyZhang1
1bbce33f42
Multi-index querying for API layer (#1644)
* added multi-global-query function header

* ported over code for merging dataframes

* added connection to global streaming api function

* added function header for update context helper

* implemented and incorporated update_context function

* Updated to make sure 'parent' column in final_communities gets incremented for multi index.

* first cut at multi_local_search function

* several minor changes and fixes

* Updated multi index local search.

* Cleaned up code.

* fixed lambda function ruff errors

* fixed more ruff errors

* moved query api helpers to util file

* moved index api helpers to util file

* merged in code left out of conflict

* changed GraphRagConfig object to support lists of vector stores

* Updated with fixes for multi_local_search.

* Minor updates.

* Minor updates.

* Updates for ruff check.

* Minor updates.

* removed redundant vector_store_configs arg

* ruff formatting changes

* semversioner

* Minor fix.

* spellcheck fixes

* ruff

* test fix for cicd errors

* another test fix

* added explicit typing for ci tests

* added dict type check for vector_store during indexing

* more ruff fixes

* moved type check

* Removed streaming. Added multi drift and basic searches.

* Formatting changes.

* Updates for pyright.

* Update for ruff.

* Ruff formatted.

* first cut at fixing vector store typing errors

* got multi local search working with new config

* ruff and test fixes

* added fix for embeddings type error

* renamed multi index api functions

* ruff

* convert config model to dict[VectorStoreConfig]

* modified tests to support new vector_store model

* ruff fixes

* changed some test setups to match new model

* changed ci/cd settings files to match new structure

* Fix stderror check

* fixed bug in vector_store_config validation

* ruff

* add database_name field to vectorstoreconfig

* removed print statements

* small refactoring for PR comments

* modified default config in test

* modified vector store config unit test

---------

Co-authored-by: dorbaker <dorbaker@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-27 17:26:38 -05:00
Shamik
053bf60162
Update auto_prompt_tuning.md (#1659)
Updated the auto prompt tuning doc with `--selection-method` instead of only `--method` as per the latest API.

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-27 13:33:25 -06:00
Dayenne Souza
e55d981fa9 Merge remote-tracking branch 'origin/main' into feat/metadata 2025-01-24 16:38:08 -03:00
Alonso Guevara
6b33977360
Add smoke tests for drift (#1658) 2025-01-24 12:31:37 -06:00
Derek Worthen
c644338bae
Refactor config (#1593)
* Refactor config

- Add new ModelConfig to represent LLM settings
    - Combines LLMParameters, ParallelizationParameters, encoding_model, and async_mode
- Add top level models config that is a list of available LLM ModelConfigs
- Remove LLMConfig inheritance and delete LLMConfig
    - Replace the inheritance with a model_id reference to the ModelConfig listed in the top level models config
- Remove all fallbacks and hydration logic from create_graphrag_config
    - This removes the automatic env variable overrides
- Support env variables within config files using Templating
    - This requires "$" to be escaped with extra "$" so ".*\\.txt$" becomes ".*\\.txt$$"
- Update init content to initialize new config file with the ModelConfig structure

* Use dict of ModelConfig instead of list

* Add model validations and unit tests

* Fix ruff checks

* Add semversioner change

* Fix unit tests

* validate root_dir in pydantic model

* Rename ModelConfig to LanguageModelConfig

* Rename ModelConfigMissingError to LanguageModelConfigMissingError

* Add validation for unexpected API keys

* Allow skipping pydantic validation for testing/mocking purposes.

* Add default lm configs to verb tests

* smoke test

* remove config from flows to fix llm arg mapping

* Fix embedding llm arg mapping

* Remove timestamp from smoke test outputs

* Remove unused "subworkflows" smoke test properties

* Add models to smoke test configs

* Update smoke test output path

* Send logs to logs folder

* Fix output path

* Fix csv test file pattern

* Update placeholder

* Format

* Instantiate default model configs

* Fix unit tests for config defaults

* Fix migration notebook

* Remove create_pipeline_config

* Remove several unused config models

* Remove indexing embedding and input configs

* Move embeddings function to config

* Remove skip_workflows

* Remove skip embeddings in favor of explicit naming

* fix unit test spelling mistake

* self.models[model_id] is already a language model. Remove redundant casting.

* update validation errors to instruct users to rerun graphrag init

* instantiate LanguageModelConfigs with validation

* skip validation in unit tests

* update verb tests to use default model settings instead of skipping validation

* test using llm settings

* cleanup verb tests

* remove unsafe default model config

* remove the ability to skip pydantic validation

* remove None union types when default values are set

* move vector_store from embeddings to top level of config and delete resolve_paths

* update vector store settings

* fix vector store and smoke tests

* fix serializing vector_store settings

* fix vector_store usage

* fix vector_store type

* support cli overrides for loading graphrag config

* rename storage to output

* Add --force flag to init

* Remove run_id and resume, fix Drift config assignment

* Ruff

---------

Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-21 17:52:06 -06:00
Nathan Evans
47adfe16f0
Fix DRIFT search on Azure AI Search (#1645)
* Add vector field to retrievable fields for Azure AI Search

* Add DRIFT and Basic search to smoke tests

* Semver

* Format

* Remove DRIFT smoke test for now (brittle)
2025-01-21 17:28:46 -06:00
Dayenne Souza
2f60523857 merge with main and rename column document_attribute_columns to metadata 2025-01-21 12:07:25 -03:00
Dayenne Souza
c90fab9e6d Merge remote-tracking branch 'origin/main' into feat/metadata 2025-01-21 11:43:55 -03:00
Alonso Guevara
dd884c0ce2
Release v1.2.0 (#1625) 2025-01-15 15:49:07 -06:00
Alonso Guevara
3defab2ea4
Reduce Drift Response and Streaming endpoint (#1624)
* Adding basic wrappers for reduce in drift

* Add response_type parameter to run_drift_search and enhance reduce response functionality

* Add streaming endpoint

* Semver

* Spellcheck

* Ruff checks

* Count tokens on reduce

* Use list comprehension and remove llm_params map in favor of just using kwargs
2025-01-15 14:23:25 -06:00
Dayenne Souza
16ea499349 run formatter 2025-01-14 12:18:04 -03:00
Dayenne Souza
8356607ae5 remove title_column from input 2025-01-14 12:16:35 -03:00
Dayenne Souza
6a7b636151 remove timestamp_format from input 2025-01-14 12:10:04 -03:00
Dayenne Souza
48be98d5c6 remove timestamp_column from input 2025-01-14 12:07:44 -03:00
Dayenne Souza
ca8f3f8def remove source_column from input 2025-01-14 12:06:31 -03:00
Dayenne Souza
dbc7ad9746 remove source_column from input 2025-01-14 12:06:23 -03:00
KennyZhang1
4637270e9a
Implement CosmosDB vector store (#1587) 2025-01-14 02:47:08 -05:00
Alonso Guevara
e21a38f2ab
Fix/notebooks (#1614)
* Add new inputs and missing vector store for retrieving vectors

* Format

* Semver

* Remove .Identifier files

* Fix spellcheck

* Remove unnecessary input file for notebooks
2025-01-13 17:41:39 -06:00
Dayenne Souza
2f2cfa7b70
Test and unify text splitter functionality (#1547)
* add text_splitting unit test

* change folder test text splitting

* fix chunk fn

* test new function

* run formatter

* run spell check

* run semver

* remove tiktoken mocked from tests

* change progress ticker

* fix ruff check
2025-01-13 18:42:44 -03:00
Nathan Evans
0e7d22bfb0
Jan documentation updates (#1612)
* Update workflow docs

* Docs cleanup
2025-01-10 11:36:27 -08:00
Nathan Evans
63042d22f3
Limiter defaults (#1611)
* Edit rate limit defaults

* Semver
2025-01-10 10:09:12 -08:00
Alonso Guevara
e69abc7f5d
Release/v1.1.2 (#1607)
* Release v1.1.2

* Change from minor to patch
2025-01-09 16:50:04 -06:00
gaudyb
37fd7a7762
fix basic search minor bug (#1606)
* fix basic search minor bug

* version change

---------

Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>
2025-01-09 14:48:01 -06:00
Alonso Guevara
2682c7102f
Release v1.1.1 (#1595) 2025-01-08 16:18:39 -06:00
Alonso Guevara
368acc18c1
Fix/dynamic search hierarchy maps (#1591)
* Fix community hierarchy maps creation

* Semver
2025-01-08 15:40:26 -06:00
Alonso Guevara
6eca5ec69f
Chore/increase search community prop def (#1589)
* Increase LOCAL_SEARCH_COMMUNITY_PROP

* Semver
2025-01-08 09:33:36 -06:00
Alonso Guevara
f000309829
Release v1.1.0 (#1588) 2025-01-07 16:16:17 -06:00
Nathan Evans
7ec9ef0261
Refactor callbacks (#1583)
* Unify Workflow and Verb callbacks interfaces

* Semver

* Fix storage class instantiation (#1582)

---------

Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2025-01-06 10:58:59 -08:00
Josh Bradley
cbb8f8788e
Fix storage class instantiation (#1582) 2025-01-03 17:39:44 -05:00
Nathan Evans
a35cb12741
Remove datashaper strip code (#1581)
Remove datashaper
2025-01-03 13:59:26 -08:00
dependabot[bot]
58f646a019
Bump ruff from 0.8.4 to 0.8.5 (#1579)
* Bump ruff from 0.8.4 to 0.8.5

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.4 to 0.8.5.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.8.4...0.8.5)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix ruff

* Semver

* Another ruff

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-02 17:45:52 -06:00
Derek Worthen
80367be018
Remove config input models (#1570)
* Remove config input models

* remove unit tests related to config input models

* add semversioner change

* Merge branch 'main' into config-remove-input-models
2025-01-02 15:25:10 -08:00
gaudyb
185f513ca7
Basic search implementation (#1563)
* basic search implementation

* basic streaming functionality

* format check

* check fix

* release change

* Chore/gleanings any encoding (#1569)

* Make claims and entities independent of encoding

* Semver

* Change semver release type

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-02 13:49:11 -06:00
Alonso Guevara
5f9ad0d003
Chore/gleanings any encoding (#1569)
* Make claims and entities independent of encoding

* Semver

* Change semver release type
2025-01-02 11:44:21 -06:00
Alonso Guevara
2abd6c5f5c
Update blog posts (#1571) 2024-12-30 17:16:08 -06:00
Alonso Guevara
5258bc5f4f
Fix/gleanings loop (#1564)
* Fix gleaning output parsing

* Semver
2024-12-30 12:57:33 -06:00
Nathan Evans
a2647da473
Simplify flow config (#1554)
* Flatten compute_communities config

* Remove cluster strategy type

* Flatten create_base_text_units config

* Move cluster seed to config default, leave as None in functions

* Remove "prechunked" logic

* Remove hard-coded encoding model

* Remove unused variables

* Strongly type embed_config

* Simplify layout_graph config

* Semver

* Fix integration test

* Fix config unit tests: ignore new config defaults

* Remove pipeline integ test
2024-12-27 16:38:36 -08:00
Theo Beigbeder
e6de713f25
Fix in load_llm.py (#1508)
Fixed an issue where the "proxy" setting was passed to the PublicOpenAPI constructor instead of the "api_base" parameter, which prevented the use of on-premise OpenAI-based LLM servers.

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-19 13:51:01 -06:00