Gabriel Nieves
be21d994c0
init
2025-03-26 16:25:08 +00:00
Alonso Guevara
b7b2b562ce
fnllm version fix ( #1835 )
...
* Fix fnllm version
* Semver
2025-03-21 22:13:56 -07:00
Nathan Evans
3b1e70c06b
Update config docs (2.1.0) ( #1818 )
...
* Align docs with config
* Semver
* Spelling
* Format
* Spelling
2025-03-18 12:39:30 -07:00
Nathan Evans
813b4de99f
Fix API key reference for gh-pages ( #1821 )
2025-03-18 11:10:11 -07:00
Nathan Evans
ddc6541ab6
Add docs page about input formats ( #1784 )
...
* Add docs page about input formats
* Add json example
* Spelling
2025-03-11 17:37:46 -07:00
Nathan Evans
321d479ab6
Update notebooks for 2.0 ( #1785 )
...
* Update API overview
* Fix global search example
* Fix local search example
* Fix global dynamic example
* Fix drift example
* Update multi-index example
* Semver
2025-03-11 17:23:49 -07:00
Alonso Guevara
0d363e6957
Release v2.1.0 ( #1800 )
2025-03-11 18:16:08 -06:00
Alonso Guevara
53950f8442
Fix/model provider key injection check ( #1799 )
...
* Check available models for type validation
* Semver
* Fix ruff and pyright
* Apply feedback
2025-03-11 17:48:30 -06:00
Gabriel Nieves-Ponce
e39d869bed
Added support for verbose logging and csv-metadata to the prompt tune… ( #1789 )
...
* Added support for verbose logging and csv-metadata to the prompt tune client.
* Updated community report summarization file name and prompt template
* updated semversioner
* ran ruff linter
* Ran poe format
* Fix Ruff complaints
* Fix a new ruff complaint :P
* Pyright
* Fix tests
---------
Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-03-11 14:55:02 -06:00
Nathan Evans
66c2cfb3ce
Support JSON input files ( #1777 )
...
* Add csv loader tests
* Add text loader tests
* Add json input support
* Remove temp path constraint
* Reuse loader code
* Semver
* Set file pattern automatically based on type, if empty
* Remove pattern from smoke test config
* Spelling
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-03-10 14:04:07 -07:00
Nathan Evans
bcb74789f1
Next release docs ( #1627 )
...
* Wording updates
* Update yaml config and add notes to "deprecated" env
* Add basic search section
* Update versioning docs
* Minor edits for clarity
* Update init command
* Update init to add --force in docs
* Add NLP extraction params
* Move vector_store to root
* Add workflows to config
* Add FastGraphRAG docs
* add metadata column changes
* Added documentation for multi index search.
* Minor fixes.
* Add config and table renames
* Update migration notebook and comments to specify v1
* Add frequency to entity table docs
* add new chunking options for metadata
* Update output docs
* Minor edits and cleanup
* Add model ids to search configs
* Spruce up migration notebook
* Lint/format multi-index notebook
* SpaCy model note
* Update SpaCy footnote
* Updated multi_index_search.ipynb to remove ruff errors.
* add spacy to dictionary
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Dayenne Souza <ddesouza@microsoft.com>
Co-authored-by: dorbaker <dorbaker@microsoft.com>
2025-03-03 14:46:00 -08:00
Nathan Evans
bd06d8b4f0
Context property bag ("state") ( #1774 )
...
* Add pipeline state property bag to run context
* Move state creation out of context util
* Move callbacks into PipelineRunContext
* Semver
* Rename state.json to context.json to avoid confusion with stats.json
* Expand smoke test row count
* Add util to create storage and cache
2025-02-28 09:31:48 -08:00
Nathan Evans
a15942629b
Add more verb tests ( #1773 )
...
* Add NLP verb test
* Add finalize_graph tests
* Add more thorough final column assertions
2025-02-27 09:31:46 -08:00
Alonso Guevara
b4b8b81c0a
Remove spacy model from toml ( #1771 )
...
* Remove spacy model from toml
* Semver
2025-02-26 10:58:02 -06:00
Alonso Guevara
716f93dd8b
Release v2.0.0 ( #1769 )
...
* Release v2.0.0
* snapshots...
2025-02-25 17:52:30 -06:00
Alonso Guevara
facf68148a
Fix summarization and relationship grouping on Inc Indexing ( #1768 )
...
* Fix summarization for large descriptions on incremental indexing
* Semver
* Ruff
2025-02-25 17:29:55 -06:00
Nathan Evans
ede6a74546
Pipeline callbacks ( #1729 )
...
* Add pipeline_start and pipeline_end callbacks
* Collapse redundant callback/logger logic
* Remove redundant reporting config classes
* Remove a few out-of-date type ignores
* Semver
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-25 15:07:51 -08:00
Nathan Evans
e40476153d
Speed up smoke tests ( #1736 )
...
* Move verb tests to regular CI
* Clean up env vars
* Update smoke runtime expectations
* Rework artifact assertions
* Fix plural in name
* remove redundant artifact len check
* Remove redundant artifact len check
* Adjust graph output expectations
* Update community expectations
* Include all workflow output
* Adjust text unit expectations
* Adjust assertions per dataset
* Fix test config param name
* Update nan allowed for optional model fields
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-25 13:24:35 -08:00
Nathan Evans
61a309b182
Incremental model alignment ( #1766 )
...
* Used shared schema lists for all final columns
* Semver
2025-02-25 13:14:42 -06:00
Alonso Guevara
0144b3fd88
Update FNLLM ( #1738 )
...
* Add ModelProvider to Query package.
* Spellcheck + others
* Semver
* Fix tests
* Format
* Fix Pyright
* Fix tests
* Fix for smoke tests
* Update fnllm version
* Semver
* Ruff
2025-02-24 20:30:45 -06:00
Nathan Evans
5dd9fc53cd
Move embeddings snapshots ( #1737 )
...
* Move embedding snapshots to the workflow runner
* Semver
* Rename input tables
2025-02-24 17:38:01 -08:00
Alonso Guevara
e0d233fe10
Feat/llm provider query ( #1735 )
...
* Add ModelProvider to Query package.
* Spellcheck + others
* Semver
* Fix tests
* Format
* Fix Pyright
* Fix tests
* Fix for smoke tests
2025-02-24 18:35:51 -06:00
Nathan Evans
faa05b691f
Fix text unit incremental ID updates ( #1734 )
...
* Increment text_unit ids during incremental
* Semver
2025-02-24 14:58:00 -08:00
Nathan Evans
a932b2d342
Fix StopAsyncIteration catch ( #1730 )
2025-02-21 11:46:44 -08:00
Derek Worthen
54885b8ab1
Refactor config defaults ( #1723 )
...
* Refactor config defaults
- Implement type-safe, hierarchical dataclass for config
defaults instead of namespaced constants.
- Allow for instantiating config directly from defaults data structure.
* fix vector_store db_uri default
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-20 13:01:29 -06:00
Alonso Guevara
7bdeaee94a
Create Language Model Providers and Registry methods. Remove fnllm coupling ( #1724 )
...
* Base structure
* Add fnllm providers and Mock LLM
* Remove fnllm coupling, introduce llm providers
* Ruff + Tests fix
* Spellcheck
* Semver
* Format
* Default MockChat params
* Fix more tests
* Fix embedding smoke test
* Fix embeddings smoke test
* Fix MockEmbeddingLLM
* Rename LLM to model. Package organization
* Fix prompt tuning
* Oops
* Oops II
2025-02-20 08:56:20 -06:00
Nathan Evans
a42772d368
Query callbacks ( #1721 )
...
* Add callbacks to global search
* Add callbacks to local search
* Add streaming callbacks in local search CLI
* Add callbacks to basic search
* Add callbacks to DRIFT search
* Semver
* Return generators directly in API
* Guard callbacks
2025-02-19 13:00:07 -08:00
Nathan Evans
efcaf9636d
Tuck flow functions under their workflows ( #1720 )
...
* Move flow functions to workflow
* Remove redundant workflow_name variable
* Semver
2025-02-18 15:33:36 -06:00
Alonso Guevara
7f020826be
Fix/json mode community reports ( #1713 )
...
* Patch json mode on Community Reports
* Semversioner
* Wording oopsie
2025-02-14 16:51:42 -06:00
Nathan Evans
96219a2182
Register workflows ( #1691 )
...
* Add workflow registration
* Add ability to mutate config by workflows
* Separate graph finalization
* Separate graph pruning
* Semver
* Update tests
* Update smoke tests
* Fix iterrows on create_graph
* Remove prune_graph from llm construction
* Update test data
* Remove prune_graph from smoke tests
2025-02-14 13:21:31 -08:00
Nathan Evans
981fd31963
Community children ( #1704 )
...
* Add children to the community tables
* Replace NaN children with empty list
* Replace subcommunity logic with built-in parent/child fields
* Remove restore_community_hierarchy
* Add children and frequency to migration notebook
* Format
* Semver
* Add children to reports
* Update tests
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-13 17:03:51 -08:00
Nathan Evans
35b639399b
Incremental flow rework ( #1696 )
...
* Rework update output structure
* Semver
* Fix unit test
* Update frequency in incremental
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-13 18:22:32 -06:00
Alonso Guevara
5ef2399a6f
Chore/remove iterrows ( #1708 )
...
* Remove most iterrows usages
* Semver
* Ruff
* Pyright
* Format
2025-02-13 17:32:54 -06:00
Josh Bradley
f14cda2b6d
Improve default llm retry logic to be more optimized ( #1701 )
2025-02-13 16:56:37 -05:00
Josh Bradley
b8b949f3bb
Cleanup query api - remove code duplication ( #1690 )
...
* consolidate query api functions and remove code duplication
* refactor and remove more code duplication
* Add semversioner file
* fix basic search
* fix drift search and update base class function names
* update example notebooks
2025-02-13 16:31:08 -05:00
Nathan Evans
fe461417b5
Export NLP community reports prompt ( #1697 )
...
* Properly export the NLP community reports prompt
* Semver
* Fix verb tests
2025-02-12 10:41:39 -08:00
Dayenne Souza
b94290ec2b
add option to add metadata into text chunks ( #1681 )
...
* add new options
* add metadata json into input document
* remove doc change
* add metadata column into text loader
* prepend_metadata
* run fix
* fix tests and patch
* fix test
* add warning for metadata tokens > config size
* fix typo and run fix
* fix test_integration
* fix test
* run check
* rename and fix chunking
* fix
* fix
* fix test verbs
* fix
* fix tests
* fix chunking
* fix index
* fix cosmos test
* fix vars
* fix after PR
* fix
2025-02-12 09:38:03 -08:00
KennyZhang1
b9dc7b90d5
Fix/streamline workflow miq bugs ( #1694 )
...
* Add vector store id reference to embeddings config.
* added communities to links and maxvals
* Consistent naming
* Update entity_ids to include index_name
* added consistent logging messages to miq cli
* semversioner
---------
Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-02-11 16:13:28 -05:00
Nathan Evans
a6a78d5897
Nlp cache ( #1689 )
...
* Add cache to build_noun_graph
* Semver
2025-02-10 11:00:51 -08:00
Nathan Evans
c02ab0984a
Streamline workflows ( #1674 )
...
* Remove create_final_nodes
* Rename final entity output to "entities"
* Remove duplicate code from graph extraction
* Rename create_final_relationships output to "relationships"
* Rename create_final_communities output to "communities"
* Combine compute_communities and create_final_communities
* Rename create_final_covariates output to "covariates"
* Rename create_final_community_reports output to "community_reports"
* Rename create_final_text_units output to "text_units"
* Rename create_final_documents output to "documents"
* Remove transient snapshots config
* Move create_final_entities to finalize_entities operation
* Move create_final_relationships flow to finalize_relationships operation
* Reuse some community report functions
* Collapse most of graph and text unit-based report generation
* Unify schemas files
* Move community reports extractor
* Move NLP report prompt to prompts folder
* Fix a few pandas warnings
* Rename embeddings config to embed_text
* Rename claim_extraction config to extract_claims
* Remove nltk from standard graph extraction
* Fix verb tests
* Fix extract graph config naming
* Fix moved file reference
* Create v1-to-v2 migration notebook
* Semver
* Fix smoke test artifact count
* Raise tpm/rpm on smoke tests
* Update drift settings for smoke tests
* Reuse project directory var in api notebook
* Format
* Format
2025-02-07 11:11:03 -08:00
KennyZhang1
83cc2daf91
Multi-index query CLI support ( #1675 )
...
* Add vector store id reference to embeddings config.
* changed structure of output config section
* added cli integration for multi index global
* added cli integration for multi index local
* added cli integration for multi index drift and basic
* finished local testing of multi-index cli
* ruff fixes
* partially refactored test code to align with new output section
* more test changes for new output structure
* semversioner
* refactored to align with new multi index config proposal
* locally tested new multi-index output proposal
* cleaned up tests to align with new structure
---------
Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
2025-02-07 12:56:48 -05:00
Alonso Guevara
0805924a35
Fix/drift n depth ( #1676 )
...
* Fix n_depth param
* Semver
* Change smoke tests params for drift
* Reduce log printing for expected exceptions
2025-02-05 17:22:34 -06:00
JunHo Kim (김준호)
a4d35bc66f
Fix typo in DEVELOPING.md instructions ( #1631 )
...
Corrected "this values" to "these values" for improved clarity. This ensures the documentation is more accurate and professional.
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-02-04 13:16:57 -08:00
JunHo Kim (김준호)
30f36316af
Fix typo in table formatting in env_vars documentation ( #1632 )
...
Corrected a missing backtick in a note within the `GRAPHRAG_API_KEY` description. This ensures proper code formatting and improves readability in the documentation. No content was altered aside from formatting adjustments.
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
2025-02-04 13:14:58 -08:00
Dayenne Souza
ad5b5120ec
remove unused columns and rename document_attribute_columns ( #1672 )
...
* remove unused columns and change property document_attribute_columns to metadata
* format file
* fix 'metadata' column on output
* run check
* fix test on nltk
* remove docs changes
2025-02-03 14:37:06 -03:00
Nathan Evans
907d271f4e
Fix recursive report generation ( #1669 )
2025-01-30 11:03:25 -08:00
Nathan Evans
53b06aa2ac
Add generate_text_embeddings to FGR ( #1667 )
2025-01-29 14:31:48 -08:00
Derek Worthen
94bd2bb816
Require explicit azure auth settings when using AOI. ( #1665 )
...
* Require explicit azure auth settings when using AOI.
- Must set LanguageModel.azure_auth_type to either
"api_key" or "managed_identity" when using AOI.
* Fix smoke tests
* Use general auth_type property instead of azure_auth_type
* Remove unused error type
* Update validation
* Update validation comment
2025-01-29 12:28:47 -08:00
Nathan Evans
d31750f44d
NLP graph extraction ( #1652 )
...
* Add NLP extraction workflow
* Add text unit community summarization
* Add CLI flag for indexing method
* Regenerate poetry.lock
* Fix claims loading
* Merge fixes
* Add workflow overrides to config
* Semver
* Add graph pruning config
* Remove degree re-compute from pruning
* Switch to percentile for edge weight pruning
* Add NLP extraction config
* Add new NLP extractor options
* Add FGR workflows to util method
* Use a generator factory for workflows
* Update pruning defaults
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-01-28 12:27:03 -08:00
Derek Worthen
eeee84e9d9
Add vector store id reference to embeddings config. ( #1662 )
2025-01-28 10:46:41 -08:00