Nathan Evans
c296f1ae15
Fix a bunch of module comments and function visibility ( #2154 )
2025-12-17 10:55:26 -08:00
Nathan Evans
4512ce027f
Empty graph guards ( #2126 )
* Remove networkx from graph_extractor and clean out redundancy
* Bubble pipeline error to console
2025-11-11 10:11:08 -08:00
Derek Worthen
619269243d
Restructure project as monorepo. ( #2111 )
* Restructure project as monorepo.
2025-11-04 09:51:56 -08:00
Nathan Evans
1bb9fa8e13
Unified factory ( #2105 )
* Simplify Factory interface
* Migrate CacheFactory to standard base class
* Migrate LoggerFactory to standard base class
* Migrate StorageFactory to standard base class
* Migrate VectorStoreFactory to standard base class
* Update vector store example notebook
* Delete notebook outputs
* Move default providers into factories
* Move retry/limit tests into integ
* Split language model factories
* Set smoke test tpm/rpm
* Fix factory integ tests
* Add method to smoke test, switch text to 'fast'
* Fix text smoke config for fast workflow
* Add new workflows to text smoke test
* Convert input readers to a proper factory
* Remove covariates from fast smoke test
* Update docs for input factory
* Bump smoke runtime
* Even longer runtime
* min-csv timeout
* Remove unnecessary lambdas
2025-10-20 12:05:27 -07:00
Nathan Evans
eb0dfe376b
Remove strategy dicts ( #2090 )
* Remove "strategy" from community reports config/workflow
* Remove extraction strategy from extract_graph
* Remove summarization strategy from extract_graph
* Remove strategy from claim extraction
* Strongly type prompt templates
* Remove strategy from embed_text
* Push hydrated params into community report workflows
* Push hydrated params into extract covariates
* Push hydrated params into extract graph NLP
* Push hydrated params into extract graph
* Push hydrated params into text embeddings
* Remove a few more low-level defaults
* Semver
* Remove configurable prompt delimiters
* Update smoke tests
2025-10-10 12:15:23 -07:00
Copilot
e84df28e64
Improve internal logging functionality by using Python's standard logging module ( #1956 )
* Initial plan for issue
* Implement standard logging module and integrate with existing loggers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Add test cases and improve documentation for standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Apply ruff formatting and add semversioner file for logging improvements
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove custom logger classes and refactor to use standard logging only
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Apply ruff formatting to resolve CI/CD test failures
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Add semversioner file and fix linting issues
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff fixes
* fix spelling error
* Remove StandardProgressLogger and refactor to use standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove LoggerFactory and custom loggers, refactor to use standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix pyright error: use logger.info() instead of calling logger as function in cosmosdb_pipeline_storage.py
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff fixes
* Remove deprecated logger files that were marked as deprecated placeholders
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Replace custom get_logger with standard Python logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix linting issues found by ruff check --fix
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff check fixes
* add word to dictionary
* Fix type checker error in ModelManager.__new__ method
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Refactor multiple logging.getLogger() calls to use single logger per file
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove progress_logger parameter from build_index() and logger parameter from generate_indexing_prompts()
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove logger parameter from run_pipeline and standardize logger naming
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Replace logger parameter with log_level parameter in CLI commands
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix import ordering in notebook files to pass poetry poe check
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove --logger parameter from smoke test command
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix Windows CI/CD issue with log file cleanup in tests
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Add StreamHandler to root logger in __main__.py for CLI logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Only add StreamHandler if root logger doesn't have existing StreamHandler
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix import ordering in notebook files to pass ruff checks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Replace logging.StreamHandler with colorlog.StreamHandler for colorized log output
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Regenerate poetry.lock file after adding colorlog dependency
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix import ordering in notebook files to pass ruff checks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* move printing of dataframes to debug level
* remove colorlog for now
* Refactor workflow callbacks to inherit from logging.Handler
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix linting issues in workflow callback handlers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix pyright type errors in blob and file workflow callbacks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Refactor pipeline logging to use pure logging.Handler subclasses
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Rename workflow callback classes to workflow logger classes and move to logger directory
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* update dictionary
* apply ruff fixes
* fix function name
* simplify logger code
* update
* Remove error, warning, and log methods from WorkflowCallbacks and replace with standard logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* ruff fixes
* Fix pyright errors by removing WorkflowCallbacks from strategy type signatures
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove ConsoleWorkflowLogger and apply consistent formatter to all handlers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* Refactor pipeline_logger.py to use standard FileHandler and remove FileWorkflowLogger
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove conditional azure import checks from blob_workflow_logger.py
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Fix pyright type checking errors in mock_provider.py and utils.py
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Run ruff check --fix to fix import ordering in notebooks
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Merge configure_logging and create_pipeline_logger into init_loggers function
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* Remove configure_logging and create_pipeline_logger functions, replace all usage with init_loggers
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* cleanup unused code
* Update init_loggers to accept GraphRagConfig instead of ReportingConfig
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff check fixes
* Fix test failures by providing valid GraphRagConfig with required model configurations
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* remove logging_workflow_callback
* cleanup logging messages
* Add logging to track progress of pandas DataFrame apply operation in create_base_text_units
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* cleanup logger logic throughout codebase
* update
* more cleanup of old loggers
* small logger cleanup
* final code cleanup and added loggers to query
* add verbose logging to query
* minor code cleanup
* Fix broken unit tests for chunk_text and standard_logging
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* apply ruff fixes
* Fix test_chunk_text by mocking progress_ticker function instead of ProgressTicker class
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
* remove unnecessary logger
* remove rich and fix type annotation
* revert test formatting changes made by copilot
* promote graphrag logs to root logger
* add correct semversioner file
* revert change to file
* revert formatting changes that have no effect
* fix changes after merge with main
* revert unnecessary copilot changes
* remove whitespace
* cleanup docstring
* simplify some logic with less code
* update poetry lock file
* ruff fixes
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2025-07-09 18:29:03 -06:00
Nathan Evans
ede6a74546
Pipeline callbacks ( #1729 )
* Add pipeline_start and pipeline_end callbacks
* Collapse redundant callback/logger logic
* Remove redundant reporting config classes
* Remove a few out-of-date type ignores
* Semver
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-02-25 15:07:51 -08:00
Alonso Guevara
7bdeaee94a
Create Language Model Providers and Registry methods. Remove fnllm coupling ( #1724 )
* Base structure
* Add fnllm providers and Mock LLM
* Remove fnllm coupling, introduce llm providers
* Ruff + Tests fix
* Spellcheck
* Semver
* Format
* Default MockChat params
* Fix more tests
* Fix embedding smoke test
* Fix embeddings smoke test
* Fix MockEmbeddingLLM
* Rename LLM to model. Package organization
* Fix prompt tuning
* Oops
* Oops II
2025-02-20 08:56:20 -06:00
Josh Bradley
f14cda2b6d
Improve default llm retry logic to be more optimized ( #1701 )
2025-02-13 16:56:37 -05:00
Nathan Evans
c02ab0984a
Streamline workflows ( #1674 )
* Remove create_final_nodes
* Rename final entity output to "entities"
* Remove duplicate code from graph extraction
* Rename create_final_relationships output to "relationships"
* Rename create_final_communities output to "communities"
* Combine compute_communities and create_final_communities
* Rename create_final_covariates output to "covariates"
* Rename create_final_community_reports output to "community_reports"
* Rename create_final_text_units output to "text_units"
* Rename create_final_documents output to "documents"
* Remove transient snapshots config
* Move create_final_entities to finalize_entities operation
* Move create_final_relationships flow to finalize_relationships operation
* Reuse some community report functions
* Collapse most of graph and text unit-based report generation
* Unify schemas files
* Move community reports extractor
* Move NLP report prompt to prompts folder
* Fix a few pandas warnings
* Rename embeddings config to embed_text
* Rename claim_extraction config to extract_claims
* Remove nltk from standard graph extraction
* Fix verb tests
* Fix extract graph config naming
* Fix moved file reference
* Create v1-to-v2 migration notebook
* Semver
* Fix smoke test artifact count
* Raise tpm/rpm on smoke tests
* Update drift settings for smoke tests
* Reuse project directory var in api notebook
* Format
* Format
2025-02-07 11:11:03 -08:00
Nathan Evans
a2647da473
Simplify flow config ( #1554 )
* Flatten compute_communities config
* Remove cluster strategy type
* Flatten create_base_text_units config
* Move cluster seed to config default, leave as None in functions
* Remove "prechunked" logic
* Remove hard-coded encoding model
* Remove unused variables
* Strongly type embed_config
* Simplify layout_graph config
* Semver
* Fix integration test
* Fix config unit tests: ignore new config defaults
* Remove pipeline integ test
2024-12-27 16:38:36 -08:00
Nathan Evans
c1c09bab80
Flow cleanup ( #1510 )
* Move snapshots out of flows into verbs
* Move degree compute out of extract_graph
* Move entity/relationship df merging into extract
* Move "title" to extraction source
* Move text_unit_ids agg closer to extraction
* Move data definition
* Update test data
* Semver
* Update smoke tests
* Fix empty degree field and update smoke tests and verb data
* Move extractors (#1516 )
* Consolidate graph embedding and umap
* Consolidate claim extraction
* Consolidate graph extractor
* Move graph utils
* Move summarizers
* Semver
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
* Fix syntax typo
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 18:07:44 -08:00
Nathan Evans
d0543d1fd6
Move extractors ( #1516 )
* Consolidate graph embedding and umap
* Consolidate claim extraction
* Consolidate graph extractor
* Move graph utils
* Move summarizers
* Semver
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-12-18 16:21:41 -08:00
Chris Trevino
5ff2d3c76d
Remove graphrag.llm, replace with fnllm ( #1315 )
* add fnllm; remove llm folder
* remove llm unit tests
* update imports
* update imports
* formatting
* enable autosave
* update mockllm
* update community reports extractor
* move most llm usage to fnllm
* update type issues
* fix unit tests
* type updates
* update dictionary
* semver
* update llm construction, get integration tests working
* load from llmparameters model
* move ruff settings to ruff.toml
* add gitattributes file
* ignore ruff.toml spelling
* update .gitattributes
* update gitignore
* update config construction
* update prompt var usage
* add cache adapter
* use cache adapter in embeddings calls
* update embedding strategy
* add fnllm
* add pytest-dotenv
* fix some verb tests
* get verbtests running
* update ruff.toml for vscode
* enable ruff native server in vscode
* update artifact inspecting code
* remove local-test update
* use string.replace instead of string.format in community reports extractor
* bump timeout
* revert ruff.toml, vscode settings for another pr
* revert cspell config
* revert gitignore
* remove json-repair, update fnllm
* use fnllm generic type interfaces
* update load_llm to use target models
* consolidate chat parameters
* add 'extra_attributes' prop to community report response
* formatting
* update fnllm
* formatting
* formatting
* Add defaults to some llm params to avoid null on params hash
* Formatting
---------
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-12-05 18:07:47 -06:00
Nathan Evans
c8c354e357
Artifact cleanup ( #1341 )
* Add source documents for verb tests
* Remove entity_type erroneous column
* Add new test data
* Remove source/target degree columns
* Remove top_level_node_id
* Remove chunk column configs
* Rename "chunk" to "text"
* Rename "chunk" to "text" in base
* Re-map document input to use base text units
* Revert base text units as final documents dep
* Update test data
* Split/rename node source_id
* Drop node size (dup of degree)
* Drop document_ids from covariates
* Remove unused document_ids from models
* Remove n_tokens from covariate table
* Fix missed document_ids delete
* Wire base text units to final documents
* Rename relationship rank as combined_degree
* Add rank as first-class property to Relationship
* Remove split_text operation
* Fix relationships test parquet
* Update test parquets
* Add entity ids to community table
* Remove stored graph embedding columns
* Format
* Semver
* Fix JSON typo
* Spelling
* Rename lancedb
* Sort lancedb
* Fix unit test
* Fix test to account for changing period
* Update tests for separate embeddings
* Format
* Better assertion printing
* Fix unit test for windows
* Rename document.raw_content -> document.text
* Remove read_documents function
* Remove unused document summary from model
* Remove unused imports
* Format
* Add new snapshots to default init
* Use util to construct embeddings collection name
* Align inc index model with branch changes
* Update data and tests for int ids
* Clean up embedding locs
* Switch entity "name" to "title" for consistency
* Fix short_id -> human_readable_id defaults
* Format
* Rework community IDs
* Fix community size compute
* Fix unit tests
* Fix report read
* Pare down nodes table output
* Fix unit test
* Fix merge
* Fix community loading
* Format
* Fix community id report extraction
* Update tests
* Consistent short IDs and ordering
* Update ordering and tests
* Update incremental for new nodes model
* Guard document columns loc
* Match column ordering
* Fix document guard
* Update smoke tests
* Fill NA on community extract
* Logging for smoke test debug
* Add parquet schema details doc
* Fix community hierarchy guard
* Use better empty hierarchy guard
* Back-compat shims
* Semver
* Fix warning
* Format
* Remove default fallback
* Reuse key
2024-11-13 15:11:19 -08:00
Nathan Evans
ce5b1207e0
Collapse graph documents workflows ( #1284 )
* Copy base documents logic into final documents
* Delete create_base_documents
* Combine graph creation under create_base_entity_graph
* Delete collapsed workflows
* Migrate most graph internals to nx.Graph
* Fix None edge case
* Semver
* Remove comment typo
* Fix smoke tests
2024-10-15 13:58:58 -06:00
Nathan Evans
61b3d6d56a
Migrate helper verbs ( #1248 )
* Remove genid
* Move snapshot_rows
* Move snapshot
* Delete spread_json
* Delete unzip
* Delete zip
* Move unpack_graph
* Move compute_edge_combined_degree
* Delete create_graph
* Delete concat
* Delete text replace
* Delete text_translate
* Move text_split
* Inline aggregate override
* Move cluster_graph
* Move merge_graphs
* Semver
* Move text_chunk
* Move layout_graph and fix some __init__s
* Move extract_covariates
* Rename text_split -> split_text
* Move extract_entities
* Move summarize_descriptions
* Rename text_chunk -> chunk_text
* Move community report creation
* Remove verb-level packing operators
* Streamline some naming
* Streamline param name/order
* Move mock LLM data to tests
* Fixed missed rename
* Update some strategy refs
* Rename run_gi
* Inject mock responses into integ test config
2024-10-09 13:46:44 -07:00
Alonso Guevara
81b81cf60b
Initial Release
2024-07-01 15:25:30 -06:00