graphrag

mirror of https://github.com/microsoft/graphrag.git synced 2026-01-14 09:07:20 +08:00

Author	SHA1	Message	Date
Nathan Evans	22a4d29a10	DRIFT fixes (#2171) * Use stable ids for community reports * Remove deprecated title from embedding flow * Remove embedding column from df loaders * Fix lancedb insertion * Add drift back to smoke tests * Fix mock embedder to match default embedding length * Fix DRIFT notebook * Push drift_k_followups through to prompt * Format	2026-01-13 15:56:26 -08:00
Nathan Evans	710fdad6f0	Input factory (#2168) * Update input factory to match other factories * Move input config alongside input readers * Move file pattern logic into InputReader * Set encoding default * Clean up optional column configs * Combine structured data extraction * Remove pandas from input loading * Throw if empty documents * Add json lines (jsonl) input support * Store raw data * Fix merge imports * Move metadata handling entirely to chunking * Nicer automatic title * Typo * Add get_property utility for nested dictionary access with dot notation * Update structured_file_reader to use get_property utility * Extract input module into new graphrag-input monorepo package - Create new graphrag-input package with input loading utilities - Move InputConfig, InputFileType, InputReader, TextDocument, and file readers (CSV, JSON, JSONL, Text) - Add get_property utility for nested dictionary access with
dot notation - Include hashing utility for document ID generation - Update all imports throughout codebase to use graphrag_input - Add package to workspace configuration and release tasks - Remove old graphrag.index.input module * Rename ChunkResult to TextChunk and add transformer support - Rename chunk_result.py to text_chunk.py with ChunkResult -> TextChunk - Add 'original' field to TextChunk to track pre-transform text - Add optional transform callback to chunker.chunk() method - Add add_metadata transformer for prepending metadata to chunks - Update create_chunk_results to apply transforms and populate original - Update sentence_chunker and token_chunker with transform support - Refactor create_base_text_units to use new transformer pattern - Rename pluck_metadata to get/collect methods on TextDocument * Back-compat comment * Align input config type name with other factory configs * Add MarkItDown support * Remove pattern default from MarkItDown reader * Remove plugins flag (implicit disabled) * Format * Update verb tests * Separate storage from input config * Add empty objects for NaN raw_data * Fix smoke tests * Fix BOM in csv smoke * Format	2026-01-12 12:47:57 -08:00
gaudyb	c649d9f6ee	Issue #2004 fix (#2159) * fix issue #2004 using KeenhoChu idea in his PR * add unit test for dynamic community selection * add unit test for dynamic community selection implementing #2158 logic --------- Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>	2025-12-30 18:08:10 -06:00
Nathan Evans	d6e6191d84	Format	2025-11-17 13:59:58 -08:00
gaudyb	a4ffc3d34c	Remove embeddings optional new (#2128) * remove optional embeddings * fix test * fix tests * fix pipeline * fix test * fix test * fix test * fix tests --------- Co-authored-by: Gaudy Blanco <gaudy-microsoft@MacBook-Pro-m4-Gaudy-For-Work.local>	2025-11-17 13:10:54 -06:00
Derek Worthen	e0cce31f54	Graphrag config (#2119) * Add load_config to graphrag-common package.	2025-11-10 07:57:03 -08:00
Nathan Evans	ae1f5e1811	Nov 2025 housekeeping (#2120) * Remove gensim sideload * Split CI build/type checks from unit tests * Thorough review of docs to align with v3 * Format * Fix version * Fix type	2025-11-06 10:03:22 -08:00
Nathan Evans	6b03af6277	Fix formatting	2025-11-04 10:48:09 -08:00
Derek Worthen	619269243d	Restructure project as monorepo. (#2111) * Restructure project as monorepo.	2025-11-04 09:51:56 -08:00
Nathan Evans	1bb9fa8e13	Unified factory (#2105) * Simplify Factory interface * Migrate CacheFactory to standard base class * Migrate LoggerFactory to standard base class * Migrate StorageFactory to standard base class * Migrate VectorStoreFactory to standard base class * Update vector store example notebook * Delete notebook outputs * Move default providers into factories * Move retry/limit tests into integ * Split language model factories * Set smoke test tpm/rpm * Fix factory integ tests * Add method to smoke test, switch text to 'fast' * Fix text smoke config for fast workflow * Add new workflows to text smoke test * Convert input readers to a proper factory * Remove covariates from fast smoke test * Update docs for input factory * Bump smoke runtime * Even longer runtime * min-csv timeout * Remove unnecessary lambdas	2025-10-20 12:05:27 -07:00
Nathan Evans	5ec49fd39c	V3 docs and cleanup (#2100) * Remove community contrib notebooks * Add migration notebook and breaking changes page edits * Update/polish docs * Make model instance name configurable * Add vector schema updates to v3 migration notebook * Spellcheck * Bump smoke test runtimes	2025-10-15 13:47:19 -07:00
Nathan Evans	b732445535	Remove multi search (#2093) * Remove multi-search from CLI * Remove multi-search from API * Flatten vector_store config * Push hydrated vector store down to embed_text * Remove outputs from config * Remove multi-search notebook/docs * Add missing response_type in basic search API * Fix basic search context and id mapping * Fix v1 migration notebook * Fix query entity search tests	2025-10-10 17:20:53 -07:00
Nathan Evans	6284cdd110	Remove fnllm (#2095)	2025-10-10 16:59:25 -07:00
Nathan Evans	eb0dfe376b	Remove strategy dicts (#2090 ) * Remove "strategy" from community reports config/workflow * Remove extraction strategy from extract_graph * Remove summarization strategy from extract_graph * Remove strategy from claim extraction * Strongly type prompt templates * Remove strategy from embed_text * Push hydrated params into community report workflows * Push hyrdated params into extract covariates * Push hydrated params into extract graph NLP * Push hydrated params into extract graph * Push hydrated params into text embeddings * Remove a few more low-level defaults * Semver * Remove configurable prompt delimiters * Update smoke tests	2025-10-10 12:15:23 -07:00
Nathan Evans	ac8a7f5eef	Housekeeping (#2086) * Add deprecation warnings for fnllm and multi-search * Fix dangling token_encoder refs * Fix local_search notebook * Fix global search dynamic notebook * Fix global search notebook * Fix drift notebook * Switch example notebooks to use LiteLLM config * Properly annotate dev deps as a group * Semver * Remove --extra dev * Remove llm_model variable * Ignore ruff ASYNC240 * Add note about expected broken notebook in docs * Fix custom vector store notebook * Push tokenizer throughout	2025-10-07 16:21:24 -07:00
Nathan Evans	1cb20b66f5	Input docs API parameter (#2034) * Add optional input_documents to index API * Semver * Add input dataframe example notebook * Format * Fix docs and notebook	2025-09-02 16:15:50 -07:00
Copilot	2030f94eb4	Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns and add custom vector store documentation (#2006) * Initial plan * Refactor VectorStoreFactory to use registration functionality like StorageFactory Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * Fix linting issues in VectorStoreFactory refactoring Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * Remove backward compatibility support from VectorStoreFactory and StorageFactory Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * Run ruff check --fix and ruff format, add semversioner file Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * ruff formatting fixes * Fix pytest errors in storage factory tests by updating PipelineStorage interface implementation Co-authored-by: jgbradley1
<654554+jgbradley1@users.noreply.github.com> * ruff formatting fixes * update storage factory design * Refactor CacheFactory to use registration functionality like StorageFactory Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * revert copilot changes * fix copilot changes * update comments * Fix failing pytest compatibility for factory tests Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * update class instantiation issue * ruff fixes * fix pytest * add default value * ruff formatting changes * ruff fixes * revert minor changes * cleanup cache factory * Update CacheFactory tests to match consistent factory pattern Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * update pytest thresholds * adjust threshold levels * Add custom vector store implementation notebook Create comprehensive notebook demonstrating how to implement and register custom vector stores with GraphRAG as a plug-and-play framework. Includes: - Complete implementation of SimpleInMemoryVectorStore - Registration with VectorStoreFactory - Testing and validation examples - Configuration examples for GraphRAG settings - Advanced features and best practices - Production considerations checklist The notebook provides a complete walkthrough for developers to understand and implement their own vector store backends. 
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * remove sample notebook for now * update tests * fix cache pytests * add pandas-stub to dev dependencies * disable warning check for well known key * skip tests when running on ubuntu * add documentation for custom vector store implementations * ignore ruff findings in notebooks * fix merge breakages * speedup CLI import statements * remove unnecessary import statements in init file * Add str type option on storage/cache type * Fix store name * Add LoggerFactory * Fix up logging setup across CLI/API * Add LoggerFactory test * Fix err message * Semver * Remove enums from factory methods --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> Co-authored-by: Josh Bradley <joshbradley@microsoft.com> Co-authored-by: Nathan Evans <github@talkswithnumbers.com>	2025-08-28 13:53:07 -07:00
Copilot	7c28c70d5c	Switch from Poetry to uv for package management (#2008) * Initial plan * Switch from Poetry to uv for package management Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * Clean up build artifacts and update gitignore Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * remove build artifacts * remove hardcoded version string * fix calls to pip in cicd * Update gh-pages.yml workflow to use uv instead of Poetry Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * ruff formatting fixes * update cicd workflow with latest uv action * fix command to retrieve package version * update development instructions * remove Poetry references * Replace deprecated azuright action with npm-based Azurite installation Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * skip api version check for azurite * add semversioner file * update more changes from switching to UV * Migrate
unified-search-app from Poetry to uv package management Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * minor typo update * minor Dockerfile update * update cicd thresholds * update pytest thresholds * ruff fixes * ruff fixes * remove legacy npm settings that no longer apply * Update Unified Search App Readme --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> Co-authored-by: Josh Bradley <joshbradley@microsoft.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-08-13 18:57:25 -06:00
Nathan Evans	321d479ab6	Update notebooks for 2.0 (#1785) * Update API overview * Fix global search example * Fix local search example * Fix global dynamic example * Fix drift example * Update multi-index example * Semver	2025-03-11 17:23:49 -07:00
Nathan Evans	bcb74789f1	Next release docs (#1627 ) * Wordind updates * Update yam lconfig and add notes to "deprecated" env * Add basic search section * Update versioning docs * Minor edits for clarity * Update init command * Update init to add --force in docs * Add NLP extraction params * Move vector_store to root * Add workflows to config * Add FastGraphRAG docs * add metadata column changes * Added documentation for multi index search. * Minor fixes. * Add config and table renames * Update migration notebook and comments to specify v1 * Add frequency to entity table docs * add new chunking options for metadata * Update output docs * Minor edits and cleanup * Add model ids to search configs * Spruce up migration notebook * Lint/format multi-index notebook * SpaCy model note * Update SpaCy footnote * Updated multi_index_search.ipynb to remove ruff errors. * add spacy to dictionary --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com> Co-authored-by: Dayenne Souza <ddesouza@microsoft.com> Co-authored-by: dorbaker <dorbaker@microsoft.com>	2025-03-03 14:46:00 -08:00
Nathan Evans	61a309b182	Incremental model alignment (#1766) * Used shared schema lists for all final columns * Semver	2025-02-25 13:14:42 -06:00
Nathan Evans	981fd31963	Community children (#1704) * Add children to the community tables * Replace NaN children with empty list * Replace subcommunity logic with built-in parent/child fields * Remove restore_community_hierarchy * Add children and frequency to migration notebook * Format * Semver * Add children to reports * Update tests --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-02-13 17:03:51 -08:00
Josh Bradley	b8b949f3bb	Cleanup query api - remove code duplication (#1690) * consolidate query api functions and remove code duplication * refactor and remove more code duplication * Add semversioner file * fix basic search * fix drift search and update base class function names * update example notebooks	2025-02-13 16:31:08 -05:00
Nathan Evans	c02ab0984a	Streamline workflows (#1674 ) * Remove create_final_nodes * Rename final entity output to "entities" * Remove duplicate code from graph extraction * Rename create_final_relationships output to "relationships" * Rename create_final_communities output to "communities" * Combine compute_communities and create_final_communities * Rename create_final_covariates output to "covariates" * Rename create_final_community_reports output to "community_reports" * Rename create_final_text_units output to "text_units" * Rename create_final_documents output to "documents" * Remove transient snapshots config * Move create_final_entities to finalize_entities operation * Move create_final_relationships flow to finalize_relationships operation * Reuse some community report functions * Collapse most of graph and text unit-based report generation * Unify schemas files * Move community reports extractor * Move NLP report prompt to prompts folder * Fix a few pandas warnings * Rename embeddings config to embed_text * Rename claim_extraction config to extract_claims * Remove nltk from standard graph extraction * Fix verb tests * Fix extract graph config naming * Fix moved file reference * Create v1-to-v2 migration notebook * Semver * Fix smoke test artifact count * Raise tpm/rpm on smoke tests * Update drift settings for smoke tests * Reuse project directory var in api notebook * Format * Format	2025-02-07 11:11:03 -08:00
Derek Worthen	c644338bae	Refactor config (#1593 ) * Refactor config - Add new ModelConfig to represent LLM settings - Combines LLMParameters, ParallelizationParameters, encoding_model, and async_mode - Add top level models config that is a list of available LLM ModelConfigs - Remove LLMConfig inheritance and delete LLMConfig - Replace the inheritance with a model_id reference to the ModelConfig listed in the top level models config - Remove all fallbacks and hydration logic from create_graphrag_config - This removes the automatic env variable overrides - Support env variables within config files using Templating - This requires "$" to be escaped with extra "$" so ".\\.txt$" becomes ".\\.txt$$" - Update init content to initialize new config file with the ModelConfig structure * Use dict of ModelConfig instead of list * Add model validations and unit tests * Fix ruff checks * Add semversioner change * Fix unit tests * validate root_dir in pydantic model * Rename ModelConfig to LanguageModelConfig * Rename ModelConfigMissingError to LanguageModelConfigMissingError * Add validationg for unexpected API keys * Allow skipping pydantic validation for testing/mocking purposes. 
* Add default lm configs to verb tests * smoke test * remove config from flows to fix llm arg mapping * Fix embedding llm arg mapping * Remove timestamp from smoke test outputs * Remove unused "subworkflows" smoke test properties * Add models to smoke test configs * Update smoke test output path * Send logs to logs folder * Fix output path * Fix csv test file pattern * Update placeholder * Format * Instantiate default model configs * Fix unit tests for config defaults * Fix migration notebook * Remove create_pipeline_config * Remove several unused config models * Remove indexing embedding and input configs * Move embeddings function to config * Remove skip_workflows * Remove skip embeddings in favor of explicit naming * fix unit test spelling mistake * self.models[model_id] is already a language model. Remove redundant casting. * update validation errors to instruct users to rerun graphrag init * instantiate LanguageModelConfigs with validation * skip validation in unit tests * update verb tests to use default model settings instead of skipping validation * test using llm settings * cleanup verb tests * remove unsafe default model config * remove the ability to skip pydantic validation * remove None union types when default values are set * move vector_store from embeddings to top level of config and delete resolve_paths * update vector store settings * fix vector store and smoke tests * fix serializing vector_store settings * fix vector_store usage * fix vector_store type * support cli overrides for loading graphrag config * rename storage to output * Add --force flag to init * Remove run_id and resume, fix Drift config assignment * Ruff --------- Co-authored-by: Nathan Evans <github@talkswithnumbers.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2025-01-21 17:52:06 -06:00
Alonso Guevara	e21a38f2ab	Fix/notebooks (#1614) * Add new inputs and missing vector store for retrieving vectors * Format * Semver * Remove .Identifier files * Fix spellcheck * Remove unnecessary input file for notebooks	2025-01-13 17:41:39 -06:00
Nathan Evans	7ec9ef0261	Refactor callbacks (#1583) * Unify Workflow and Verb callbacks interfaces * Semver * Fix storage class instantiation (#1582) --------- Co-authored-by: Josh Bradley <joshbradley@microsoft.com>	2025-01-06 10:58:59 -08:00
Nathan Evans	a35cb12741	Remove datashaper strip code (#1581) Remove datashaper	2025-01-03 13:59:26 -08:00
Nathan Evans	61816e076f	Migration notebook (#1492) * Add migration notebook * Update migration instructions * Semver * Rename item in relationships table * Remove indexing vector store shim * Remove query shims * Remove columns from migrated data * Format * Add community parents	2024-12-10 14:23:26 -06:00
Josh Bradley	b00142260d	Update index API + a notebook that provides a general API overview (#1454) * update index api to accept callbacks * fix hardcoded folder name that was creating an empty folder * add API notebook * add semversioner file * filename change --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-12-05 15:34:21 -06:00
Nathan Evans	0b2120ca45	Docs and notebooks update (#1451) * Fix local question gen and example notebook * Update global search notebook * Add lazy blog post * Update breaking changes doc for migration notes * Simplify Getting Started page * Semver * Spellcheck * Fix types * Add comments on cache-free migration * Update wording * Spelling --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-11-27 09:56:48 -08:00
Nathan Evans	c8c354e357	Artifact cleanup (#1341) * Add source documents for verb tests * Remove entity_type erroneous column * Add new test data * Remove source/target degree columns * Remove top_level_node_id * Remove chunk column configs * Rename "chunk" to "text" * Rename "chunk" to "text" in base * Re-map document input to use base text units * Revert base text units as final documents dep * Update test data * Split/rename node source_id * Drop node size (dup of degree) * Drop document_ids from covariates * Remove unused document_ids from models * Remove n_tokens from covariate table * Fix missed document_ids delete * Wire base text units to final documents * Rename relationship rank as combined_degree * Add rank as first-class property to Relationship * Remove split_text operation * Fix relationships test parquet * Update test parquets * Add entity ids to community table * Remove stored graph embedding columns * Format * Semver * Fix JSON typo * Spelling * Rename lancedb * Sort lancedb * Fix unit test * Fix test to account for changing period * Update tests for separate embeddings * Format * Better assertion printing * Fix unit test for windows * Rename document.raw_content -> document.text * Remove read_documents function * Remove unused document summary from model * Remove unused imports * Format * Add new snapshots to default init * Use util to construct embeddings collection name * Align inc index model with branch changes * Update data and tests for int ids * Clean up embedding locs * Switch entity "name" to "title" for consistency * Fix short_id -> human_readable_id defaults * Format * Rework community IDs * Fix community size compute * Fix unit tests * Fix report read * Pare down nodes table output * Fix unit test * Fix merge * Fix community loading * Format * Fix community id report extraction * Update tests * Consistent short IDs and ordering * Update ordering and tests * Update incremental for new nodes model * Guard document columns loc * Match column ordering * Fix document guard * Update smoke tests * Fill NA on community extract * Logging for smoke test debug * Add parquet schema details doc * Fix community hierarchy guard * Use better empty hierarchy guard * Back-compat shims * Semver * Fix warning * Format * Remove default fallback * Reuse key	2024-11-13 15:11:19 -08:00
Alonso Guevara	e53422366d	Implement dynamic community selection for global search (#1396) * update gitignore * add dynamic community sleection to updated main branch * update SearchResult to record output_tokens. * update search result * dynamic search working * format * add llm_calls_categories and prompt_tokens and output_tokens cate * update * formatting * log drift search output and prompt tokens separately * update global_search.ipynb. update operate dulce dataset and add create_final_communities. update dynamic community selection init * add .ipynb back to cspell.config.yaml * format * add notebook example on dynamic search * rearrange * update gitignore * format code * code format * code format * fix default variable --------- Co-authored-by: Bryan Li <bryanlimy@gmail.com>	2024-11-11 16:45:07 -08:00
Alonso Guevara	d9f985ae52	Drift Search CLI, API, Docs and Example Notebook (#1348) * Drift CLI and backwards compat * Adding DRIFT Cli, Docs and example notebook * Update tests and fix ruff * Format * Small cleanup * Fix smoke tests * Update notebook * Oopsie fix * Delete duplicate img	2024-11-05 12:05:19 -06:00
gaudyb	17658c5df8	New workflow to generate embeddings in a single workflow (#1296 ) * New workflow to generate embeddings in a single workflow * New workflow to generate embeddings in a single workflow * version change * clean tests without any embeddings references * clean tests without any embeddings references * remove code * feedback implemented * changes in logic * feedback implemented * store in table bug fixed * smoke test for generate_text_embeddings workflow * smoke test fix * add generate_text_embeddings to the list of transient workflows * smoke tests * fix * ruff formatting updates * fix * smoke test fixed * smoke test fixed * fix lancedb import * smoke test fix * ignore sorting * smoke test fixed * smoke test fixed * check smoke test * smoke test fixed * change config for vector store * format fix * vector store changes * revert debug profile back to empty filepath * merge conflict solved * merge conflict solved * format fixed * format fixed * fix return dataframe * snapshot fix * format fix * embeddings param implemented * validation fixes * fix map * fix map * fix properties * config updates * smoke test fixed * settings change * Update collection config and rework back-compat * Repalce . with - for embedding store --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com> Co-authored-by: Josh Bradley <joshbradley@microsoft.com> Co-authored-by: Nathan Evans <github@talkswithnumbers.com>	2024-11-01 15:01:35 -07:00
Andres Morales	fc9895f793	Replace current docs by mkdocs (#1263) * Replace docs by mkdocs-material * Fix markdown * Fix verions in gh-pages workflow * remove whitespaces * add semver * Add build docs check on python-ci * Fix command in index cli * Spellcheck * Spellcheck * remove docsite paths * clear outputs from notebook * remove dependabot npm for docsite * remove more docsite left overs * execute notebooks * Update notebooks * update poetry lock * Remove notebook build from ci * Revert dep update * Navigation tabs * Fix stylesheet * add kwds to dictionary * Turn on notebook execution * Update gitignore * Add MSR Blog posts * spellcheck * Accessibility Changes --------- Co-authored-by: Alonso Guevara <alonsog@microsoft.com>	2024-10-11 13:39:03 -06:00

36 Commits