graphrag

mirror of https://github.com/microsoft/graphrag.git synced 2026-01-14 00:57:23 +08:00


Nathan Evans 710fdad6f0 2026-01-12 12:47:57 -08:00
Input factory (#2168)

* Update input factory to match other factories
* Move input config alongside input readers
* Move file pattern logic into InputReader
* Set encoding default
* Clean up optional column configs
* Combine structured data extraction
* Remove pandas from input loading
* Throw if empty documents
* Add json lines (jsonl) input support
* Store raw data
* Fix merge imports
* Move metadata handling entirely to chunking
* Nicer automatic title
* Typo
* Add get_property utility for nested dictionary access with dot notation
* Update structured_file_reader to use get_property utility
* Extract input module into new graphrag-input monorepo package
  - Create new graphrag-input package with input loading utilities
  - Move InputConfig, InputFileType, InputReader, TextDocument, and file readers (CSV, JSON, JSONL, Text)
  - Add get_property utility for nested dictionary access with dot notation
  - Include hashing utility for document ID generation
  - Update all imports throughout codebase to use graphrag_input
  - Add package to workspace configuration and release tasks
  - Remove old graphrag.index.input module
* Rename ChunkResult to TextChunk and add transformer support
  - Rename chunk_result.py to text_chunk.py with ChunkResult -> TextChunk
  - Add 'original' field to TextChunk to track pre-transform text
  - Add optional transform callback to chunker.chunk() method
  - Add add_metadata transformer for prepending metadata to chunks
  - Update create_chunk_results to apply transforms and populate original
  - Update sentence_chunker and token_chunker with transform support
  - Refactor create_base_text_units to use new transformer pattern
  - Rename pluck_metadata to get/collect methods on TextDocument
* Back-compat comment
* Align input config type name with other factory configs
* Add MarkItDown support
* Remove pattern default from MarkItDown reader
* Remove plugins flag (implicit disabled)
* Format
* Update verb tests
* Separate storage from input config
* Add empty objects for NaN raw_data
* Fix smoke tests
* Fix BOM in csv smoke
* Format
Name	Last commit	Date
config	Input factory (#2168)	2026-01-12 12:47:57 -08:00
data/operation_dulce	Replace current docs by mkdocs (#1263)	2024-10-11 13:39:03 -06:00
examples_notebooks	Input factory (#2168)	2026-01-12 12:47:57 -08:00
img	Add visualization guide (#1340)	2024-11-06 14:06:50 -05:00
index	Input factory (#2168)	2026-01-12 12:47:57 -08:00
prompt_tuning	Graphrag config (#2119)	2025-11-10 07:57:03 -08:00
query	Nov 2025 housekeeping (#2120)	2025-11-06 10:03:22 -08:00
scripts	Fix cookie consent script missing (#1292)	2024-10-17 09:44:14 -06:00
stylesheets	Docs update (#1408)	2024-11-14 21:26:29 -06:00
blog_posts.md	Update blog posts (#1571)	2024-12-30 17:16:08 -06:00
cli.md	Auto-generate CLI doc pages (#1325)	2024-10-25 19:00:24 -04:00
developing.md	Nov 2025 housekeeping (#2120)	2025-11-06 10:03:22 -08:00
get_started.md	Init command asks for models (#2137)	2025-11-24 10:05:47 -08:00
index.md	Docs/2.6.0 (#2070)	2025-09-23 14:48:28 -07:00
visualization_guide.md	Remove graph embedding and UMAP (#2048)	2025-09-09 15:35:43 -07:00