graphrag/docs
Nathan Evans 710fdad6f0
Input factory (#2168)
* Update input factory to match other factories

* Move input config alongside input readers

* Move file pattern logic into InputReader

* Set encoding default

* Clean up optional column configs

* Combine structured data extraction

* Remove pandas from input loading

* Throw if empty documents

* Add json lines (jsonl) input support

* Store raw data

* Fix merge imports

* Move metadata handling entirely to chunking

* Nicer automatic title

* Typo

* Add get_property utility for nested dictionary access with dot notation

* Update structured_file_reader to use get_property utility

* Extract input module into new graphrag-input monorepo package

- Create new graphrag-input package with input loading utilities
- Move InputConfig, InputFileType, InputReader, TextDocument, and file readers (CSV, JSON, JSONL, Text)
- Add get_property utility for nested dictionary access with dot notation
- Include hashing utility for document ID generation
- Update all imports throughout codebase to use graphrag_input
- Add package to workspace configuration and release tasks
- Remove old graphrag.index.input module

* Rename ChunkResult to TextChunk and add transformer support

- Rename chunk_result.py to text_chunk.py with ChunkResult -> TextChunk
- Add 'original' field to TextChunk to track pre-transform text
- Add optional transform callback to chunker.chunk() method
- Add add_metadata transformer for prepending metadata to chunks
- Update create_chunk_results to apply transforms and populate original
- Update sentence_chunker and token_chunker with transform support
- Refactor create_base_text_units to use new transformer pattern
- Rename pluck_metadata to get/collect methods on TextDocument

* Back-compat comment

* Align input config type name with other factory configs

* Add MarkItDown support

* Remove pattern default from MarkItDown reader

* Remove plugins flag (implicit disabled)

* Format

* Update verb tests

* Separate storage from input config

* Add empty objects for NaN raw_data

* Fix smoke tests

* Fix BOM in csv smoke

* Format
2026-01-12 12:47:57 -08:00
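
The commit message above mentions a get_property utility for nested dictionary access with dot notation. What follows is only a minimal sketch of that idea, not the implementation in graphrag-input: the function name get_property_sketch, its signature, and the default parameter are assumptions made for illustration, and the sample record is invented.

```python
from typing import Any


def get_property_sketch(data: dict[str, Any], path: str, default: Any = None) -> Any:
    """Hypothetical helper: follow a dot-separated path through nested dicts.

    Returns `default` as soon as a segment is missing or the current
    value is not a dictionary. The real utility may behave differently.
    """
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Illustrative record, loosely shaped like one parsed JSON input row.
record = {"title": "Report", "metadata": {"author": {"name": "Ada"}}}

author = get_property_sketch(record, "metadata.author.name")
year = get_property_sketch(record, "metadata.year", default="unknown")
print(author, year)
```

Returning a default instead of raising is a design choice of this sketch; a reader for structured files might instead prefer to fail loudly when a configured column is absent.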
Name                    Last commit                                Date
config                  Input factory (#2168)                      2026-01-12 12:47:57 -08:00
data/operation_dulce    Replace current docs by mkdocs (#1263)     2024-10-11 13:39:03 -06:00
examples_notebooks      Input factory (#2168)                      2026-01-12 12:47:57 -08:00
img                     Add visualization guide (#1340)            2024-11-06 14:06:50 -05:00
index                   Input factory (#2168)                      2026-01-12 12:47:57 -08:00
prompt_tuning           Graphrag config (#2119)                    2025-11-10 07:57:03 -08:00
query                   Nov 2025 housekeeping (#2120)              2025-11-06 10:03:22 -08:00
scripts                 Fix cookie consent script missing (#1292)  2024-10-17 09:44:14 -06:00
stylesheets             Docs update (#1408)                        2024-11-14 21:26:29 -06:00
blog_posts.md           Update blog posts (#1571)                  2024-12-30 17:16:08 -06:00
cli.md                  Auto-generate CLI doc pages (#1325)        2024-10-25 19:00:24 -04:00
developing.md           Nov 2025 housekeeping (#2120)              2025-11-06 10:03:22 -08:00
get_started.md          Init command asks for models (#2137)       2025-11-24 10:05:47 -08:00
index.md                Docs/2.6.0 (#2070)                         2025-09-23 14:48:28 -07:00
visualization_guide.md  Remove graph embedding and UMAP (#2048)    2025-09-09 15:35:43 -07:00