graphrag/tests/verbs
Nathan Evans 710fdad6f0
Input factory (#2168)
* Update input factory to match other factories

* Move input config alongside input readers

* Move file pattern logic into InputReader

* Set encoding default

* Clean up optional column configs

* Combine structured data extraction

* Remove pandas from input loading

* Throw on empty documents

* Add JSON Lines (JSONL) input support

* Store raw data

* Fix merge imports

* Move metadata handling entirely to chunking

* Nicer automatic title

* Typo

* Add get_property utility for nested dictionary access with dot notation

* Update structured_file_reader to use get_property utility

* Extract input module into new graphrag-input monorepo package

- Create new graphrag-input package with input loading utilities
- Move InputConfig, InputFileType, InputReader, TextDocument, and file readers (CSV, JSON, JSONL, Text)
- Add get_property utility for nested dictionary access with dot notation
- Include hashing utility for document ID generation
- Update all imports throughout codebase to use graphrag_input
- Add package to workspace configuration and release tasks
- Remove old graphrag.index.input module

* Rename ChunkResult to TextChunk and add transformer support

- Rename chunk_result.py to text_chunk.py with ChunkResult -> TextChunk
- Add 'original' field to TextChunk to track pre-transform text
- Add optional transform callback to chunker.chunk() method
- Add add_metadata transformer for prepending metadata to chunks
- Update create_chunk_results to apply transforms and populate original
- Update sentence_chunker and token_chunker with transform support
- Refactor create_base_text_units to use new transformer pattern
- Rename pluck_metadata to get/collect methods on TextDocument

* Back-compat comment

* Align input config type name with other factory configs

* Add MarkItDown support

* Remove pattern default from MarkItDown reader

* Remove plugins flag (implicitly disabled)

* Format

* Update verb tests

* Separate storage from input config

* Add empty objects for NaN raw_data

* Fix smoke tests

* Fix BOM in CSV smoke test

* Format
2026-01-12 12:47:57 -08:00
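The commit message mentions a `get_property` utility for nested dictionary access with dot notation and a hashing utility for document ID generation. The following is a minimal sketch of how such helpers could work. It is not the `graphrag_input` API: the `get_property` signature, its `KeyError` behavior, the `gen_document_id` name, and the choice of SHA-512 over separator-joined fields are all assumptions made for illustration.

```python
import hashlib
from typing import Any


def get_property(data: dict[str, Any], path: str) -> Any:
    """Return the value at a dot-separated path such as "meta.author.name".

    Hypothetical sketch: raises KeyError if any segment along the path is missing
    or if an intermediate value is not a dictionary.
    """
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"property {path!r} not found (failed at {key!r})")
        current = current[key]
    return current


def gen_document_id(record: dict[str, Any], hash_keys: list[str]) -> str:
    """Derive a deterministic document ID by hashing selected fields.

    Hypothetical helper: fields are joined with an ASCII record separator
    (0x1E) so that ("ab", "c") and ("a", "bc") produce different digests.
    Missing fields hash as empty strings.
    """
    payload = "\x1e".join(str(record.get(key, "")) for key in hash_keys)
    return hashlib.sha512(payload.encode("utf-8")).hexdigest()
```

For a structured record such as `{"meta": {"author": "Ada"}}`, `get_property(record, "meta.author")` returns `"Ada"`. Hashing only the chosen fields keeps the ID stable across runs as long as those fields do not change.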
Name                              Last commit                             Date
data/                             Input factory (#2168)                   2026-01-12 12:47:57 -08:00
__init__.py                       Verb merge nre1 (#1140)                 2024-09-16 12:10:29 -07:00
test_create_base_text_units.py    Input factory (#2168)                   2026-01-12 12:47:57 -08:00
test_create_communities.py        Graphrag config (#2119)                 2025-11-10 07:57:03 -08:00
test_create_community_reports.py  Input factory (#2168)                   2026-01-12 12:47:57 -08:00
test_create_final_documents.py    Input factory (#2168)                   2026-01-12 12:47:57 -08:00
test_create_final_text_units.py   Graphrag config (#2119)                 2025-11-10 07:57:03 -08:00
test_extract_covariates.py        Graphrag config (#2119)                 2025-11-10 07:57:03 -08:00
test_extract_graph_nlp.py         Graphrag config (#2119)                 2025-11-10 07:57:03 -08:00
test_extract_graph.py             Graphrag config (#2119)                 2025-11-10 07:57:03 -08:00
test_finalize_graph.py            Graphrag config (#2119)                 2025-11-10 07:57:03 -08:00
test_generate_text_embeddings.py  Remove embeddings optional new (#2128)  2025-11-17 13:10:54 -06:00
test_pipeline_state.py            Graphrag config (#2119)                 2025-11-10 07:57:03 -08:00
test_prune_graph.py               Input factory (#2168)                   2026-01-12 12:47:57 -08:00
util.py                           Input factory (#2168)                   2026-01-12 12:47:57 -08:00