From 10e2b593661bd8cf59e4682f14f8da73df5265b4 Mon Sep 17 00:00:00 2001
From: Nathan Evans
Date: Tue, 9 Sep 2025 17:01:56 -0700
Subject: [PATCH] Add info on input documents DataFrame

---
 docs/index/inputs.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/index/inputs.md b/docs/index/inputs.md
index 8d21c440..3e94222c 100644
--- a/docs/index/inputs.md
+++ b/docs/index/inputs.md
@@ -16,6 +16,10 @@ All input formats are loaded within GraphRAG and passed to the indexing pipeline
 
 Also see the [outputs](outputs.md) documentation for the final documents table schema saved to parquet after pipeline completion.
 
+## Bring-your-own DataFrame
+
+As of version 3, GraphRAG's [indexing API method](https://github.com/microsoft/graphrag/blob/main/graphrag/api/index.py) allows you to pass in your own pandas DataFrame and bypass all of the input loading/parsing described in the next section. This is convenient if you have content in a format or storage location we don't support out-of-the-box. __You must ensure that your input DataFrame conforms to the schema described above.__ All of the chunking behavior described later will proceed exactly the same.
+
 ## Formats
 
 We support three file formats out-of-the-box. This covers the overwhelming majority of use cases we have encountered. If you have a different format, we recommend writing a script to convert to one of these, which are widely used and supported by many tools and libraries.