diff --git a/data/operation_dulce/dataset.zip b/data/operation_dulce/dataset.zip
index f989b1ec..9e406d91 100644
Binary files a/data/operation_dulce/dataset.zip and b/data/operation_dulce/dataset.zip differ
diff --git a/posts/config/env_vars/index.html b/posts/config/env_vars/index.html
index 254f3fa5..7d0730ba 100644
--- a/posts/config/env_vars/index.html
+++ b/posts/config/env_vars/index.html
@@ -288,7 +288,7 @@ a {
By default, the GraphRAG indexer will only emit embeddings required for our query methods. However, the model has embeddings defined for all plaintext fields, and these can be generated by setting the GRAPHRAG_EMBEDDING_TARGET environment variable to all.
If the embedding target is all and you want to embed only a subset of these fields, you can specify which embeddings to skip using the GRAPHRAG_EMBEDDING_SKIP variable described below.
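As a minimal sketch of the two settings described above, a .env fragment might look like the following. The field name text_unit.text is used purely for illustration, and the exact value format accepted by GRAPHRAG_EMBEDDING_SKIP (assumed here to be a comma-separated list of field names) is an assumption, not something this page confirms.

```shell
# Emit embeddings for every plaintext field, not only those the query methods need.
GRAPHRAG_EMBEDDING_TARGET="all"

# Assumed format: comma-separated field names to leave out of embedding.
# text_unit.text is an illustrative choice only.
GRAPHRAG_EMBEDDING_SKIP="text_unit.text"
```

Leaving GRAPHRAG_EMBEDDING_TARGET unset keeps the default behavior, in which only the embeddings required by the query methods are emitted.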
text_unit.text
GRAPHRAG_EMBEDDING_RPM: int
GRAPHRAG_INPUT_TYPE: (file or blob); str; file
str
-.*\.csv$
+.*\.txt$
GRAPHRAG_INPUT_SOURCE_COLUMN
GRAPHRAG_INPUT_STORAGE_ACCOUNT_BLOB_URL: blob mode and using managed identity. Will have the format https://<storage_account_name>.blob.core.windows.net; str; None
csv or text; str
-csv
+text
GRAPHRAG_INPUT_ENCODING
GRAPHRAG_STORAGE_STORAGE_ACCOUNT_BLOB_URL: blob mode and using managed identity. Will have the format https://<storage_account_name>.blob.core.windows.net; str
GRAPHRAG_CACHE_STORAGE_ACCOUNT_BLOB_URL: blob mode and using managed identity. Will have the format https://<storage_account_name>.blob.core.windows.net; str
GRAPHRAG_REPORTING_STORAGE_ACCOUNT_BLOB_URL: blob mode and using managed identity. Will have the format https://<storage_account_name>.blob.core.windows.net; str
--root parameter on your Indexing Pipeline execution.
# Required LLM Config
# Input Data Configuration
-GRAPHRAG_INPUT_TYPE=text
+GRAPHRAG_INPUT_TYPE="file"
# Plaintext Input Data Configuration
# GRAPHRAG_INPUT_FILE_PATTERN=.*\.txt
-# CSV Input Data Configuration
-GRAPHRAG_INPUT_FILE_TYPE="csv"
-GRAPHRAG_INPUT_FILE_PATTERN=".*\.csv$"
+# Text Input Data Configuration
+GRAPHRAG_INPUT_FILE_TYPE="text"
+GRAPHRAG_INPUT_FILE_PATTERN=".*\.txt$"
GRAPHRAG_INPUT_SOURCE_COLUMN=source
# GRAPHRAG_INPUT_TIMESTAMP_COLUMN=None
# GRAPHRAG_INPUT_TIMESTAMP_FORMAT=None