From 2e064dbd09f5418b4a889fdfcd2c6db9dde4638a Mon Sep 17 00:00:00 2001 From: darthtrevino Date: Mon, 22 Apr 2024 20:47:43 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20microsof?= =?UTF-8?q?t/graphrag@91ad9ece4e97ec3a8df46418218fd4dd6eed10a4=20?= =?UTF-8?q?=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- posts/config/env_vars/index.html | 720 +++++++++++++++---------------- posts/get_started/index.html | 2 +- 2 files changed, 361 insertions(+), 361 deletions(-) diff --git a/posts/config/env_vars/index.html b/posts/config/env_vars/index.html index 5c144b61..accd7b9e 100644 --- a/posts/config/env_vars/index.html +++ b/posts/config/env_vars/index.html @@ -286,6 +286,366 @@ a {

Input Data

Our pipeline can ingest .csv or .txt data from an input folder. These files can be nested within subfolders. To configure how input data is handled, which fields are mapped, and how timestamps are parsed, look for configuration values starting with GRAPHRAG_INPUT_ below. In general, CSV-based data provides the most customizability. Each CSV should contain at least a text field (which can be mapped with environment variables), but it is helpful if it also has title, timestamp, and source fields. Any additional fields are carried over as extra fields on the Document table.
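For illustration, a minimal CSV input file might look like the sketch below. The column names shown are the conventional ones described above; only the text column is strictly required, and the actual column mapping is controlled by the GRAPHRAG_INPUT_ variables.

```python
import csv
import io

# A hypothetical input record: "text" is the only required column;
# "title", "timestamp", and "source" are optional but recommended.
rows = [
    {
        "text": "A Christmas Carol is a novella by Charles Dickens.",
        "title": "A Christmas Carol",
        "timestamp": "1843-12-19",
        "source": "gutenberg",
    },
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["text", "title", "timestamp", "source"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Any columns beyond these four would simply land as extra fields on the Document table.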


Base LLM Settings


These are the primary settings for configuring LLM connectivity.

| Parameter | Required? | Description | Type | Default Value |
| --- | --- | --- | --- | --- |
| GRAPHRAG_API_KEY | Yes | The API key. (Note: `OPENAI_API_KEY` is also used as a fallback.) | str | None |
| GRAPHRAG_API_BASE | For AOAI | The API base URL. | str | None |
| GRAPHRAG_API_VERSION | For AOAI | The AOAI API version. | str | None |
| GRAPHRAG_API_ORGANIZATION | | The AOAI organization. | str | None |
| GRAPHRAG_API_PROXY | | The AOAI proxy. | str | None |

Text Generation Settings


These settings control the text generation model used by the pipeline. Any settings with a fallback will use the base LLM settings, if available.

| Parameter | Required? | Description | Type | Default Value |
| --- | --- | --- | --- | --- |
| GRAPHRAG_LLM_TYPE | For AOAI | The LLM operation type. Either openai_chat or azure_openai_chat. | str | openai_chat |
| GRAPHRAG_LLM_DEPLOYMENT_NAME | For AOAI | The AOAI model deployment name. | str | None |
| GRAPHRAG_LLM_API_KEY | Yes (uses fallback) | The API key. | str | None |
| GRAPHRAG_LLM_API_BASE | For AOAI (uses fallback) | The API base URL. | str | None |
| GRAPHRAG_LLM_API_VERSION | For AOAI (uses fallback) | The AOAI API version. | str | None |
| GRAPHRAG_LLM_API_ORGANIZATION | For AOAI (uses fallback) | The AOAI organization. | str | None |
| GRAPHRAG_LLM_API_PROXY | | The AOAI proxy. | str | None |
| GRAPHRAG_LLM_MODEL | | The LLM model. | str | gpt-4-turbo-preview |
| GRAPHRAG_LLM_MAX_TOKENS | | The maximum number of tokens. | int | 4000 |
| GRAPHRAG_LLM_REQUEST_TIMEOUT | | The maximum number of seconds to wait for a response from the chat client. | int | 180 |
| GRAPHRAG_LLM_MODEL_SUPPORTS_JSON | | Indicates whether the given model supports JSON output mode. True to enable. | str | None |
| GRAPHRAG_LLM_THREAD_COUNT | | The number of threads to use for LLM parallelization. | int | 50 |
| GRAPHRAG_LLM_THREAD_STAGGER | | The time to wait (in seconds) between starting each thread. | float | 0.3 |
| GRAPHRAG_LLM_CONCURRENT_REQUESTS | | The number of concurrent requests to allow for the chat client. | int | 25 |
| GRAPHRAG_LLM_TPM | | The number of tokens per minute to allow for the LLM client. 0 = bypass. | int | 0 |
| GRAPHRAG_LLM_RPM | | The number of requests per minute to allow for the LLM client. 0 = bypass. | int | 0 |
| GRAPHRAG_LLM_MAX_RETRIES | | The maximum number of retries to attempt when a request fails. | int | 10 |
| GRAPHRAG_LLM_MAX_RETRY_WAIT | | The maximum number of seconds to wait between retries. | int | 10 |
| GRAPHRAG_LLM_SLEEP_ON_RATE_LIMIT_RECOMMENDATION | | Whether to sleep on rate limit recommendations. (Azure only) | bool | True |
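The thread-count and thread-stagger settings interact: with a stagger of s seconds, the i-th worker starts roughly i × s seconds after the first, spreading request bursts out over time. A small sketch with a hypothetical helper (not GraphRAG's implementation):

```python
def start_offsets(thread_count: int, stagger_seconds: float) -> list[float]:
    # Each worker's launch is delayed by its index times the stagger,
    # so bursts of requests are spread out instead of arriving at once.
    return [round(i * stagger_seconds, 3) for i in range(thread_count)]

# With the defaults GRAPHRAG_LLM_THREAD_COUNT=50 and
# GRAPHRAG_LLM_THREAD_STAGGER=0.3, the last worker starts
# 49 * 0.3 = 14.7 seconds after the first.
offsets = start_offsets(50, 0.3)
print(offsets[:4], offsets[-1])  # [0.0, 0.3, 0.6, 0.9] 14.7
```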

Text Embedding Settings


These settings control the text embedding model used by the pipeline. Any settings with a fallback will use the base LLM settings, if available.

| Parameter | Required? | Description | Type | Default |
| --- | --- | --- | --- | --- |
| GRAPHRAG_EMBEDDING_TYPE | For AOAI | The embedding client to use. Either openai_embedding or azure_openai_embedding. | str | openai_embedding |
| GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME | For AOAI | The AOAI deployment name. | str | None |
| GRAPHRAG_EMBEDDING_API_KEY | Yes (uses fallback) | The API key to use for the embedding client. | str | None |
| GRAPHRAG_EMBEDDING_API_BASE | For AOAI (uses fallback) | The API base URL. | str | None |
| GRAPHRAG_EMBEDDING_API_VERSION | For AOAI (uses fallback) | The AOAI API version to use for the embedding client. | str | None |
| GRAPHRAG_EMBEDDING_API_ORGANIZATION | For AOAI (uses fallback) | The AOAI organization to use for the embedding client. | str | None |
| GRAPHRAG_EMBEDDING_API_PROXY | | The AOAI proxy to use for the embedding client. | str | None |
| GRAPHRAG_EMBEDDING_MODEL | | The model to use for the embedding client. | str | text-embedding-3-small |
| GRAPHRAG_EMBEDDING_BATCH_SIZE | | The number of texts to embed at once. (Azure limit is 16.) | int | 16 |
| GRAPHRAG_EMBEDDING_BATCH_MAX_TOKENS | | The maximum tokens per batch. (Azure limit is 8191.) | int | 8191 |
| GRAPHRAG_EMBEDDING_TARGET | | The target fields to embed. Either required or all. | str | required |
| GRAPHRAG_EMBEDDING_SKIP | | A comma-separated list of fields to skip embeddings for (e.g. relationship.description). | str | None |
| GRAPHRAG_EMBEDDING_THREAD_COUNT | | The number of threads to use for embedding parallelization. | int | |
| GRAPHRAG_EMBEDDING_THREAD_STAGGER | | The time to wait (in seconds) between starting each embedding thread. | float | 50 |
| GRAPHRAG_EMBEDDING_CONCURRENT_REQUESTS | | The number of concurrent requests to allow for the embedding client. | int | 25 |
| GRAPHRAG_EMBEDDING_TPM | | The number of tokens per minute to allow for the embedding client. 0 = bypass. | int | 0 |
| GRAPHRAG_EMBEDDING_RPM | | The number of requests per minute to allow for the embedding client. 0 = bypass. | int | 0 |
| GRAPHRAG_EMBEDDING_MAX_RETRIES | | The maximum number of retries to attempt when a request fails. | int | 10 |
| GRAPHRAG_EMBEDDING_MAX_RETRY_WAIT | | The maximum number of seconds to wait between retries. | int | 10 |
| GRAPHRAG_EMBEDDING_SLEEP_ON_RATE_LIMIT_RECOMMENDATION | | Whether to sleep on rate limit recommendations. (Azure only) | bool | True |
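The two batch limits work together: a batch is closed once it holds GRAPHRAG_EMBEDDING_BATCH_SIZE texts or adding the next text would exceed GRAPHRAG_EMBEDDING_BATCH_MAX_TOKENS. A rough sketch of that policy, using a hypothetical helper and naive whitespace token counting (the real pipeline uses a proper tokenizer):

```python
def batch_texts(texts, batch_size=16, batch_max_tokens=8191):
    # Naive token count: whitespace-separated words stand in for real tokens.
    batches, current, current_tokens = [], [], 0
    for text in texts:
        n = len(text.split())
        # Close the batch if it is full or the next text would overflow it.
        if current and (len(current) >= batch_size
                        or current_tokens + n > batch_max_tokens):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

docs = [f"doc {i}" for i in range(40)]
print([len(b) for b in batch_texts(docs)])  # [16, 16, 8]
```

With the Azure defaults of 16 texts and 8191 tokens, whichever limit is hit first closes the batch.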

Plaintext Input Data (GRAPHRAG_INPUT_TYPE=text)

@@ -398,366 +758,6 @@ a {

Base LLM Settings


These settings control the base LLM arguments used by the pipeline, and are primarily useful for configuring API connectivity.

| Parameter | Description | Type | Required or Optional | Default Value |
| --- | --- | --- | --- | --- |
| GRAPHRAG_API_KEY | The API key. (Note: `OPENAI_API_KEY` is also used as a fallback.) | str | required | None |
| GRAPHRAG_API_BASE | The API base URL. | str | required for AOAI | None |
| GRAPHRAG_API_VERSION | The AOAI API version. | str | required for AOAI | None |
| GRAPHRAG_API_ORGANIZATION | The AOAI organization. | str | optional for AOAI | None |
| GRAPHRAG_API_PROXY | The AOAI proxy. | str | optional for AOAI | None |

Text Generation Settings


These settings control the text generation model used by the pipeline. These settings are layered on top of the base LLM settings, overriding any settings underneath.

| Parameter | Description | Type | Required or Optional | Default Value |
| --- | --- | --- | --- | --- |
| GRAPHRAG_LLM_TYPE | The LLM operation type. Either openai_chat or azure_openai_chat. | str | optional | openai_chat |
| GRAPHRAG_LLM_API_KEY | The API key. | str | required | None |
| GRAPHRAG_LLM_API_BASE | The API base URL. | str | required for AOAI | None |
| GRAPHRAG_LLM_API_VERSION | The AOAI API version. | str | required for AOAI | None |
| GRAPHRAG_LLM_API_ORGANIZATION | The AOAI organization. | str | optional for AOAI | None |
| GRAPHRAG_LLM_API_PROXY | The AOAI proxy. | str | optional for AOAI | None |
| GRAPHRAG_LLM_DEPLOYMENT_NAME | The AOAI deployment name. | str | optional for AOAI | None |
| GRAPHRAG_LLM_MODEL | The model. | str | optional | gpt-4-turbo-preview |
| GRAPHRAG_LLM_MAX_TOKENS | The maximum number of tokens. | int | optional | 4000 |
| GRAPHRAG_LLM_REQUEST_TIMEOUT | The maximum number of seconds to wait for a response from the chat client. | int | optional | 180 |
| GRAPHRAG_LLM_MODEL_SUPPORTS_JSON | Indicates whether the given model supports JSON output mode. True to enable. | str | optional | None |
| GRAPHRAG_LLM_THREAD_COUNT | The number of threads to use for LLM parallelization. | int | optional | 50 |
| GRAPHRAG_LLM_THREAD_STAGGER | The time to wait (in seconds) between starting each thread. | float | optional | 0.3 |
| GRAPHRAG_LLM_CONCURRENT_REQUESTS | The number of concurrent requests to allow for the chat client. | int | optional | 25 |
| GRAPHRAG_LLM_TPM | The number of tokens per minute to allow for the LLM client. 0 = bypass. | int | optional | 0 |
| GRAPHRAG_LLM_RPM | The number of requests per minute to allow for the LLM client. 0 = bypass. | int | optional | 0 |
| GRAPHRAG_LLM_MAX_RETRIES | The maximum number of retries to attempt when a request fails. | int | optional | 10 |
| GRAPHRAG_LLM_MAX_RETRY_WAIT | The maximum number of seconds to wait between retries. | int | optional | 10 |
| GRAPHRAG_LLM_SLEEP_ON_RATE_LIMIT_RECOMMENDATION | Whether to sleep on rate limit recommendations. (Azure only) | bool | optional | True |

Text Embedding Settings


These settings control the text embedding model used by the pipeline. These settings are layered on top of the base LLM settings, overriding any settings underneath.

| Parameter | Description | Type | Required or Optional | Default |
| --- | --- | --- | --- | --- |
| GRAPHRAG_EMBEDDING_TYPE | The embedding client to use. Either openai_embedding or azure_openai_embedding. | str | optional | openai_embedding |
| GRAPHRAG_EMBEDDING_API_KEY | The API key to use for the embedding client. | str | required | None |
| GRAPHRAG_EMBEDDING_API_BASE | The API base URL. | str | required for AOAI | None |
| GRAPHRAG_EMBEDDING_API_VERSION | The AOAI API version to use for the embedding client. | str | required for AOAI | None |
| GRAPHRAG_EMBEDDING_API_ORGANIZATION | The AOAI organization to use for the embedding client. | str | optional for AOAI | None |
| GRAPHRAG_EMBEDDING_API_PROXY | The AOAI proxy to use for the embedding client. | str | optional for AOAI | None |
| GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME | The AOAI deployment name. | str | optional for AOAI | None |
| GRAPHRAG_EMBEDDING_MODEL | The model to use for the embedding client. | str | optional | text-embedding-3-small |
| GRAPHRAG_EMBEDDING_BATCH_SIZE | The number of texts to embed at once. (Azure limit is 16.) | int | optional | 16 |
| GRAPHRAG_EMBEDDING_BATCH_MAX_TOKENS | The maximum tokens per batch. (Azure limit is 8191.) | int | optional | 8191 |
| GRAPHRAG_EMBEDDING_TARGET | The target fields to embed. Either required or all. | str | optional | required |
| GRAPHRAG_EMBEDDING_SKIP | A comma-separated list of fields to skip embeddings for (e.g. relationship.description). | str | optional | None |
| GRAPHRAG_EMBEDDING_THREAD_COUNT | The number of threads to use for embedding parallelization. | int | optional | |
| GRAPHRAG_EMBEDDING_THREAD_STAGGER | The time to wait (in seconds) between starting each embedding thread. | float | optional | 50 |
| GRAPHRAG_EMBEDDING_CONCURRENT_REQUESTS | The number of concurrent requests to allow for the embedding client. | int | optional | 25 |
| GRAPHRAG_EMBEDDING_TPM | The number of tokens per minute to allow for the embedding client. 0 = bypass. | int | optional | 0 |
| GRAPHRAG_EMBEDDING_RPM | The number of requests per minute to allow for the embedding client. 0 = bypass. | int | optional | 0 |
| GRAPHRAG_EMBEDDING_MAX_RETRIES | The maximum number of retries to attempt when a request fails. | int | optional | 10 |
| GRAPHRAG_EMBEDDING_MAX_RETRY_WAIT | The maximum number of seconds to wait between retries. | int | optional | 10 |
| GRAPHRAG_EMBEDDING_SLEEP_ON_RATE_LIMIT_RECOMMENDATION | Whether to sleep on rate limit recommendations. (Azure only) | bool | optional | True |

Data Mapping Settings

diff --git a/posts/get_started/index.html b/posts/get_started/index.html index 4af8db66..526c7b28 100644 --- a/posts/get_started/index.html +++ b/posts/get_started/index.html @@ -376,7 +376,7 @@ Once the pipeline is complete, you should see a new folder called ./ragtes
python -m graphrag.query \
---data ./ragtest \
+--root ./ragtest \
 --method local \
 "Who is Scrooge, and what are his main relationships?"