To start using GraphRAG, you need to configure the system. The init command is the easiest way to get started. It will create .env and settings.yaml files in the specified directory with the necessary configuration settings. It will also output the default LLM prompts used by GraphRAG.
Usage

python -m graphrag.index [--init] [--root PATH]

Options

--init - Initialize the directory with the necessary configuration files.
--root PATH - The root directory to initialize. Default is the current directory.

Example

python -m graphrag.index --init --root ./ragtest
Output

The init command will create the following files in the specified directory:

settings.yaml - The configuration settings file. This file contains the configuration settings for GraphRAG.
.env - The environment variables file. These are referenced in the settings.yaml file.
prompts/ - The LLM prompts folder. This contains the default prompts used by GraphRAG. You can modify them, or run the Auto Prompt Tuning command to generate new prompts adapted to your data.
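For orientation, the top of the generated settings.yaml defines the LLM connection. The fragment below is a sketch of what that section typically looks like (the exact model name and field set vary by GraphRAG version, so treat it as illustrative and check your generated file):

```yaml
llm:
  api_key: ${GRAPHRAG_API_KEY}   # substituted from the .env file
  type: openai_chat              # azure_openai_chat when using Azure
  model: gpt-4-turbo-preview     # illustrative default; check your generated file
```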
Next Steps

After initializing your workspace, you can run the Prompt Tuning command to adapt the prompts to your data, or start running the Indexing Pipeline to index your data. For more information on configuring GraphRAG, see the Configuration documentation.
The default configuration mode is the simplest way to get started with the GraphRAG system. It is designed to work out-of-the-box with minimal configuration. The primary configuration sections for the Indexing Engine pipelines are described below. The main ways to set up GraphRAG in Default Configuration mode are via:
Now we need to set up a data project and some initial configuration. Let's set that up. We're using the default configuration mode, which you can customize as needed via a config file (recommended) or environment variables.
First let's get a sample dataset ready:
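As a sketch, preparing the workspace input folder can look like this (the file name book.txt and the placeholder text are illustrative; any plain-text source works, and the official quickstart downloads a public-domain book instead):

```shell
# Create the input folder that the indexer reads from
mkdir -p ./ragtest/input

# Drop in any plain-text document; this placeholder stands in for a real corpus
printf 'Sample corpus text for GraphRAG to index.\n' > ./ragtest/input/book.txt

# Confirm the file is in place and non-empty
ls -l ./ragtest/input
```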
Next we'll inject some required config variables:
Set Up Your Workspace Variables
First, let's make sure to set up the required environment variables. For details on these environment variables, and what environment variables are available, see the variables documentation.

To initialize your workspace, let's first run the graphrag.index --init command. Since we have already configured a directory named ./ragtest in the previous step, we can run the following command:

python -m graphrag.index --init --root ./ragtest
This will create two files: .env and settings.yaml in the ./ragtest directory.
.env contains the environment variables required to run the GraphRAG pipeline. If you inspect the file, you'll see a single environment variable defined, GRAPHRAG_API_KEY=<API_KEY>. This is the API key for the OpenAI API or Azure OpenAI endpoint. You can replace this with your own API key.
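As a sketch, swapping in your real key can be done from the shell as well as in an editor (the key value below is a dummy placeholder, and the mkdir/printf lines stand in for what --init normally generates):

```shell
# Ensure the workspace exists and write the key into .env
# (normally graphrag.index --init creates this file for you)
mkdir -p ./ragtest
printf 'GRAPHRAG_API_KEY=%s\n' 'sk-dummy-placeholder' > ./ragtest/.env

# Verify the variable is set in the file
grep 'GRAPHRAG_API_KEY' ./ragtest/.env
```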
settings.yaml contains the settings for the pipeline. You can modify this file to change the settings for the pipeline.
OpenAI and Azure OpenAI
To run in OpenAI mode, just make sure to update the value of GRAPHRAG_API_KEY in the .env file with your OpenAI API key.
Azure OpenAI
In addition, Azure OpenAI users should set the following variables in the settings.yaml file. To find the appropriate sections, just search for the llm: configuration; you should see two sections, one for the chat endpoint and one for the embeddings endpoint. Here is an example of how to configure the chat endpoint:

type: azure_openai_chat # Or azure_openai_embedding for embeddings
api_base: https://<instance>.openai.azure.com
api_version: 2024-02-15-preview # You can customize this for other versions
deployment_name: <azure_model_deployment_name>
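Pulled together with the key substitution from the .env file, a complete embeddings llm: section would look roughly like the sketch below (the ${GRAPHRAG_API_KEY} reference mirrors the generated defaults; the deployment name is a hypothetical placeholder, and exact field names can vary between GraphRAG versions):

```yaml
llm:
  api_key: ${GRAPHRAG_API_KEY}        # substituted from the .env file
  type: azure_openai_embedding
  api_base: https://<instance>.openai.azure.com
  api_version: 2024-02-15-preview
  deployment_name: <azure_embedding_deployment_name>  # hypothetical placeholder
```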
GraphRAG provides the ability to create domain-adaptive templates for the generation of the knowledge graph. This step is optional, though it is highly encouraged to run it, as it will yield better results when executing an Index Run.
The templates are generated by loading the inputs, splitting them into chunks (text units), and then running a series of LLM invocations and template substitutions to generate the final prompts. We suggest using the default values provided by the script, but on this page you'll find the details of each in case you want to further explore and tweak the template generation algorithm.
Prerequisites
Before running the automatic template generation, make sure you have already initialized your workspace with the graphrag.index --init command. This will create the necessary configuration files and the default prompts. Refer to the Init Documentation for more information about the initialization process.
Usage
You can run the main script from the command line with various options: