From 1081bfea080245c700d5f1692c88fe6c82ad01b8 Mon Sep 17 00:00:00 2001
From: Nathan Evans
Date: Tue, 9 Sep 2025 17:39:07 -0700
Subject: [PATCH] Add consumption warning and switch to "christmas" for folder name

---
 docs/get_started.md | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/docs/get_started.md b/docs/get_started.md
index 895e8941..e5d1621c 100644
--- a/docs/get_started.md
+++ b/docs/get_started.md
@@ -1,5 +1,7 @@
 # Getting Started
 
+⚠️ GraphRAG can consume a lot of LLM resources! We strongly recommend starting with the tutorial dataset below until you understand how the system works, and experimenting with fast, inexpensive models before committing to a big indexing job.
+
 ## Requirements
 
 [Python 3.10-3.12](https://www.python.org/downloads/)
@@ -24,25 +26,25 @@ pip install graphrag
 We need to set up a data project and some initial configuration. First let's get a sample dataset ready:
 
 ```sh
-mkdir -p ./ragtest/input
+mkdir -p ./christmas/input
 ```
 
 Get a copy of A Christmas Carol by Charles Dickens from a trusted source:
 
 ```sh
-curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./ragtest/input/book.txt
+curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./christmas/input/book.txt
 ```
 
 ## Set Up Your Workspace Variables
 
 To initialize your workspace, first run the `graphrag init` command.
-Since we have already configured a directory named `./ragtest` in the previous step, run the following command:
+Since we have already configured a directory named `./christmas` in the previous step, run the following command:
 
 ```sh
-graphrag init --root ./ragtest
+graphrag init --root ./christmas
 ```
 
-This will create two files: `.env` and `settings.yaml` in the `./ragtest` directory.
+This will create two files: `.env` and `settings.yaml` in the `./christmas` directory.
 
 - `.env` contains the environment variables required to run the GraphRAG pipeline. If you inspect the file, you'll see a single environment variable defined, `GRAPHRAG_API_KEY=<API_KEY>`. Replace `<API_KEY>` with your own OpenAI or Azure API key.
 
@@ -79,13 +81,13 @@ You will also need to login with [az login](https://learn.microsoft.com/en-us/cl
 Finally we'll run the pipeline!
 
 ```sh
-graphrag index --root ./ragtest
+graphrag index --root ./christmas
 ```
 
 ![pipeline executing from the CLI](img/pipeline-running.png)
 
 This process will take some time to run. This depends on the size of your input data, what model you're using, and the text chunk size being used (these can be configured in your `settings.yaml` file).
-Once the pipeline is complete, you should see a new folder called `./ragtest/output` with a series of parquet files.
+Once the pipeline is complete, you should see a new folder called `./christmas/output` with a series of parquet files.
 
 # Using the Query Engine
 
@@ -95,7 +97,7 @@ Here is an example using Global search to ask a high-level question:
 
 ```sh
 graphrag query \
---root ./ragtest \
+--root ./christmas \
 --method global \
 --query "What are the top themes in this story?"
 ```
@@ -104,7 +106,7 @@ Here is an example using Local search to ask a more specific question about a pa
 
 ```sh
 graphrag query \
---root ./ragtest \
+--root ./christmas \
 --method local \
 --query "Who is Scrooge and what are his main relationships?"
 ```