diff --git a/img/auto-tune-diagram.png b/img/auto-tune-diagram.png
new file mode 100644
index 00000000..acbd42eb
Binary files /dev/null and b/img/auto-tune-diagram.png differ
diff --git a/index.html b/index.html
index 7e886d7b..21afaf95 100644
--- a/index.html
+++ b/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/config/custom/index.html b/posts/config/custom/index.html
index 65a8613b..eab228ab 100644
--- a/posts/config/custom/index.html
+++ b/posts/config/custom/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/config/env_vars/index.html b/posts/config/env_vars/index.html
index c39a52bd..a08dabe4 100644
--- a/posts/config/env_vars/index.html
+++ b/posts/config/env_vars/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/config/init/index.html b/posts/config/init/index.html
index 4fc75745..5b00193a 100644
--- a/posts/config/init/index.html
+++ b/posts/config/init/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/config/json_yaml/index.html b/posts/config/json_yaml/index.html
index 6ab3dfcb..ef5bae96 100644
--- a/posts/config/json_yaml/index.html
+++ b/posts/config/json_yaml/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/config/overview/index.html b/posts/config/overview/index.html
index d3e5d3e8..0b4755d6 100644
--- a/posts/config/overview/index.html
+++ b/posts/config/overview/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/config/template/index.html b/posts/config/template/index.html
index 740c30b2..31bd0912 100644
--- a/posts/config/template/index.html
+++ b/posts/config/template/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/developing/index.html b/posts/developing/index.html
index 006b7808..3b3556fa 100644
--- a/posts/developing/index.html
+++ b/posts/developing/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/get_started/index.html b/posts/get_started/index.html
index 71ac39a3..bd056bcf 100644
--- a/posts/get_started/index.html
+++ b/posts/get_started/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/index/0-architecture/index.html b/posts/index/0-architecture/index.html
index 7d324844..5c526560 100644
--- a/posts/index/0-architecture/index.html
+++ b/posts/index/0-architecture/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/index/1-default_dataflow/index.html b/posts/index/1-default_dataflow/index.html
index 252b7939..84bc20d8 100644
--- a/posts/index/1-default_dataflow/index.html
+++ b/posts/index/1-default_dataflow/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/index/2-cli/index.html b/posts/index/2-cli/index.html
index 97bb1d3e..84e1e05b 100644
--- a/posts/index/2-cli/index.html
+++ b/posts/index/2-cli/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/index/overview/index.html b/posts/index/overview/index.html
index 56e7a483..11f06a45 100644
--- a/posts/index/overview/index.html
+++ b/posts/index/overview/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/prompt_tuning/auto_prompt_tuning/index.html b/posts/prompt_tuning/auto_prompt_tuning/index.html
index c0d16895..0b3db972 100644
--- a/posts/prompt_tuning/auto_prompt_tuning/index.html
+++ b/posts/prompt_tuning/auto_prompt_tuning/index.html
@@ -241,12 +241,12 @@ a {
@@ -289,17 +289,23 @@ a {

Prompt Tuning ⚙️

-

GraphRAG provides the ability to create domain adaptive templates for the generation of the knowledge graph. This step is optional, though it is highly encouraged to run it as it will yield better results when executing an Index Run.

-

The templates are generated by loading the inputs, splitting them into chunks (text units) and then running a series of LLM invocations and template substitutions to generate the final prompts. We suggest using the default values provided by the script, but in this page you'll find the detail of each in case you want to further explore and tweak the template generation algorithm.

+

GraphRAG can create domain-adapted prompts for generating the knowledge graph. This step is optional, but running it is strongly encouraged because it yields better results when executing an Index Run.

+

The prompts are generated by loading the inputs, splitting them into chunks (text units), and then running a series of LLM invocations and template substitutions. We suggest using the default values provided by the script, but this page details each option in case you want to explore and tweak the prompt tuning algorithm.
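
The flow described above (load inputs, chunk them into text units, call an LLM, substitute into a template) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `chunk_text`, `tune_prompt`, the whitespace-based token counting, and the `{domain}` placeholder are hypothetical and are not GraphRAG's actual implementation, which uses a real tokenizer and several LLM calls.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int = 200) -> List[str]:
    """Split text into text units of at most `chunk_size` tokens.

    Assumption: tokens are approximated by whitespace-separated words.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


def tune_prompt(
    documents: List[str],
    llm: Callable[[str], str],
    template: str,
    chunk_size: int = 200,
    limit: int = 15,
) -> str:
    """Hypothetical one-step tuning: infer a domain, then fill the template."""
    # 1. Load the inputs and split them into text units.
    units = [unit for doc in documents for unit in chunk_text(doc, chunk_size)]
    # 2. Keep a small sample so the LLM call stays cheap.
    sample = units[:limit]
    # 3. One LLM invocation: ask for the domain of the sampled text.
    domain = llm("Name the domain of these texts:\n" + "\n".join(sample))
    # 4. Template substitution produces the final prompt.
    return template.format(domain=domain)
```

Passing the LLM as a plain callable keeps the sketch testable: a stub such as `lambda prompt: "space science"` is enough to see how the inferred domain ends up in the final prompt.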

+

+Figure 1: Auto Tuning Conceptual Diagram. +


Prerequisites

-

Before running the automatic template generation make sure you have already initialized your workspace with the graphrag.index --init command. This will create the necessary configuration files and the default prompts. Refer to the Init Documentation for more information about the initialization process.

+

Before running auto tuning, make sure you have already initialized your workspace with the graphrag.index --init command. This creates the necessary configuration files and the default prompts. Refer to the Init Documentation for more information about the initialization process.

Usage

You can run the main script from the command line with various options:

-
python -m graphrag.prompt_tune [--root ROOT] [--domain DOMAIN]  [--method METHOD] [--limit LIMIT] [--language LANGUAGE] [--max-tokens MAX_TOKENS] [--chunk-size CHUNK_SIZE] [--no-entity-types] [--output OUTPUT]
+
python -m graphrag.prompt_tune [--root ROOT] [--domain DOMAIN]  [--method METHOD] [--limit LIMIT] [--language LANGUAGE] [--max-tokens MAX_TOKENS] [--chunk-size CHUNK_SIZE] [--no-entity-types] [--output OUTPUT]
-
@@ -315,7 +321,7 @@ a {

--domain (optional): The domain related to your input data, such as 'space science', 'microbiology', or 'environmental news'. If left empty, the domain will be inferred from the input data.

  • -

    --method (optional): The method to select documents. Options are all, random, or top. Default is random.

    +

    --method (optional): The method to select documents. Options are all, random, auto or top. Default is random.

  • --limit (optional): The limit of text units to load when using random or top selection. Default is 15.

    @@ -330,6 +336,15 @@ a {

    --chunk-size (optional): The size in tokens to use for generating text units from input documents. Default is 200.

  • +

    --n-subset-max (optional): The number of text chunks to embed when using the auto selection method. Default is 300.

    +
  • +
  • +

    --k (optional): The number of documents to select when using the auto selection method. Default is 15.

    +
  • +
  • +

    --min-examples-required (optional): The minimum number of examples required for entity extraction prompts. Default is 2.

    +
  • +
  • --no-entity-types (optional): Use untyped entity extraction generation. We recommend this when your data covers many topics or is highly randomized.

  • @@ -339,31 +354,32 @@ a {

    Example Usage

    -
    python -m graphrag.prompt_tune --root /path/to/project --config /path/to/settings.yaml --domain "environmental news" --method random --limit 10 --language English --max-tokens 2048 --chunk-size 256 --no-entity-types --output /path/to/output
    +
    python -m graphrag.prompt_tune --root /path/to/project --config /path/to/settings.yaml --domain "environmental news" --method random --limit 10 --language English --max-tokens 2048 --chunk-size 256 --min-examples-required 3 --no-entity-types --output /path/to/output
    -

    or, with minimal configuration (suggested):

    -
    python -m graphrag.prompt_tune --root /path/to/project --config /path/to/settings.yaml --no-entity-types
    +
    python -m graphrag.prompt_tune --root /path/to/project --config /path/to/settings.yaml --no-entity-types
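
If you launch prompt tuning from a script rather than a shell, the command line above can be assembled programmatically. The helper below is a hypothetical sketch, not part of GraphRAG: `build_prompt_tune_command` and its parameter names are assumptions, and it emits only flags that appear in this documentation, with the documented defaults for `--method` and `--limit`.

```python
from typing import List, Optional


def build_prompt_tune_command(
    root: str,
    config: Optional[str] = None,
    domain: Optional[str] = None,
    method: str = "random",
    limit: int = 15,
    no_entity_types: bool = False,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `python -m graphrag.prompt_tune`.

    Hypothetical helper: it only mirrors flags listed in the docs above.
    """
    cmd = ["python", "-m", "graphrag.prompt_tune", "--root", root]
    if config:
        cmd += ["--config", config]
    if domain:
        cmd += ["--domain", domain]
    cmd += ["--method", method]
    # --limit is documented as applying to random or top selection only.
    if method in ("random", "top"):
        cmd += ["--limit", str(limit)]
    if no_entity_types:
        cmd.append("--no-entity-types")
    if output:
        cmd += ["--output", output]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell quoting problems with values such as "environmental news".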
    -

    Document Selection Methods

    -

    The auto template feature ingests the input data and then divides it into text units the size of the chunk size parameter.
    -After that, it uses one of the following selection methods to pick a sample to work with for template generation:

    +

    The auto tuning feature ingests the input data and divides it into text units whose size is set by the chunk size parameter.
    +It then uses one of the following selection methods to pick a sample to work with for prompt generation:
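
    As a rough illustration of document selection, the sketch below covers the all, random, and top options. It is a hypothetical stand-in, not GraphRAG's code: `select_text_units` is an invented name, "top" is assumed to mean the first `limit` text units, and the auto method is left out because it depends on an embedding model.

```python
import random
from typing import List, Optional


def select_text_units(
    text_units: List[str],
    method: str = "random",
    limit: int = 15,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick a sample of text units to use for prompt generation."""
    if method == "all":
        # Use every text unit; `limit` is ignored.
        return list(text_units)
    if method == "top":
        # Assumption: "top" takes the first `limit` units in input order.
        return text_units[:limit]
    if method == "random":
        # Sample without replacement, capped at the number of units.
        rng = random.Random(seed)
        return rng.sample(text_units, min(limit, len(text_units)))
    raise ValueError(f"Unsupported selection method in this sketch: {method}")
```

    The optional `seed` parameter is an addition for reproducibility in this sketch; the documented CLI does not list such a flag.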

    Modify Env Vars

    -

    After running auto-templating, you should modify the following environment variables (or config variables) to pick up the new prompts on your index run. Note: Please make sure to update the correct path to the generated prompts, in this example we are using the default "prompts" path.

    +

    After running auto tuning, you should modify the following environment variables (or config variables) to pick up the new prompts on your index run. Note: Make sure to use the correct path to the generated prompts; this example uses the default "prompts" path.

    +

    or in your yaml config file:

    + +
    +
    entity_extraction:
    +  prompt: "prompts/entity_extraction.txt"
    +
    +summarize_descriptions:
    +  prompt: "prompts/summarize_descriptions.txt"
    +
    +community_reports:
    +  prompt: "prompts/community_report.txt"
    + + +
  •
diff --git a/posts/prompt_tuning/manual_prompt_tuning/index.html b/posts/prompt_tuning/manual_prompt_tuning/index.html
index 27bcf899..2078c974 100644
--- a/posts/prompt_tuning/manual_prompt_tuning/index.html
+++ b/posts/prompt_tuning/manual_prompt_tuning/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/prompt_tuning/overview/index.html b/posts/prompt_tuning/overview/index.html
index 4944d3f4..752bbf1e 100644
--- a/posts/prompt_tuning/overview/index.html
+++ b/posts/prompt_tuning/overview/index.html
@@ -241,12 +241,12 @@ a {
@@ -298,10 +298,10 @@ a {
  • Claim Extraction
  • Community Reports
  • -

    Auto Templating

    -

    Auto Templating leverages your input data and LLM interactions to create domain adaptive templates for the generation of the knowledge graph. It is highly encouraged to run it as it will yield better results when executing an Index Run. For more details about how to use it, please refer to the Auto Templating documentation.

    -

    Manual Configuration

    -

    Manual configuration is an advanced use-case. Most users will want to use the Auto Templating feature instead. Details about how to use manual configuration are available in the Manual Prompt Configuration documentation.

    +

    Auto Tuning

    +

    Auto Tuning uses your input data and LLM interactions to create domain-adapted prompts for generating the knowledge graph. Running it is strongly encouraged because it yields better results when executing an Index Run. For more details about how to use it, refer to the Auto Tuning documentation.

    +

    Manual Tuning

    +

    Manual tuning is an advanced use case. Most users will want to use the Auto Tuning feature instead. Details about how to use manual tuning are available in the Manual Tuning documentation.

diff --git a/posts/query/0-global_search/index.html b/posts/query/0-global_search/index.html
index 533a3c4f..6c2ff826 100644
--- a/posts/query/0-global_search/index.html
+++ b/posts/query/0-global_search/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/query/1-local_search/index.html b/posts/query/1-local_search/index.html
index 06797109..6c9ffa63 100644
--- a/posts/query/1-local_search/index.html
+++ b/posts/query/1-local_search/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/query/2-question_generation/index.html b/posts/query/2-question_generation/index.html
index 245b9886..664f9108 100644
--- a/posts/query/2-question_generation/index.html
+++ b/posts/query/2-question_generation/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/query/3-cli/index.html b/posts/query/3-cli/index.html
index 742d511a..64914932 100644
--- a/posts/query/3-cli/index.html
+++ b/posts/query/3-cli/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/query/notebooks/global_search_nb/index.html b/posts/query/notebooks/global_search_nb/index.html
index d6144c53..0cacfb7b 100644
--- a/posts/query/notebooks/global_search_nb/index.html
+++ b/posts/query/notebooks/global_search_nb/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/query/notebooks/local_search_nb/index.html b/posts/query/notebooks/local_search_nb/index.html
index 970a03c0..7985624f 100644
--- a/posts/query/notebooks/local_search_nb/index.html
+++ b/posts/query/notebooks/local_search_nb/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/query/notebooks/overview/index.html b/posts/query/notebooks/overview/index.html
index 8258a6a7..add746c6 100644
--- a/posts/query/notebooks/overview/index.html
+++ b/posts/query/notebooks/overview/index.html
@@ -241,12 +241,12 @@ a {
diff --git a/posts/query/overview/index.html b/posts/query/overview/index.html
index f0069cdf..b4caf7b2 100644
--- a/posts/query/overview/index.html
+++ b/posts/query/overview/index.html
@@ -241,12 +241,12 @@ a {