Compare commits


5 Commits
v2.6.0 ... main

Author SHA1 Message Date
Alonso Guevara
fdb7e3835b
Release v2.7.0 (#2087)
Some checks failed
gh-pages / build (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (ubuntu-latest, 3.11) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python CI / python-ci (windows-latest, 3.11) (push) Has been cancelled
Python Integration Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Integration Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Notebook Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Python Publish (pypi) / Upload release to PyPI (push) Has been cancelled
Python Smoke Tests / python-ci (ubuntu-latest, 3.10) (push) Has been cancelled
Python Smoke Tests / python-ci (windows-latest, 3.10) (push) Has been cancelled
Spellcheck / spellcheck (push) Has been cancelled
2025-10-08 21:33:34 -07:00
Nathan Evans
ac8a7f5eef
Housekeeping (#2086)
Some checks failed
* Add deprecation warnings for fnllm and multi-search

* Fix dangling token_encoder refs

* Fix local_search notebook

* Fix global search dynamic notebook

* Fix global search notebook

* Fix drift notebook

* Switch example notebooks to use LiteLLM config

* Properly annotate dev deps as a group

* Semver

* Remove --extra dev

* Remove llm_model variable

* Ignore ruff ASYNC240

* Add note about expected broken notebook in docs

* Fix custom vector store notebook

* Push tokenizer throughout
2025-10-07 16:21:24 -07:00
Nathan Evans
6c86b0a7bb
Init config cleanup (#2084)
Some checks failed
* Spruce up init_config output, including LiteLLM default

* Remove deployment_name requirement for Azure

* Semver

* Add model_provider

* Add default model_provider

* Remove OBE test

* Update minimal config for tests

* Add model_provider to verb tests
2025-10-06 12:06:41 -07:00
Nathan Evans
2bd3922d8d
Litellm auth fix (#2083)
* Fix scope for Azure auth with LiteLLM

* Change internal language on max_attempts to max_retries

* Rework model config connectivity validation

* Semver

* Switch smoke tests to LiteLLM

* Take out temporary retry_strategy = none since it is not fnllm compatible

* Bump smoke test timeout

* Bump smoke timeout further

* Tune smoke params

* Update smoke test bounds

* Remove covariates from min-csv smoke

* Smoke: adjust communities, remove drift

* Remove secrets where they aren't necessary

* Clean out old env var references
2025-10-06 10:54:21 -07:00
Nathan Evans
7f996cf584
Docs/2.6.0 (#2070)
Some checks failed
* Add basic search to overview

* Add info on input documents DataFrame

* Add info on factories to docs

* Add consumption warning and switch to "christmas" for folder name

* Add logger to factories list

* Add litellm docs. (#2058)

* Fix version for input docs

* Spelling

---------

Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
2025-09-23 14:48:28 -07:00
75 changed files with 836 additions and 972 deletions

View File

@ -15,8 +15,6 @@ jobs:
GH_PAGES: 1
DEBUG: 1
GRAPHRAG_API_KEY: ${{ secrets.GRAPHRAG_API_KEY }}
GRAPHRAG_LLM_MODEL: ${{ secrets.GRAPHRAG_LLM_MODEL }}
GRAPHRAG_EMBEDDING_MODEL: ${{ secrets.GRAPHRAG_EMBEDDING_MODEL }}
steps:
- uses: actions/checkout@v4
@ -33,7 +31,7 @@ jobs:
- name: Install dependencies
shell: bash
run: uv sync --extra dev
run: uv sync
- name: mkdocs build
shell: bash

View File

@ -67,7 +67,7 @@ jobs:
- name: Install dependencies
shell: bash
run: |
uv sync --extra dev
uv sync
uv pip install gensim
- name: Check

View File

@ -67,7 +67,7 @@ jobs:
- name: Install dependencies
shell: bash
run: |
uv sync --extra dev
uv sync
uv pip install gensim
- name: Build

View File

@ -38,8 +38,6 @@ jobs:
env:
DEBUG: 1
GRAPHRAG_API_KEY: ${{ secrets.OPENAI_NOTEBOOK_KEY }}
GRAPHRAG_LLM_MODEL: ${{ secrets.GRAPHRAG_LLM_MODEL }}
GRAPHRAG_EMBEDDING_MODEL: ${{ secrets.GRAPHRAG_EMBEDDING_MODEL }}
runs-on: ${{ matrix.os }}
steps:
@ -69,7 +67,7 @@ jobs:
- name: Install dependencies
shell: bash
run: |
uv sync --extra dev
uv sync
uv pip install gensim
- name: Notebook Test

View File

@ -37,20 +37,8 @@ jobs:
fail-fast: false # Continue running all jobs even if one fails
env:
DEBUG: 1
GRAPHRAG_LLM_TYPE: "azure_openai_chat"
GRAPHRAG_EMBEDDING_TYPE: "azure_openai_embedding"
GRAPHRAG_API_KEY: ${{ secrets.OPENAI_API_KEY }}
GRAPHRAG_API_BASE: ${{ secrets.GRAPHRAG_API_BASE }}
GRAPHRAG_API_VERSION: ${{ secrets.GRAPHRAG_API_VERSION }}
GRAPHRAG_LLM_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_LLM_DEPLOYMENT_NAME }}
GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME }}
GRAPHRAG_LLM_MODEL: ${{ secrets.GRAPHRAG_LLM_MODEL }}
GRAPHRAG_EMBEDDING_MODEL: ${{ secrets.GRAPHRAG_EMBEDDING_MODEL }}
# We have Windows + Linux runners in 3.10, so we need to divide the rate limits by 2
GRAPHRAG_LLM_TPM: 200_000 # 400_000 / 2
GRAPHRAG_LLM_RPM: 1_000 # 2_000 / 2
GRAPHRAG_EMBEDDING_TPM: 225_000 # 450_000 / 2
GRAPHRAG_EMBEDDING_RPM: 1_000 # 2_000 / 2
# Azure AI Search config
AZURE_AI_SEARCH_URL_ENDPOINT: ${{ secrets.AZURE_AI_SEARCH_URL_ENDPOINT }}
AZURE_AI_SEARCH_API_KEY: ${{ secrets.AZURE_AI_SEARCH_API_KEY }}
@ -84,7 +72,7 @@ jobs:
- name: Install dependencies
shell: bash
run: |
uv sync --extra dev
uv sync
uv pip install gensim
- name: Build

18
.semversioner/2.7.0.json Normal file
View File

@ -0,0 +1,18 @@
{
"changes": [
{
"description": "Set LiteLLM as default in init_content.",
"type": "minor"
},
{
"description": "Fix Azure auth scope issue with LiteLLM.",
"type": "patch"
},
{
"description": "Housekeeping toward 2.7.",
"type": "patch"
}
],
"created_at": "2025-10-08T22:39:42+00:00",
"version": "2.7.0"
}

View File

@ -1,6 +1,12 @@
# Changelog
Note: version releases in the 0.x.y range may introduce breaking changes.
## 2.7.0
- minor: Set LiteLLM as default in init_content.
- patch: Fix Azure auth scope issue with LiteLLM.
- patch: Housekeeping toward 2.7.
## 2.6.0
- minor: Add LiteLLM chat and embedding model providers.

View File

@ -11,12 +11,8 @@
## Install Dependencies
```shell
# (optional) create virtual environment
uv venv --python 3.10
source .venv/bin/activate
# install python dependencies
uv sync --extra dev
uv sync
```
## Execute the indexing engine
@ -119,8 +115,3 @@ and then in your bashrc, add
Make sure you have python3.10-dev installed or more generally `python<version>-dev`
`sudo apt-get install python3.10-dev`
### LLM call constantly exceeds TPM, RPM or time limits
`GRAPHRAG_LLM_THREAD_COUNT` and `GRAPHRAG_EMBEDDING_THREAD_COUNT` are both set to 50 by default. You can modify these values
to reduce concurrency. Please refer to the [Configuration Documents](https://microsoft.github.io/graphrag/config/overview/)

View File

@ -8,9 +8,38 @@ GraphRAG was built and tested using OpenAI models, so this is the default model
GraphRAG also utilizes a language model wrapper library used by several projects within our team, called fnllm. fnllm provides two important functions for GraphRAG: rate limiting configuration to help us maximize throughput for large indexing jobs, and robust caching of API calls to minimize consumption on repeated indexes for testing, experimentation, or incremental ingest. fnllm uses the OpenAI Python SDK under the covers, so OpenAI-compliant endpoints are a base requirement out-of-the-box.
Starting with version 2.6.0, GraphRAG supports using [LiteLLM](https://docs.litellm.ai/) instead of fnllm for calling language models. LiteLLM provides support for 100+ models, though it is important to note that the model you choose must support returning [structured outputs](https://openai.com/index/introducing-structured-outputs-in-the-api/) adhering to a [JSON schema](https://docs.litellm.ai/docs/completion/json_mode).
Example using LiteLLM as the language model tool for GraphRAG:
```yaml
models:
default_chat_model:
type: chat
auth_type: api_key
api_key: ${GEMINI_API_KEY}
model_provider: gemini
model: gemini-2.5-flash-lite
default_embedding_model:
type: embedding
auth_type: api_key
api_key: ${GEMINI_API_KEY}
model_provider: gemini
model: gemini-embedding-001
```
To use LiteLLM, one must:
- Set `type` to either `chat` or `embedding`.
- Provide a `model_provider`, e.g., `openai`, `azure`, `gemini`, etc.
- Set the `model` to one supported by the `model_provider`'s API.
- Provide a `deployment_name` if using `azure` as the `model_provider`.
See [Detailed Configuration](yaml.md) for more details on configuration. [View LiteLLM basic usage](https://docs.litellm.ai/docs/#basic-usage) for details on how models are called (the `model_provider` is the portion before the `/`, while the `model` is the portion after the `/`).
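For orientation only, here is a minimal sketch of the direct LiteLLM call that the YAML example above maps onto. The Gemini provider, model name, and `GEMINI_API_KEY` variable are simply the example values from that config; this is not code GraphRAG requires you to write.

```python
import os

import litellm

# LiteLLM addresses a model as "<model_provider>/<model>", which is how the
# separate model_provider and model settings above are combined under the hood.
response = litellm.completion(
    model="gemini/gemini-2.5-flash-lite",  # model_provider "gemini" + model "gemini-2.5-flash-lite"
    messages=[{"role": "user", "content": "Say hello."}],
    api_key=os.environ["GEMINI_API_KEY"],  # same key referenced by the YAML above
)
print(response.choices[0].message.content)
```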
## Model Selection Considerations
GraphRAG has been most thoroughly tested with the gpt-4 series of models from OpenAI, including gpt-4 gpt-4-turbo, gpt-4o, and gpt-4o-mini. Our [arXiv paper](https://arxiv.org/abs/2404.16130), for example, performed quality evaluation using gpt-4-turbo.
GraphRAG has been most thoroughly tested with the gpt-4 series of models from OpenAI, including gpt-4, gpt-4-turbo, gpt-4o, and gpt-4o-mini. Our [arXiv paper](https://arxiv.org/abs/2404.16130), for example, performed quality evaluation using gpt-4-turbo. As stated above, non-OpenAI models are supported from GraphRAG 2.6.0 onwards through the use of LiteLLM, but the gpt-4 series from OpenAI remains the most tested and supported set of models for GraphRAG.
Versions of GraphRAG before 2.2.0 made extensive use of `max_tokens` and `logit_bias` to control generated response length or content. The introduction of the o-series of models added new, non-compatible parameters because these models include a reasoning component that has different consumption patterns and response generation attributes than non-reasoning models. GraphRAG 2.2.0 now supports these models, but there are important differences that need to be understood before you switch.
@ -58,11 +87,11 @@ Another option would be to avoid using a language model at all for the graph ext
## Using Non-OpenAI Models
As noted above, our primary experience and focus has been on OpenAI models, so this is what is supported out-of-the-box. Many users have requested support for additional model types, but it's out of the scope of our research to handle the many models available today. There are two approaches you can use to connect to a non-OpenAI model:
As shown above, non-OpenAI models may be used via LiteLLM starting with GraphRAG version 2.6.0, but there may still be cases in which users wish to use models not supported by LiteLLM. There are two approaches you can use to connect to unsupported models:
### Proxy APIs
Many users have used platforms such as [ollama](https://ollama.com/) to proxy the underlying model HTTP calls to a different model provider. This seems to work reasonably well, but we frequently see issues with malformed responses (especially JSON), so if you do this please understand that your model needs to reliably return the specific response formats that GraphRAG expects. If you're having trouble with a model, you may need to try prompting to coax the format, or intercepting the response within your proxy to try and handle malformed responses.
Many users have used platforms such as [ollama](https://ollama.com/) and [LiteLLM Proxy Server](https://docs.litellm.ai/docs/simple_proxy) to proxy the underlying model HTTP calls to a different model provider. This seems to work reasonably well, but we frequently see issues with malformed responses (especially JSON), so if you do this please understand that your model needs to reliably return the specific response formats that GraphRAG expects. If you're having trouble with a model, you may need to try prompting to coax the format, or intercepting the response within your proxy to try and handle malformed responses.
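As a rough sketch of the proxy approach using GraphRAG's own config objects (the endpoint URL and model name below are placeholders for whatever your proxy exposes, and field support should be verified against the configuration reference):

```python
from graphrag.config.enums import ModelType
from graphrag.config.models.language_model_config import LanguageModelConfig
from graphrag.language_model.manager import ModelManager

# Point an OpenAI-protocol model config at a local proxy such as ollama's
# OpenAI-compatible endpoint. "llama3" is a placeholder model name.
proxy_config = LanguageModelConfig(
    api_key="unused-by-local-proxy",  # many proxies ignore the key, but the field is still set
    type=ModelType.Chat,
    model_provider="openai",          # speak the OpenAI protocol to the proxy
    model="llama3",
    api_base="http://localhost:11434/v1",
    max_retries=20,
)
chat_model = ModelManager().get_or_create_chat_model(
    name="proxied_chat",
    model_type=ModelType.Chat,
    config=proxy_config,
)
```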
### Model Protocol

View File

@ -41,7 +41,8 @@ models:
- `api_key` **str** - The OpenAI API key to use.
- `auth_type` **api_key|azure_managed_identity** - Indicate how you want to authenticate requests.
- `type` **openai_chat|azure_openai_chat|openai_embedding|azure_openai_embedding|mock_chat|mock_embeddings** - The type of LLM to use.
- `type` **chat**|**embedding**|**openai_chat|azure_openai_chat|openai_embedding|azure_openai_embedding|mock_chat|mock_embeddings** - The type of LLM to use.
- `model_provider` **str|None** - The model provider to use, e.g., openai, azure, anthropic, etc. Required when `type == chat|embedding`. When `type == chat|embedding`, [LiteLLM](https://docs.litellm.ai/) is used under the hood, which supports calling 100+ models. [View LiteLLM basic usage](https://docs.litellm.ai/docs/#basic-usage) for details on how models are called (the `model_provider` is the portion before the `/`, while the `model` is the portion after the `/`). [View Language Model Selection](models.md) for more details and examples on using LiteLLM.
- `model` **str** - The model name.
- `encoding_model` **str** - The text encoding model to use. Default is to use the encoding model aligned with the language model (i.e., it is retrieved from tiktoken if unset).
- `api_base` **str** - The API base url to use.

View File

@ -12,12 +12,8 @@
## Install Dependencies
```sh
# (optional) create virtual environment
uv venv --python 3.10
source .venv/bin/activate
# install python dependencies
uv sync --extra dev
uv sync
```
## Execute the Indexing Engine
@ -77,8 +73,3 @@ Make sure llvm-9 and llvm-9-dev are installed:
and then in your bashrc, add
`export LLVM_CONFIG=/usr/bin/llvm-config-9`
### LLM call constantly exceeds TPM, RPM or time limits
`GRAPHRAG_LLM_THREAD_COUNT` and `GRAPHRAG_EMBEDDING_THREAD_COUNT` are both set to 50 by default. You can modify these values
to reduce concurrency. Please refer to the [Configuration Documents](config/overview.md)

View File

@ -67,6 +67,8 @@
"metadata": {},
"outputs": [],
"source": [
"# note that we expect this to fail on the deployed docs because the PROJECT_DIRECTORY is not set to a real location.\n",
"# if you run this notebook locally, make sure to point at a location containing your settings.yaml\n",
"graphrag_config = load_config(Path(PROJECT_DIRECTORY))"
]
},

View File

@ -61,6 +61,7 @@
"import numpy as np\n",
"import yaml\n",
"\n",
"from graphrag.config.models.vector_store_schema_config import VectorStoreSchemaConfig\n",
"from graphrag.data_model.types import TextEmbedder\n",
"\n",
"# GraphRAG vector store components\n",
@ -147,14 +148,12 @@
" self.vectors: dict[str, np.ndarray] = {}\n",
" self.connected = False\n",
"\n",
" print(\n",
" f\"🚀 SimpleInMemoryVectorStore initialized for collection: {self.collection_name}\"\n",
" )\n",
" print(f\"🚀 SimpleInMemoryVectorStore initialized for index: {self.index_name}\")\n",
"\n",
" def connect(self, **kwargs: Any) -> None:\n",
" \"\"\"Connect to the vector storage (no-op for in-memory store).\"\"\"\n",
" self.connected = True\n",
" print(f\"✅ Connected to in-memory vector store: {self.collection_name}\")\n",
" print(f\"✅ Connected to in-memory vector store: {self.index_name}\")\n",
"\n",
" def load_documents(\n",
" self, documents: list[VectorStoreDocument], overwrite: bool = True\n",
@ -250,7 +249,7 @@
" def get_stats(self) -> dict[str, Any]:\n",
" \"\"\"Get statistics about the vector store (custom method).\"\"\"\n",
" return {\n",
" \"collection_name\": self.collection_name,\n",
" \"index_name\": self.index_name,\n",
" \"document_count\": len(self.documents),\n",
" \"vector_count\": len(self.vectors),\n",
" \"connected\": self.connected,\n",
@ -353,11 +352,11 @@
"outputs": [],
"source": [
"# Test creating vector store using the factory\n",
"vector_store_config = {\"collection_name\": \"test_collection\"}\n",
"schema = VectorStoreSchemaConfig(index_name=\"test_collection\")\n",
"\n",
"# Create vector store instance using factory\n",
"vector_store = VectorStoreFactory.create_vector_store(\n",
" CUSTOM_VECTOR_STORE_TYPE, vector_store_config\n",
" CUSTOM_VECTOR_STORE_TYPE, vector_store_schema_config=schema\n",
")\n",
"\n",
"print(f\"✅ Created vector store instance: {type(vector_store).__name__}\")\n",
@ -486,9 +485,13 @@
" print(\"🚀 Simulating GraphRAG pipeline with custom vector store...\\n\")\n",
"\n",
" # 1. GraphRAG creates vector store using factory\n",
" config = {\"collection_name\": \"graphrag_entities\", \"similarity_threshold\": 0.3}\n",
" schema = VectorStoreSchemaConfig(index_name=\"graphrag_entities\")\n",
"\n",
" store = VectorStoreFactory.create_vector_store(CUSTOM_VECTOR_STORE_TYPE, config)\n",
" store = VectorStoreFactory.create_vector_store(\n",
" CUSTOM_VECTOR_STORE_TYPE,\n",
" vector_store_schema_config=schema,\n",
" similarity_threshold=0.3,\n",
" )\n",
" store.connect()\n",
"\n",
" print(\"✅ Step 1: Vector store created and connected\")\n",
@ -549,7 +552,8 @@
" # Test 1: Basic functionality\n",
" print(\"Test 1: Basic functionality\")\n",
" store = VectorStoreFactory.create_vector_store(\n",
" CUSTOM_VECTOR_STORE_TYPE, {\"collection_name\": \"test\"}\n",
" CUSTOM_VECTOR_STORE_TYPE,\n",
" vector_store_schema_config=VectorStoreSchemaConfig(index_name=\"test\"),\n",
" )\n",
" store.connect()\n",
"\n",
@ -597,7 +601,8 @@
" # Test 5: Error handling\n",
" print(\"\\nTest 5: Error handling\")\n",
" disconnected_store = VectorStoreFactory.create_vector_store(\n",
" CUSTOM_VECTOR_STORE_TYPE, {\"collection_name\": \"test2\"}\n",
" CUSTOM_VECTOR_STORE_TYPE,\n",
" vector_store_schema_config=VectorStoreSchemaConfig(index_name=\"test2\"),\n",
" )\n",
"\n",
" try:\n",
@ -653,7 +658,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "graphrag-venv (3.10.18)",
"display_name": "graphrag",
"language": "python",
"name": "python3"
},
@ -667,7 +672,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.18"
"version": "3.12.10"
}
},
"nbformat": 4,

View File

@ -20,11 +20,11 @@
"from pathlib import Path\n",
"\n",
"import pandas as pd\n",
"import tiktoken\n",
"\n",
"from graphrag.config.enums import ModelType\n",
"from graphrag.config.models.drift_search_config import DRIFTSearchConfig\n",
"from graphrag.config.models.language_model_config import LanguageModelConfig\n",
"from graphrag.config.models.vector_store_schema_config import VectorStoreSchemaConfig\n",
"from graphrag.language_model.manager import ModelManager\n",
"from graphrag.query.indexer_adapters import (\n",
" read_indexer_entities,\n",
@ -37,6 +37,7 @@
" DRIFTSearchContextBuilder,\n",
")\n",
"from graphrag.query.structured_search.drift_search.search import DRIFTSearch\n",
"from graphrag.tokenizer.get_tokenizer import get_tokenizer\n",
"from graphrag.vector_stores.lancedb import LanceDBVectorStore\n",
"\n",
"INPUT_DIR = \"./inputs/operation dulce\"\n",
@ -62,12 +63,16 @@
"# load description embeddings to an in-memory lancedb vectorstore\n",
"# to connect to a remote db, specify url and port values.\n",
"description_embedding_store = LanceDBVectorStore(\n",
" collection_name=\"default-entity-description\",\n",
" vector_store_schema_config=VectorStoreSchemaConfig(\n",
" index_name=\"default-entity-description\"\n",
" ),\n",
")\n",
"description_embedding_store.connect(db_uri=LANCEDB_URI)\n",
"\n",
"full_content_embedding_store = LanceDBVectorStore(\n",
" collection_name=\"default-community-full_content\",\n",
" vector_store_schema_config=VectorStoreSchemaConfig(\n",
" index_name=\"default-community-full_content\"\n",
" )\n",
")\n",
"full_content_embedding_store.connect(db_uri=LANCEDB_URI)\n",
"\n",
@ -94,33 +99,33 @@
"outputs": [],
"source": [
"api_key = os.environ[\"GRAPHRAG_API_KEY\"]\n",
"llm_model = os.environ[\"GRAPHRAG_LLM_MODEL\"]\n",
"embedding_model = os.environ[\"GRAPHRAG_EMBEDDING_MODEL\"]\n",
"\n",
"chat_config = LanguageModelConfig(\n",
" api_key=api_key,\n",
" type=ModelType.OpenAIChat,\n",
" model=llm_model,\n",
" type=ModelType.Chat,\n",
" model_provider=\"openai\",\n",
" model=\"gpt-4.1\",\n",
" max_retries=20,\n",
")\n",
"chat_model = ModelManager().get_or_create_chat_model(\n",
" name=\"local_search\",\n",
" model_type=ModelType.OpenAIChat,\n",
" model_type=ModelType.Chat,\n",
" config=chat_config,\n",
")\n",
"\n",
"token_encoder = tiktoken.encoding_for_model(llm_model)\n",
"tokenizer = get_tokenizer(chat_config)\n",
"\n",
"embedding_config = LanguageModelConfig(\n",
" api_key=api_key,\n",
" type=ModelType.OpenAIEmbedding,\n",
" model=embedding_model,\n",
" type=ModelType.Embedding,\n",
" model_provider=\"openai\",\n",
" model=\"text-embedding-3-small\",\n",
" max_retries=20,\n",
")\n",
"\n",
"text_embedder = ModelManager().get_or_create_embedding_model(\n",
" name=\"local_search_embedding\",\n",
" model_type=ModelType.OpenAIEmbedding,\n",
" model_type=ModelType.Embedding,\n",
" config=embedding_config,\n",
")"
]
@ -173,12 +178,12 @@
" reports=reports,\n",
" entity_text_embeddings=description_embedding_store,\n",
" text_units=text_units,\n",
" token_encoder=token_encoder,\n",
" tokenizer=tokenizer,\n",
" config=drift_params,\n",
")\n",
"\n",
"search = DRIFTSearch(\n",
" model=chat_model, context_builder=context_builder, token_encoder=token_encoder\n",
" model=chat_model, context_builder=context_builder, tokenizer=tokenizer\n",
")"
]
},
@ -212,7 +217,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "graphrag",
"language": "python",
"name": "python3"
},
@ -226,7 +231,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.12.10"
}
},
"nbformat": 4,

View File

@ -19,7 +19,6 @@
"import os\n",
"\n",
"import pandas as pd\n",
"import tiktoken\n",
"\n",
"from graphrag.config.enums import ModelType\n",
"from graphrag.config.models.language_model_config import LanguageModelConfig\n",
@ -32,7 +31,8 @@
"from graphrag.query.structured_search.global_search.community_context import (\n",
" GlobalCommunityContext,\n",
")\n",
"from graphrag.query.structured_search.global_search.search import GlobalSearch"
"from graphrag.query.structured_search.global_search.search import GlobalSearch\n",
"from graphrag.tokenizer.get_tokenizer import get_tokenizer"
]
},
{
@ -58,21 +58,21 @@
"outputs": [],
"source": [
"api_key = os.environ[\"GRAPHRAG_API_KEY\"]\n",
"llm_model = os.environ[\"GRAPHRAG_LLM_MODEL\"]\n",
"\n",
"config = LanguageModelConfig(\n",
" api_key=api_key,\n",
" type=ModelType.OpenAIChat,\n",
" model=llm_model,\n",
" type=ModelType.Chat,\n",
" model_provider=\"openai\",\n",
" model=\"gpt-4.1\",\n",
" max_retries=20,\n",
")\n",
"model = ModelManager().get_or_create_chat_model(\n",
" name=\"global_search\",\n",
" model_type=ModelType.OpenAIChat,\n",
" model_type=ModelType.Chat,\n",
" config=config,\n",
")\n",
"\n",
"token_encoder = tiktoken.encoding_for_model(llm_model)"
"tokenizer = get_tokenizer(config)"
]
},
{
@ -142,7 +142,7 @@
" community_reports=reports,\n",
" communities=communities,\n",
" entities=entities, # default to None if you don't want to use community weights for ranking\n",
" token_encoder=token_encoder,\n",
" tokenizer=tokenizer,\n",
")"
]
},
@ -193,7 +193,7 @@
"search_engine = GlobalSearch(\n",
" model=model,\n",
" context_builder=context_builder,\n",
" token_encoder=token_encoder,\n",
" tokenizer=tokenizer,\n",
" max_data_tokens=12_000, # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)\n",
" map_llm_params=map_llm_params,\n",
" reduce_llm_params=reduce_llm_params,\n",
@ -241,7 +241,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "graphrag",
"language": "python",
"name": "python3"
},
@ -255,7 +255,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.12.10"
}
},
"nbformat": 4,

View File

@ -19,7 +19,6 @@
"import os\n",
"\n",
"import pandas as pd\n",
"import tiktoken\n",
"\n",
"from graphrag.config.enums import ModelType\n",
"from graphrag.config.models.language_model_config import LanguageModelConfig\n",
@ -57,22 +56,24 @@
"metadata": {},
"outputs": [],
"source": [
"from graphrag.tokenizer.get_tokenizer import get_tokenizer\n",
"\n",
"api_key = os.environ[\"GRAPHRAG_API_KEY\"]\n",
"llm_model = os.environ[\"GRAPHRAG_LLM_MODEL\"]\n",
"\n",
"config = LanguageModelConfig(\n",
" api_key=api_key,\n",
" type=ModelType.OpenAIChat,\n",
" model=llm_model,\n",
" type=ModelType.Chat,\n",
" model_provider=\"openai\",\n",
" model=\"gpt-4.1\",\n",
" max_retries=20,\n",
")\n",
"model = ModelManager().get_or_create_chat_model(\n",
" name=\"global_search\",\n",
" model_type=ModelType.OpenAIChat,\n",
" model_type=ModelType.Chat,\n",
" config=config,\n",
")\n",
"\n",
"token_encoder = tiktoken.encoding_for_model(llm_model)"
"tokenizer = get_tokenizer(config)"
]
},
{
@ -155,11 +156,11 @@
" community_reports=reports,\n",
" communities=communities,\n",
" entities=entities, # default to None if you don't want to use community weights for ranking\n",
" token_encoder=token_encoder,\n",
" tokenizer=tokenizer,\n",
" dynamic_community_selection=True,\n",
" dynamic_community_selection_kwargs={\n",
" \"model\": model,\n",
" \"token_encoder\": token_encoder,\n",
" \"tokenizer\": tokenizer,\n",
" },\n",
")"
]
@ -211,7 +212,7 @@
"search_engine = GlobalSearch(\n",
" model=model,\n",
" context_builder=context_builder,\n",
" token_encoder=token_encoder,\n",
" tokenizer=tokenizer,\n",
" max_data_tokens=12_000, # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)\n",
" map_llm_params=map_llm_params,\n",
" reduce_llm_params=reduce_llm_params,\n",
@ -255,7 +256,7 @@
"prompt_tokens = result.prompt_tokens_categories[\"build_context\"]\n",
"output_tokens = result.output_tokens_categories[\"build_context\"]\n",
"print(\n",
" f\"Build context ({llm_model})\\nLLM calls: {llm_calls}. Prompt tokens: {prompt_tokens}. Output tokens: {output_tokens}.\"\n",
" f\"Build context LLM calls: {llm_calls}. Prompt tokens: {prompt_tokens}. Output tokens: {output_tokens}.\"\n",
")\n",
"# inspect number of LLM calls and tokens in map-reduce\n",
"llm_calls = result.llm_calls_categories[\"map\"] + result.llm_calls_categories[\"reduce\"]\n",
@ -266,14 +267,14 @@
" result.output_tokens_categories[\"map\"] + result.output_tokens_categories[\"reduce\"]\n",
")\n",
"print(\n",
" f\"Map-reduce ({llm_model})\\nLLM calls: {llm_calls}. Prompt tokens: {prompt_tokens}. Output tokens: {output_tokens}.\"\n",
" f\"Map-reduce LLM calls: {llm_calls}. Prompt tokens: {prompt_tokens}. Output tokens: {output_tokens}.\"\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "graphrag",
"language": "python",
"name": "python3"
},
@ -287,7 +288,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.12.10"
}
},
"nbformat": 4,

View File

@ -19,8 +19,8 @@
"import os\n",
"\n",
"import pandas as pd\n",
"import tiktoken\n",
"\n",
"from graphrag.config.models.vector_store_schema_config import VectorStoreSchemaConfig\n",
"from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey\n",
"from graphrag.query.indexer_adapters import (\n",
" read_indexer_covariates,\n",
@ -102,7 +102,9 @@
"# load description embeddings to an in-memory lancedb vectorstore\n",
"# to connect to a remote db, specify url and port values.\n",
"description_embedding_store = LanceDBVectorStore(\n",
" collection_name=\"default-entity-description\",\n",
" vector_store_schema_config=VectorStoreSchemaConfig(\n",
" index_name=\"default-entity-description\"\n",
" )\n",
")\n",
"description_embedding_store.connect(db_uri=LANCEDB_URI)\n",
"\n",
@ -195,37 +197,38 @@
"from graphrag.config.enums import ModelType\n",
"from graphrag.config.models.language_model_config import LanguageModelConfig\n",
"from graphrag.language_model.manager import ModelManager\n",
"from graphrag.tokenizer.get_tokenizer import get_tokenizer\n",
"\n",
"api_key = os.environ[\"GRAPHRAG_API_KEY\"]\n",
"llm_model = os.environ[\"GRAPHRAG_LLM_MODEL\"]\n",
"embedding_model = os.environ[\"GRAPHRAG_EMBEDDING_MODEL\"]\n",
"\n",
"chat_config = LanguageModelConfig(\n",
" api_key=api_key,\n",
" type=ModelType.OpenAIChat,\n",
" model=llm_model,\n",
" type=ModelType.Chat,\n",
" model_provider=\"openai\",\n",
" model=\"gpt-4.1\",\n",
" max_retries=20,\n",
")\n",
"chat_model = ModelManager().get_or_create_chat_model(\n",
" name=\"local_search\",\n",
" model_type=ModelType.OpenAIChat,\n",
" model_type=ModelType.Chat,\n",
" config=chat_config,\n",
")\n",
"\n",
"token_encoder = tiktoken.encoding_for_model(llm_model)\n",
"\n",
"embedding_config = LanguageModelConfig(\n",
" api_key=api_key,\n",
" type=ModelType.OpenAIEmbedding,\n",
" model=embedding_model,\n",
" type=ModelType.Embedding,\n",
" model_provider=\"openai\",\n",
" model=\"text-embedding-3-small\",\n",
" max_retries=20,\n",
")\n",
"\n",
"text_embedder = ModelManager().get_or_create_embedding_model(\n",
" name=\"local_search_embedding\",\n",
" model_type=ModelType.OpenAIEmbedding,\n",
" model_type=ModelType.Embedding,\n",
" config=embedding_config,\n",
")"
")\n",
"\n",
"tokenizer = get_tokenizer(chat_config)"
]
},
{
@ -251,7 +254,7 @@
" entity_text_embeddings=description_embedding_store,\n",
" embedding_vectorstore_key=EntityVectorStoreKey.ID, # if the vectorstore uses entity title as ids, set this to EntityVectorStoreKey.TITLE\n",
" text_embedder=text_embedder,\n",
" token_encoder=token_encoder,\n",
" tokenizer=tokenizer,\n",
")"
]
},
@ -314,7 +317,7 @@
"search_engine = LocalSearch(\n",
" model=chat_model,\n",
" context_builder=context_builder,\n",
" token_encoder=token_encoder,\n",
" tokenizer=tokenizer,\n",
" model_params=model_params,\n",
" context_builder_params=local_context_params,\n",
" response_type=\"multiple paragraphs\", # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report\n",
@ -426,7 +429,7 @@
"question_generator = LocalQuestionGen(\n",
" model=chat_model,\n",
" context_builder=context_builder,\n",
" token_encoder=token_encoder,\n",
" tokenizer=tokenizer,\n",
" model_params=model_params,\n",
" context_builder_params=local_context_params,\n",
")"
@ -451,7 +454,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "graphrag",
"language": "python",
"name": "python3"
},
@ -465,7 +468,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.12.10"
}
},
"nbformat": 4,

View File

@ -1,5 +1,7 @@
# Getting Started
⚠️ GraphRAG can consume a lot of LLM resources! We strongly recommend starting with the tutorial dataset here until you understand how the system works, and consider experimenting with fast/inexpensive models first before committing to a big indexing job.
## Requirements
[Python 3.10-3.12](https://www.python.org/downloads/)
@ -24,25 +26,25 @@ pip install graphrag
We need to set up a data project and some initial configuration. First let's get a sample dataset ready:
```sh
mkdir -p ./ragtest/input
mkdir -p ./christmas/input
```
Get a copy of A Christmas Carol by Charles Dickens from a trusted source:
```sh
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./ragtest/input/book.txt
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./christmas/input/book.txt
```
## Set Up Your Workspace Variables
To initialize your workspace, first run the `graphrag init` command.
Since we have already configured a directory named `./ragtest` in the previous step, run the following command:
Since we have already configured a directory named `./christmas` in the previous step, run the following command:
```sh
graphrag init --root ./ragtest
graphrag init --root ./christmas
```
This will create two files: `.env` and `settings.yaml` in the `./ragtest` directory.
This will create two files: `.env` and `settings.yaml` in the `./christmas` directory.
- `.env` contains the environment variables required to run the GraphRAG pipeline. If you inspect the file, you'll see a single environment variable defined,
`GRAPHRAG_API_KEY=<API_KEY>`. Replace `<API_KEY>` with your own OpenAI or Azure API key.
@ -78,13 +80,13 @@ You will also need to login with [az login](https://learn.microsoft.com/en-us/cl
Finally we'll run the pipeline!
```sh
graphrag index --root ./ragtest
graphrag index --root ./christmas
```
![pipeline executing from the CLI](img/pipeline-running.png)
This process will take some time to run. This depends on the size of your input data, what model you're using, and the text chunk size being used (these can be configured in your `settings.yaml` file).
Once the pipeline is complete, you should see a new folder called `./ragtest/output` with a series of parquet files.
Once the pipeline is complete, you should see a new folder called `./christmas/output` with a series of parquet files.
# Using the Query Engine
@ -94,7 +96,7 @@ Here is an example using Global search to ask a high-level question:
```sh
graphrag query \
--root ./ragtest \
--root ./christmas \
--method global \
--query "What are the top themes in this story?"
```
@ -103,7 +105,7 @@ Here is an example using Local search to ask a more specific question about a pa
```sh
graphrag query \
--root ./ragtest \
--root ./christmas \
--method local \
--query "Who is Scrooge and what are his main relationships?"
```

View File

@ -47,6 +47,7 @@ At query time, these structures are used to provide materials for the LLM contex
- [_Global Search_](query/global_search.md) for reasoning about holistic questions about the corpus by leveraging the community summaries.
- [_Local Search_](query/local_search.md) for reasoning about specific entities by fanning-out to their neighbors and associated concepts.
- [_DRIFT Search_](query/drift_search.md) for reasoning about specific entities by fanning-out to their neighbors and associated concepts, but with the added context of community information.
- _Basic Search_ for those times when your query is best answered by baseline RAG (standard top _k_ vector search).
### Prompt Tuning

View File

@ -32,3 +32,20 @@ The GraphRAG library was designed with LLM interactions in mind, and a common se
Because of these potential error cases, we've added a cache layer around LLM interactions.
When completion requests are made using the same input set (prompt and tuning parameters), we return a cached result if one exists.
This allows our indexer to be more resilient to network issues, to act idempotently, and to provide a more efficient end-user experience.
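As a conceptual sketch only (this is not GraphRAG's actual cache code), the key idea is that the cache key is derived from the complete input set, so a repeated prompt with identical tuning parameters resolves to the same cached response:

```python
import hashlib
import json


class CompletionCache:
    """Toy illustration: cache completions keyed on prompt + tuning parameters."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def _key(self, prompt: str, params: dict) -> str:
        # Identical prompt and parameters produce an identical key.
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, prompt: str, params: dict) -> str | None:
        return self._store.get(self._key(prompt, params))

    def put(self, prompt: str, params: dict, response: str) -> None:
        self._store[self._key(prompt, params)] = response
```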
### Providers & Factories
Several subsystems within GraphRAG use a factory pattern to register and retrieve provider implementations. This allows deep customization to support models, storage, and other components you may use that aren't built directly into GraphRAG.
The following subsystems use a factory pattern that allows you to register your own implementations:
- [language model](https://github.com/microsoft/graphrag/blob/main/graphrag/language_model/factory.py) - implement your own `chat` and `embed` methods to use a model provider of choice beyond the built-in OpenAI/Azure support
- [cache](https://github.com/microsoft/graphrag/blob/main/graphrag/cache/factory.py) - create your own cache storage location in addition to the file, blob, and CosmosDB ones we provide
- [logger](https://github.com/microsoft/graphrag/blob/main/graphrag/logger/factory.py) - create your own log writing location in addition to the built-in file and blob storage
- [storage](https://github.com/microsoft/graphrag/blob/main/graphrag/storage/factory.py) - create your own storage provider (database, etc.) beyond the file, blob, and CosmosDB ones built in
- [vector store](https://github.com/microsoft/graphrag/blob/main/graphrag/vector_stores/factory.py) - implement your own vector store beyond the built-in lancedb, Azure AI Search, and CosmosDB ones
- [pipeline + workflows](https://github.com/microsoft/graphrag/blob/main/graphrag/index/workflows/factory.py) - implement your own workflow steps with a custom `run_workflow` function, or register an entire pipeline (list of named workflows)
The links for each of these subsystems point to the source code of the factory, which includes registration of the default built-in implementations. In addition, we have a detailed discussion of [language models](../config/models.md), which includes an example of a custom provider, and a [sample notebook](../examples_notebooks/custom_vector_store.ipynb) that demonstrates a custom vector store.
All of these factories allow you to register an implementation using any string name you would like, even overriding built-in ones directly.
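A rough sketch of that flow for the vector store factory, reusing the creation call shown in the custom vector store notebook; the custom class import is hypothetical, and the exact `register` signature should be confirmed against the factory source linked above:

```python
from graphrag.config.models.vector_store_schema_config import VectorStoreSchemaConfig
from graphrag.vector_stores.factory import VectorStoreFactory

from my_project.stores import SimpleInMemoryVectorStore  # hypothetical custom implementation

# Register the implementation under a string key of your choosing
# (assumed registration API; see factory.py for the exact signature).
VectorStoreFactory.register("my_in_memory_store", SimpleInMemoryVectorStore)

# Create and use it exactly like the built-in stores.
store = VectorStoreFactory.create_vector_store(
    "my_in_memory_store",
    vector_store_schema_config=VectorStoreSchemaConfig(index_name="test"),
)
store.connect()
```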

View File

@ -16,6 +16,10 @@ All input formats are loaded within GraphRAG and passed to the indexing pipeline
Also see the [outputs](outputs.md) documentation for the final documents table schema saved to parquet after pipeline completion.
## Bring-your-own DataFrame
As of version 2.6.0, GraphRAG's [indexing API method](https://github.com/microsoft/graphrag/blob/main/graphrag/api/index.py) allows you to pass in your own pandas DataFrame and bypass all of the input loading/parsing described in the next section. This is convenient if you have content in a format or storage location we don't support out-of-the-box. __You must ensure that your input DataFrame conforms to the schema described above.__ All of the chunking behavior described later will proceed exactly the same.
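A minimal sketch of the idea, with assumed placeholder column names; the exact documents schema and the keyword used to hand the DataFrame to the indexing API should be checked against the schema description and the index.py module linked above.

```python
from pathlib import Path

import pandas as pd

from graphrag.config.load_config import load_config

# Build an input DataFrame from a source GraphRAG doesn't read natively.
# "title" and "text" are placeholder column names; conform to the documents
# schema described above.
docs = pd.DataFrame([
    {"title": "doc-1", "text": "First document body..."},
    {"title": "doc-2", "text": "Second document body..."},
])

config = load_config(Path("./my_project"))
# The DataFrame is then passed to the indexing API method in graphrag/api/index.py
# (linked above) in place of file-based input loading; chunking proceeds as usual.
```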
## Formats
We support three file formats out-of-the-box. This covers the overwhelming majority of use cases we have encountered. If you have a different format, we recommend writing a script to convert to one of these, which are widely used and supported by many tools and libraries.

View File

@ -79,15 +79,7 @@ After that, it uses one of the following selection methods to pick a sample to w
## Modify Env Vars
After running auto tuning, you should modify the following environment variables (or config variables) to pick up the new prompts on your index run. Note: Please make sure to update the correct path to the generated prompts, in this example we are using the default "prompts" path.
- `GRAPHRAG_ENTITY_EXTRACTION_PROMPT_FILE` = "prompts/entity_extraction.txt"
- `GRAPHRAG_COMMUNITY_REPORT_PROMPT_FILE` = "prompts/community_report.txt"
- `GRAPHRAG_SUMMARIZE_DESCRIPTIONS_PROMPT_FILE` = "prompts/summarize_descriptions.txt"
or in your yaml config file:
After running auto tuning, you should modify the following config variables to pick up the new prompts on your index run. Note: Please make sure to update the correct path to the generated prompts, in this example we are using the default "prompts" path.
```yaml
entity_extraction:

View File

@ -24,7 +24,7 @@ Below are the key parameters of the [DRIFTSearch class](https://github.com/micro
- `llm`: OpenAI model object to be used for response generation
- `context_builder`: [context builder](https://github.com/microsoft/graphrag/blob/main/graphrag/query/structured_search/drift_search/drift_context.py) object to be used for preparing context data from community reports and query information
- `config`: model to define the DRIFT Search hyperparameters. [DRIFT Config model](https://github.com/microsoft/graphrag/blob/main/graphrag/config/models/drift_search_config.py)
- `token_encoder`: token encoder for tracking the budget for the algorithm.
- `tokenizer`: tokenizer used for tracking the token budget of the algorithm.
- `query_state`: a state object as defined in [Query State](https://github.com/microsoft/graphrag/blob/main/graphrag/query/structured_search/drift_search/state.py) that allows to track execution of a DRIFT Search instance, alongside follow ups and [DRIFT actions](https://github.com/microsoft/graphrag/blob/main/graphrag/query/structured_search/drift_search/action.py).
## How to Use

View File

@ -231,6 +231,10 @@ async def multi_index_global_search(
"""
init_loggers(config=config, verbose=verbose, filename="query.log")
logger.warning(
"Multi-index search is deprecated and will be removed in GraphRAG v3."
)
# Streaming not supported yet
if streaming:
message = "Streaming not yet implemented for multi_global_search"
@ -510,6 +514,9 @@ async def multi_index_local_search(
"""
init_loggers(config=config, verbose=verbose, filename="query.log")
logger.warning(
"Multi-index search is deprecated and will be removed in GraphRAG v3."
)
# Streaming not supported yet
if streaming:
message = "Streaming not yet implemented for multi_index_local_search"
@ -874,6 +881,10 @@ async def multi_index_drift_search(
"""
init_loggers(config=config, verbose=verbose, filename="query.log")
logger.warning(
"Multi-index search is deprecated and will be removed in GraphRAG v3."
)
# Streaming not supported yet
if streaming:
message = "Streaming not yet implemented for multi_drift_search"
@ -1166,6 +1177,10 @@ async def multi_index_basic_search(
"""
init_loggers(config=config, verbose=verbose, filename="query.log")
logger.warning(
"Multi-index search is deprecated and will be removed in GraphRAG v3."
)
# Streaming not supported yet
if streaming:
message = "Streaming not yet implemented for multi_basic_search"

View File

@ -6,7 +6,7 @@
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
from typing import ClassVar, Literal
from typing import ClassVar
from graphrag.config.embeddings import default_embeddings
from graphrag.config.enums import (
@ -46,13 +46,14 @@ from graphrag.language_model.providers.litellm.services.retry.retry import Retry
DEFAULT_OUTPUT_BASE_DIR = "output"
DEFAULT_CHAT_MODEL_ID = "default_chat_model"
DEFAULT_CHAT_MODEL_TYPE = ModelType.OpenAIChat
DEFAULT_CHAT_MODEL_TYPE = ModelType.Chat
DEFAULT_CHAT_MODEL = "gpt-4-turbo-preview"
DEFAULT_CHAT_MODEL_AUTH_TYPE = AuthType.APIKey
DEFAULT_EMBEDDING_MODEL_ID = "default_embedding_model"
DEFAULT_EMBEDDING_MODEL_TYPE = ModelType.OpenAIEmbedding
DEFAULT_EMBEDDING_MODEL_TYPE = ModelType.Embedding
DEFAULT_EMBEDDING_MODEL = "text-embedding-3-small"
DEFAULT_EMBEDDING_MODEL_AUTH_TYPE = AuthType.APIKey
DEFAULT_MODEL_PROVIDER = "openai"
DEFAULT_VECTOR_STORE_ID = "default_vector_store"
ENCODING_MODEL = "cl100k_base"
@ -325,10 +326,10 @@ class LanguageModelDefaults:
proxy: None = None
audience: None = None
model_supports_json: None = None
tokens_per_minute: Literal["auto"] = "auto"
requests_per_minute: Literal["auto"] = "auto"
tokens_per_minute: None = None
requests_per_minute: None = None
rate_limit_strategy: str | None = "static"
retry_strategy: str = "native"
retry_strategy: str = "exponential_backoff"
max_retries: int = 10
max_retry_wait: float = 10.0
concurrent_requests: int = 25

View File

@ -33,15 +33,6 @@ class AzureApiVersionMissingError(ValueError):
super().__init__(msg)
class AzureDeploymentNameMissingError(ValueError):
"""Azure Deployment Name missing error."""
def __init__(self, llm_type: str) -> None:
"""Init method definition."""
msg = f"Deployment name is required for {llm_type}. Please rerun `graphrag init` set the deployment_name."
super().__init__(msg)
class LanguageModelConfigMissingError(ValueError):
"""Missing model configuration error."""

View File

@ -11,7 +11,6 @@ def get_embedding_settings(
vector_store_params: dict | None = None,
) -> dict:
"""Transform GraphRAG config into settings for workflows."""
# TEMP
embeddings_llm_settings = settings.get_language_model_config(
settings.embed_text.model_id
)

View File

@ -19,41 +19,34 @@ INIT_YAML = f"""\
models:
{defs.DEFAULT_CHAT_MODEL_ID}:
type: {defs.DEFAULT_CHAT_MODEL_TYPE.value} # or azure_openai_chat
# api_base: https://<instance>.openai.azure.com
# api_version: 2024-05-01-preview
type: {defs.DEFAULT_CHAT_MODEL_TYPE.value}
model_provider: {defs.DEFAULT_MODEL_PROVIDER}
auth_type: {defs.DEFAULT_CHAT_MODEL_AUTH_TYPE.value} # or azure_managed_identity
api_key: ${{GRAPHRAG_API_KEY}} # set this in the generated .env file
# audience: "https://cognitiveservices.azure.com/.default"
# organization: <organization_id>
api_key: ${{GRAPHRAG_API_KEY}} # set this in the generated .env file, or remove if managed identity
model: {defs.DEFAULT_CHAT_MODEL}
# deployment_name: <azure_model_deployment_name>
# encoding_model: {defs.ENCODING_MODEL} # automatically set by tiktoken if left undefined
model_supports_json: true # recommended if this is available for your model.
concurrent_requests: {language_model_defaults.concurrent_requests} # max number of simultaneous LLM requests allowed
async_mode: {language_model_defaults.async_mode.value} # or asyncio
retry_strategy: native
max_retries: {language_model_defaults.max_retries}
tokens_per_minute: {language_model_defaults.tokens_per_minute} # set to null to disable rate limiting
requests_per_minute: {language_model_defaults.requests_per_minute} # set to null to disable rate limiting
{defs.DEFAULT_EMBEDDING_MODEL_ID}:
type: {defs.DEFAULT_EMBEDDING_MODEL_TYPE.value} # or azure_openai_embedding
# api_base: https://<instance>.openai.azure.com
# api_version: 2024-05-01-preview
auth_type: {defs.DEFAULT_EMBEDDING_MODEL_AUTH_TYPE.value} # or azure_managed_identity
api_key: ${{GRAPHRAG_API_KEY}}
# audience: "https://cognitiveservices.azure.com/.default"
# organization: <organization_id>
model: {defs.DEFAULT_EMBEDDING_MODEL}
# deployment_name: <azure_model_deployment_name>
# encoding_model: {defs.ENCODING_MODEL} # automatically set by tiktoken if left undefined
model_supports_json: true # recommended if this is available for your model.
concurrent_requests: {language_model_defaults.concurrent_requests} # max number of simultaneous LLM requests allowed
concurrent_requests: {language_model_defaults.concurrent_requests}
async_mode: {language_model_defaults.async_mode.value} # or asyncio
retry_strategy: native
retry_strategy: {language_model_defaults.retry_strategy}
max_retries: {language_model_defaults.max_retries}
tokens_per_minute: null # set to null to disable rate limiting or auto for dynamic
requests_per_minute: null # set to null to disable rate limiting or auto for dynamic
tokens_per_minute: null
requests_per_minute: null
{defs.DEFAULT_EMBEDDING_MODEL_ID}:
type: {defs.DEFAULT_EMBEDDING_MODEL_TYPE.value}
model_provider: {defs.DEFAULT_MODEL_PROVIDER}
auth_type: {defs.DEFAULT_EMBEDDING_MODEL_AUTH_TYPE.value}
api_key: ${{GRAPHRAG_API_KEY}}
model: {defs.DEFAULT_EMBEDDING_MODEL}
# api_base: https://<instance>.openai.azure.com
# api_version: 2024-05-01-preview
concurrent_requests: {language_model_defaults.concurrent_requests}
async_mode: {language_model_defaults.async_mode.value} # or asyncio
retry_strategy: {language_model_defaults.retry_strategy}
max_retries: {language_model_defaults.max_retries}
tokens_per_minute: null
requests_per_minute: null
### Input settings ###
@ -62,7 +55,6 @@ input:
type: {graphrag_config_defaults.input.storage.type.value} # or blob
base_dir: "{graphrag_config_defaults.input.storage.base_dir}"
file_type: {graphrag_config_defaults.input.file_type.value} # [csv, text, json]
chunks:
size: {graphrag_config_defaults.chunks.size}
@ -90,7 +82,6 @@ vector_store:
type: {vector_store_defaults.type}
db_uri: {vector_store_defaults.db_uri}
container_name: {vector_store_defaults.container_name}
overwrite: {vector_store_defaults.overwrite}
### Workflow settings ###

View File

@ -107,7 +107,7 @@ class GraphRagConfig(BaseModel):
_ = retry_factory.create(
strategy=model.retry_strategy,
max_attempts=model.max_retries,
max_retries=model.max_retries,
max_retry_wait=model.max_retry_wait,
)

View File

@ -3,6 +3,7 @@
"""Language model configuration."""
import logging
from typing import Literal
import tiktoken
@ -14,11 +15,12 @@ from graphrag.config.errors import (
ApiKeyMissingError,
AzureApiBaseMissingError,
AzureApiVersionMissingError,
AzureDeploymentNameMissingError,
ConflictingSettingsError,
)
from graphrag.language_model.factory import ModelFactory
logger = logging.getLogger(__name__)
class LanguageModelConfig(BaseModel):
"""Language model configuration."""
@ -96,6 +98,14 @@ class LanguageModelConfig(BaseModel):
if not ModelFactory.is_supported_model(self.type):
msg = f"Model type {self.type} is not recognized, must be one of {ModelFactory.get_chat_models() + ModelFactory.get_embedding_models()}."
raise KeyError(msg)
if self.type in [
"openai_chat",
"openai_embedding",
"azure_openai_chat",
"azure_openai_embedding",
]:
msg = f"Model config based on fnllm is deprecated and will be removed in GraphRAG v3, please use {ModelType.Chat} or {ModelType.Embedding} instead to switch to LiteLLM config."
logger.warning(msg)
model_provider: str | None = Field(
description="The model provider to use.",
@ -214,7 +224,8 @@ class LanguageModelConfig(BaseModel):
or self.type == ModelType.AzureOpenAIEmbedding
or self.model_provider == "azure" # indicates Litellm + AOI
) and (self.deployment_name is None or self.deployment_name.strip() == ""):
raise AzureDeploymentNameMissingError(self.type)
msg = f"deployment_name is not set for Azure-hosted model. This will default to your model name ({self.model}). If different, this should be set."
logger.debug(msg)
organization: str | None = Field(
description="The organization to use for the LLM service.",

View File

@ -210,7 +210,7 @@ def _create_vector_store(
vector_store = VectorStoreFactory().create_vector_store(
vector_store_schema_config=single_embedding_config,
vector_store_type=vector_store_type,
kwargs=vector_store_config,
**vector_store_config,
)
vector_store.connect(**vector_store_config)

View File

@ -8,10 +8,12 @@ import graphrag.data_model.schemas as schemas
from graphrag.index.operations.summarize_communities.graph_context.sort_context import (
sort_context,
)
from graphrag.query.llm.text_utils import num_tokens
from graphrag.tokenizer.tokenizer import Tokenizer
def build_mixed_context(context: list[dict], max_context_tokens: int) -> str:
def build_mixed_context(
context: list[dict], tokenizer: Tokenizer, max_context_tokens: int
) -> str:
"""
Build parent context by concatenating all sub-communities' contexts.
@ -45,9 +47,10 @@ def build_mixed_context(context: list[dict], max_context_tokens: int) -> str:
remaining_local_context.extend(sorted_context[rid][schemas.ALL_CONTEXT])
new_context_string = sort_context(
local_context=remaining_local_context + final_local_contexts,
tokenizer=tokenizer,
sub_community_reports=substitute_reports,
)
if num_tokens(new_context_string) <= max_context_tokens:
if tokenizer.num_tokens(new_context_string) <= max_context_tokens:
exceeded_limit = False
context_string = new_context_string
break
@ -63,7 +66,7 @@ def build_mixed_context(context: list[dict], max_context_tokens: int) -> str:
new_context_string = pd.DataFrame(substitute_reports).to_csv(
index=False, sep=","
)
if num_tokens(new_context_string) > max_context_tokens:
if tokenizer.num_tokens(new_context_string) > max_context_tokens:
break
context_string = new_context_string

View File

@ -30,7 +30,7 @@ from graphrag.index.utils.dataframes import (
where_column_equals,
)
from graphrag.logger.progress import progress_iterable
from graphrag.query.llm.text_utils import num_tokens
from graphrag.tokenizer.tokenizer import Tokenizer
logger = logging.getLogger(__name__)
@ -39,6 +39,7 @@ def build_local_context(
nodes,
edges,
claims,
tokenizer: Tokenizer,
callbacks: WorkflowCallbacks,
max_context_tokens: int = 16_000,
):
@ -49,7 +50,7 @@ def build_local_context(
for level in progress_iterable(levels, callbacks.progress, len(levels)):
communities_at_level_df = _prepare_reports_at_level(
nodes, edges, claims, level, max_context_tokens
nodes, edges, claims, tokenizer, level, max_context_tokens
)
communities_at_level_df.loc[:, schemas.COMMUNITY_LEVEL] = level
@ -63,6 +64,7 @@ def _prepare_reports_at_level(
node_df: pd.DataFrame,
edge_df: pd.DataFrame,
claim_df: pd.DataFrame | None,
tokenizer: Tokenizer,
level: int,
max_context_tokens: int = 16_000,
) -> pd.DataFrame:
@ -181,6 +183,7 @@ def _prepare_reports_at_level(
# Generate community-level context strings using vectorized batch processing
return parallel_sort_context_batch(
community_df,
tokenizer=tokenizer,
max_context_tokens=max_context_tokens,
)
@ -189,6 +192,7 @@ def build_level_context(
report_df: pd.DataFrame | None,
community_hierarchy_df: pd.DataFrame,
local_context_df: pd.DataFrame,
tokenizer: Tokenizer,
level: int,
max_context_tokens: int,
) -> pd.DataFrame:
@ -219,11 +223,11 @@ def build_level_context(
if report_df is None or report_df.empty:
invalid_context_df.loc[:, schemas.CONTEXT_STRING] = _sort_and_trim_context(
invalid_context_df, max_context_tokens
invalid_context_df, tokenizer, max_context_tokens
)
invalid_context_df[schemas.CONTEXT_SIZE] = invalid_context_df.loc[
:, schemas.CONTEXT_STRING
].map(num_tokens)
].map(tokenizer.num_tokens)
invalid_context_df[schemas.CONTEXT_EXCEED_FLAG] = False
return union(valid_context_df, invalid_context_df)
@ -237,6 +241,7 @@ def build_level_context(
invalid_context_df,
sub_context_df,
community_hierarchy_df,
tokenizer,
max_context_tokens,
)
@ -244,11 +249,13 @@ def build_level_context(
# this should be rare, but if it happens, we will just trim the local context to fit the limit
remaining_df = _antijoin_reports(invalid_context_df, community_df)
remaining_df.loc[:, schemas.CONTEXT_STRING] = _sort_and_trim_context(
remaining_df, max_context_tokens
remaining_df, tokenizer, max_context_tokens
)
result = union(valid_context_df, community_df, remaining_df)
result[schemas.CONTEXT_SIZE] = result.loc[:, schemas.CONTEXT_STRING].map(num_tokens)
result[schemas.CONTEXT_SIZE] = result.loc[:, schemas.CONTEXT_STRING].map(
tokenizer.num_tokens
)
result[schemas.CONTEXT_EXCEED_FLAG] = False
return result
@ -269,19 +276,29 @@ def _antijoin_reports(df: pd.DataFrame, reports: pd.DataFrame) -> pd.DataFrame:
return antijoin(df, reports, schemas.COMMUNITY_ID)
def _sort_and_trim_context(df: pd.DataFrame, max_context_tokens: int) -> pd.Series:
def _sort_and_trim_context(
df: pd.DataFrame, tokenizer: Tokenizer, max_context_tokens: int
) -> pd.Series:
"""Sort and trim context to fit the limit."""
series = cast("pd.Series", df[schemas.ALL_CONTEXT])
return transform_series(
series, lambda x: sort_context(x, max_context_tokens=max_context_tokens)
series,
lambda x: sort_context(
x, tokenizer=tokenizer, max_context_tokens=max_context_tokens
),
)
def _build_mixed_context(df: pd.DataFrame, max_context_tokens: int) -> pd.Series:
def _build_mixed_context(
df: pd.DataFrame, tokenizer: Tokenizer, max_context_tokens: int
) -> pd.Series:
"""Sort and trim context to fit the limit."""
series = cast("pd.Series", df[schemas.ALL_CONTEXT])
return transform_series(
series, lambda x: build_mixed_context(x, max_context_tokens=max_context_tokens)
series,
lambda x: build_mixed_context(
x, tokenizer, max_context_tokens=max_context_tokens
),
)
@ -303,6 +320,7 @@ def _get_community_df(
invalid_context_df: pd.DataFrame,
sub_context_df: pd.DataFrame,
community_hierarchy_df: pd.DataFrame,
tokenizer: Tokenizer,
max_context_tokens: int,
) -> pd.DataFrame:
"""Get community context for each community."""
@ -338,7 +356,7 @@ def _get_community_df(
.reset_index()
)
community_df[schemas.CONTEXT_STRING] = _build_mixed_context(
community_df, max_context_tokens
community_df, tokenizer, max_context_tokens
)
community_df[schemas.COMMUNITY_LEVEL] = level
return community_df
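
Because the `Tokenizer` protocol exposes `num_tokens` as a method, the bound method can be handed straight to pandas `Series.map`/`apply`, which is all the context-size columns above need. A tiny illustration with a stand-in tokenizer:

```python
# Sketch: a bound method works directly as a pandas mapper.
import pandas as pd


class WhitespaceTokenizer:
    """Illustrative stand-in; counts whitespace-separated tokens."""

    def num_tokens(self, text: str) -> int:
        return len(text.split())


tokenizer = WhitespaceTokenizer()
df = pd.DataFrame({"context_string": ["a b c", "one two", "x"]})
df["context_size"] = df["context_string"].map(tokenizer.num_tokens)
df["context_exceed_flag"] = df["context_size"] > 2
print(df)
```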

View File

@ -5,11 +5,12 @@
import pandas as pd
import graphrag.data_model.schemas as schemas
from graphrag.query.llm.text_utils import num_tokens
from graphrag.tokenizer.tokenizer import Tokenizer
def sort_context(
local_context: list[dict],
tokenizer: Tokenizer,
sub_community_reports: list[dict] | None = None,
max_context_tokens: int | None = None,
node_name_column: str = schemas.TITLE,
@ -112,7 +113,10 @@ def sort_context(
new_context_string = _get_context_string(
sorted_nodes, sorted_edges, sorted_claims, sub_community_reports
)
if max_context_tokens and num_tokens(new_context_string) > max_context_tokens:
if (
max_context_tokens
and tokenizer.num_tokens(new_context_string) > max_context_tokens
):
break
context_string = new_context_string
@ -122,7 +126,9 @@ def sort_context(
)
def parallel_sort_context_batch(community_df, max_context_tokens, parallel=False):
def parallel_sort_context_batch(
community_df, tokenizer: Tokenizer, max_context_tokens, parallel=False
):
"""Calculate context using parallelization if enabled."""
if parallel:
# Use ThreadPoolExecutor for parallel execution
@ -131,7 +137,9 @@ def parallel_sort_context_batch(community_df, max_context_tokens, parallel=False
with ThreadPoolExecutor(max_workers=None) as executor:
context_strings = list(
executor.map(
lambda x: sort_context(x, max_context_tokens=max_context_tokens),
lambda x: sort_context(
x, tokenizer, max_context_tokens=max_context_tokens
),
community_df[schemas.ALL_CONTEXT],
)
)
@ -141,13 +149,13 @@ def parallel_sort_context_batch(community_df, max_context_tokens, parallel=False
# Assign context strings directly to the DataFrame
community_df[schemas.CONTEXT_STRING] = community_df[schemas.ALL_CONTEXT].apply(
lambda context_list: sort_context(
context_list, max_context_tokens=max_context_tokens
context_list, tokenizer, max_context_tokens=max_context_tokens
)
)
# Calculate other columns
community_df[schemas.CONTEXT_SIZE] = community_df[schemas.CONTEXT_STRING].apply(
num_tokens
tokenizer.num_tokens
)
community_df[schemas.CONTEXT_EXCEED_FLAG] = (
community_df[schemas.CONTEXT_SIZE] > max_context_tokens

View File

@ -23,6 +23,7 @@ from graphrag.index.operations.summarize_communities.utils import (
)
from graphrag.index.utils.derive_from_rows import derive_from_rows
from graphrag.logger.progress import progress_ticker
from graphrag.tokenizer.tokenizer import Tokenizer
logger = logging.getLogger(__name__)
@ -35,6 +36,7 @@ async def summarize_communities(
callbacks: WorkflowCallbacks,
cache: PipelineCache,
strategy: dict,
tokenizer: Tokenizer,
max_input_length: int,
async_mode: AsyncType = AsyncType.AsyncIO,
num_threads: int = 4,
@ -44,7 +46,6 @@ async def summarize_communities(
tick = progress_ticker(callbacks.progress, len(local_contexts))
strategy_exec = load_strategy(strategy["type"])
strategy_config = {**strategy}
community_hierarchy = (
communities.explode("children")
.rename({"children": "sub_community"}, axis=1)
@ -60,6 +61,7 @@ async def summarize_communities(
community_hierarchy_df=community_hierarchy,
local_context_df=local_contexts,
level=level,
tokenizer=tokenizer,
max_context_tokens=max_input_length,
)
level_contexts.append(level_context)

View File

@ -18,7 +18,7 @@ from graphrag.index.operations.summarize_communities.text_unit_context.prep_text
from graphrag.index.operations.summarize_communities.text_unit_context.sort_context import (
sort_context,
)
from graphrag.query.llm.text_utils import num_tokens
from graphrag.tokenizer.tokenizer import Tokenizer
logger = logging.getLogger(__name__)
@ -27,6 +27,7 @@ def build_local_context(
community_membership_df: pd.DataFrame,
text_units_df: pd.DataFrame,
node_df: pd.DataFrame,
tokenizer: Tokenizer,
max_context_tokens: int = 16000,
) -> pd.DataFrame:
"""
@ -69,10 +70,10 @@ def build_local_context(
.reset_index()
)
context_df[schemas.CONTEXT_STRING] = context_df[schemas.ALL_CONTEXT].apply(
lambda x: sort_context(x)
lambda x: sort_context(x, tokenizer)
)
context_df[schemas.CONTEXT_SIZE] = context_df[schemas.CONTEXT_STRING].apply(
lambda x: num_tokens(x)
lambda x: tokenizer.num_tokens(x)
)
context_df[schemas.CONTEXT_EXCEED_FLAG] = context_df[schemas.CONTEXT_SIZE].apply(
lambda x: x > max_context_tokens
@ -86,6 +87,7 @@ def build_level_context(
community_hierarchy_df: pd.DataFrame,
local_context_df: pd.DataFrame,
level: int,
tokenizer: Tokenizer,
max_context_tokens: int = 16000,
) -> pd.DataFrame:
"""
@ -116,10 +118,12 @@ def build_level_context(
invalid_context_df.loc[:, [schemas.CONTEXT_STRING]] = invalid_context_df[
schemas.ALL_CONTEXT
].apply(lambda x: sort_context(x, max_context_tokens=max_context_tokens))
].apply(
lambda x: sort_context(x, tokenizer, max_context_tokens=max_context_tokens)
)
invalid_context_df.loc[:, [schemas.CONTEXT_SIZE]] = invalid_context_df[
schemas.CONTEXT_STRING
].apply(lambda x: num_tokens(x))
].apply(lambda x: tokenizer.num_tokens(x))
invalid_context_df.loc[:, [schemas.CONTEXT_EXCEED_FLAG]] = False
return pd.concat([valid_context_df, invalid_context_df])
@ -199,10 +203,10 @@ def build_level_context(
.reset_index()
)
community_df[schemas.CONTEXT_STRING] = community_df[schemas.ALL_CONTEXT].apply(
lambda x: build_mixed_context(x, max_context_tokens)
lambda x: build_mixed_context(x, tokenizer, max_context_tokens)
)
community_df[schemas.CONTEXT_SIZE] = community_df[schemas.CONTEXT_STRING].apply(
lambda x: num_tokens(x)
lambda x: tokenizer.num_tokens(x)
)
community_df[schemas.CONTEXT_EXCEED_FLAG] = False
community_df[schemas.COMMUNITY_LEVEL] = level
@ -220,10 +224,10 @@ def build_level_context(
)
remaining_df[schemas.CONTEXT_STRING] = cast(
"pd.DataFrame", remaining_df[schemas.ALL_CONTEXT]
).apply(lambda x: sort_context(x, max_context_tokens=max_context_tokens))
).apply(lambda x: sort_context(x, tokenizer, max_context_tokens=max_context_tokens))
remaining_df[schemas.CONTEXT_SIZE] = cast(
"pd.DataFrame", remaining_df[schemas.CONTEXT_STRING]
).apply(lambda x: num_tokens(x))
).apply(lambda x: tokenizer.num_tokens(x))
remaining_df[schemas.CONTEXT_EXCEED_FLAG] = False
return cast(

View File

@ -8,7 +8,7 @@ import logging
import pandas as pd
import graphrag.data_model.schemas as schemas
from graphrag.query.llm.text_utils import num_tokens
from graphrag.tokenizer.tokenizer import Tokenizer
logger = logging.getLogger(__name__)
@ -57,6 +57,7 @@ def get_context_string(
def sort_context(
local_context: list[dict],
tokenizer: Tokenizer,
sub_community_reports: list[dict] | None = None,
max_context_tokens: int | None = None,
) -> str:
@ -73,7 +74,7 @@ def sort_context(
new_context_string = get_context_string(
current_text_units, sub_community_reports
)
if num_tokens(new_context_string) > max_context_tokens:
if tokenizer.num_tokens(new_context_string) > max_context_tokens:
break
context_string = new_context_string

View File

@ -7,9 +7,9 @@ import json
from dataclasses import dataclass
from graphrag.index.typing.error_handler import ErrorHandlerFn
from graphrag.index.utils.tokens import num_tokens_from_string
from graphrag.language_model.protocol.base import ChatModel
from graphrag.prompts.index.summarize_descriptions import SUMMARIZE_PROMPT
from graphrag.tokenizer.get_tokenizer import get_tokenizer
# these tokens are used in the prompt
ENTITY_NAME_KEY = "entity_name"
@ -45,7 +45,7 @@ class SummarizeExtractor:
"""Init method definition."""
# TODO: streamline construction
self._model = model_invoker
self._tokenizer = get_tokenizer(model_invoker.config)
self._summarization_prompt = summarization_prompt or SUMMARIZE_PROMPT
self._on_error = on_error or (lambda _e, _s, _d: None)
self._max_summary_length = max_summary_length
@ -85,14 +85,14 @@ class SummarizeExtractor:
descriptions = sorted(descriptions)
# Iterate over descriptions, adding all until the max input tokens is reached
usable_tokens = self._max_input_tokens - num_tokens_from_string(
usable_tokens = self._max_input_tokens - self._tokenizer.num_tokens(
self._summarization_prompt
)
descriptions_collected = []
result = ""
for i, description in enumerate(descriptions):
usable_tokens -= num_tokens_from_string(description)
usable_tokens -= self._tokenizer.num_tokens(description)
descriptions_collected.append(description)
# If buffer is full, or all descriptions have been added, summarize
@ -109,8 +109,8 @@ class SummarizeExtractor:
descriptions_collected = [result]
usable_tokens = (
self._max_input_tokens
- num_tokens_from_string(self._summarization_prompt)
- num_tokens_from_string(result)
- self._tokenizer.num_tokens(self._summarization_prompt)
- self._tokenizer.num_tokens(result)
)
return result
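
The summarizer above keeps a running token budget: subtract the prompt cost once, subtract each description as it is buffered, and when the buffer fills, summarize and fold the interim result back into a fresh budget. A compact sketch of that accounting, with a whitespace tokenizer and a stubbed `summarize()` standing in for the real model call:

```python
# Sketch of the rolling token budget used when collapsing many descriptions.
# The tokenizer and summarize() below are stand-ins for illustration only.
class WhitespaceTokenizer:
    def num_tokens(self, text: str) -> int:
        return len(text.split())


def summarize(descriptions: list[str]) -> str:
    return " / ".join(descriptions)[:200]  # placeholder for the LLM call


def summarize_with_budget(
    descriptions: list[str], prompt: str, max_input_tokens: int, tokenizer: WhitespaceTokenizer
) -> str:
    usable_tokens = max_input_tokens - tokenizer.num_tokens(prompt)
    collected: list[str] = []
    result = ""
    for i, description in enumerate(sorted(descriptions)):
        usable_tokens -= tokenizer.num_tokens(description)
        collected.append(description)
        buffer_full = usable_tokens < 0
        is_last = i == len(descriptions) - 1
        if buffer_full or is_last:
            result = summarize(collected)
            if is_last:
                break
            # Start a new buffer seeded with the interim summary.
            collected = [result]
            usable_tokens = (
                max_input_tokens
                - tokenizer.num_tokens(prompt)
                - tokenizer.num_tokens(result)
            )
    return result


print(summarize_with_budget(["alpha beta"] * 40, "summarize:", 60, WhitespaceTokenizer()))
```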

View File

@ -94,7 +94,7 @@ class TokenTextSplitter(TextSplitter):
def num_tokens(self, text: str) -> int:
"""Return the number of tokens in a string."""
return len(self._tokenizer.encode(text))
return self._tokenizer.num_tokens(text)
def split_text(self, text: str | list[str]) -> list[str]:
"""Split text method."""

View File

@ -1,44 +0,0 @@
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License
"""Utilities for working with tokens."""
import logging
import tiktoken
import graphrag.config.defaults as defs
DEFAULT_ENCODING_NAME = defs.ENCODING_MODEL
logger = logging.getLogger(__name__)
def num_tokens_from_string(
string: str, model: str | None = None, encoding_name: str | None = None
) -> int:
"""Return the number of tokens in a text string."""
if model is not None:
try:
encoding = tiktoken.encoding_for_model(model)
except KeyError:
msg = f"Failed to get encoding for {model} when getting num_tokens_from_string. Fall back to default encoding {DEFAULT_ENCODING_NAME}"
logger.warning(msg)
encoding = tiktoken.get_encoding(DEFAULT_ENCODING_NAME)
else:
encoding = tiktoken.get_encoding(encoding_name or DEFAULT_ENCODING_NAME)
return len(encoding.encode(string))
def string_from_tokens(
tokens: list[int], model: str | None = None, encoding_name: str | None = None
) -> str:
"""Return a text string from a list of tokens."""
if model is not None:
encoding = tiktoken.encoding_for_model(model)
elif encoding_name is not None:
encoding = tiktoken.get_encoding(encoding_name)
else:
msg = "Either model or encoding_name must be specified."
raise ValueError(msg)
return encoding.decode(tokens)
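
This file's `num_tokens_from_string` and `string_from_tokens` helpers are removed in favor of the `Tokenizer` protocol: counting becomes `tokenizer.num_tokens(text)` and decoding becomes `tokenizer.decode(tokens)`. A rough migration sketch, assuming the `get_tokenizer()` default path whose signature appears later in this diff:

```python
# Rough migration sketch; method names follow the Tokenizer protocol used in this diff.
from graphrag.tokenizer.get_tokenizer import get_tokenizer

tokenizer = get_tokenizer()  # per the new signature, falls back to the default encoding

text = "GraphRAG token counting"
token_ids = tokenizer.encode(text)       # was tiktoken.get_encoding(...).encode(...)
count = tokenizer.num_tokens(text)       # was num_tokens_from_string(text)
roundtrip = tokenizer.decode(token_ids)  # was string_from_tokens(token_ids)
print(count, roundtrip == text)          # roundtrip typically matches for tiktoken-style encodings
```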

View File

@ -15,42 +15,39 @@ logger = logging.getLogger(__name__)
def validate_config_names(parameters: GraphRagConfig) -> None:
"""Validate config file for LLM deployment name typos."""
# Validate Chat LLM configs
# TODO: Replace default_chat_model with a way to select the model
default_llm_settings = parameters.get_language_model_config("default_chat_model")
llm = ModelManager().register_chat(
name="test-llm",
model_type=default_llm_settings.type,
config=default_llm_settings,
callbacks=NoopWorkflowCallbacks(),
cache=None,
)
try:
asyncio.run(llm.achat("This is an LLM connectivity test. Say Hello World"))
logger.info("LLM Config Params Validated")
except Exception as e: # noqa: BLE001
logger.error(f"LLM configuration error detected. Exiting...\n{e}") # noqa
sys.exit(1)
# Validate Embeddings LLM configs
embedding_llm_settings = parameters.get_language_model_config(
parameters.embed_text.model_id
)
embed_llm = ModelManager().register_embedding(
name="test-embed-llm",
model_type=embedding_llm_settings.type,
config=embedding_llm_settings,
callbacks=NoopWorkflowCallbacks(),
cache=None,
)
try:
asyncio.run(embed_llm.aembed_batch(["This is an LLM Embedding Test String"]))
logger.info("Embedding LLM Config Params Validated")
except Exception as e: # noqa: BLE001
logger.error(f"Embedding LLM configuration error detected. Exiting...\n{e}") # noqa
sys.exit(1)
"""Validate config file for model deployment name typos, by running a quick test message for each."""
for id, config in parameters.models.items():
if config.type in ["chat", "azure_openai", "openai"]:
llm = ModelManager().register_chat(
name="test-llm",
model_type=config.type,
config=config,
callbacks=NoopWorkflowCallbacks(),
cache=None,
)
try:
asyncio.run(
llm.achat("This is an LLM connectivity test. Say Hello World")
)
logger.info("LLM Config Params Validated")
except Exception as e: # noqa: BLE001
logger.error(f"LLM configuration error detected.\n{e}") # noqa
print(f"Failed to validate language model ({id}) params", e) # noqa: T201
sys.exit(1)
elif config.type in ["embedding", "azure_openai_embedding", "openai_embedding"]:
embed_llm = ModelManager().register_embedding(
name="test-embed-llm",
model_type=config.type,
config=config,
callbacks=NoopWorkflowCallbacks(),
cache=None,
)
try:
asyncio.run(
embed_llm.aembed_batch(["This is an LLM Embedding Test String"])
)
logger.info("Embedding LLM Config Params Validated")
except Exception as e: # noqa: BLE001
logger.error(f"Embedding configuration error detected.\n{e}") # noqa
print(f"Failed to validate embedding model ({id}) params", e) # noqa: T201
sys.exit(1)
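
The rewritten validation walks every entry in `parameters.models` and picks a chat or embedding connectivity test based on the configured `type`, instead of testing only the default chat and embedding models. The loop's shape, reduced to a runnable sketch with the ModelManager calls stubbed out (type strings copied from the diff):

```python
# Shape of the per-model validation loop; the validate_* helpers are stand-ins
# for the ModelManager-based connectivity tests above.
import sys
from dataclasses import dataclass

CHAT_TYPES = {"chat", "azure_openai", "openai"}
EMBEDDING_TYPES = {"embedding", "azure_openai_embedding", "openai_embedding"}


@dataclass
class ModelConfig:  # illustrative stand-in for LanguageModelConfig
    type: str


def validate_chat(config: ModelConfig) -> None:
    """Stand-in for the one-shot achat('... Say Hello World') connectivity test."""


def validate_embedding(config: ModelConfig) -> None:
    """Stand-in for the aembed_batch(['test string']) connectivity test."""


def validate_models(models: dict[str, ModelConfig]) -> None:
    for model_id, config in models.items():
        try:
            if config.type in CHAT_TYPES:
                validate_chat(config)
            elif config.type in EMBEDDING_TYPES:
                validate_embedding(config)
        except Exception as exc:  # noqa: BLE001
            print(f"Failed to validate model ({model_id}) params", exc)
            sys.exit(1)


validate_models({"default_chat_model": ModelConfig(type="chat")})
```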

View File

@ -13,6 +13,7 @@ from graphrag.callbacks.workflow_callbacks import WorkflowCallbacks
from graphrag.config.defaults import graphrag_config_defaults
from graphrag.config.enums import AsyncType
from graphrag.config.models.graph_rag_config import GraphRagConfig
from graphrag.config.models.language_model_config import LanguageModelConfig
from graphrag.index.operations.finalize_community_reports import (
finalize_community_reports,
)
@ -28,6 +29,7 @@ from graphrag.index.operations.summarize_communities.summarize_communities impor
)
from graphrag.index.typing.context import PipelineRunContext
from graphrag.index.typing.workflow import WorkflowFunctionOutput
from graphrag.tokenizer.get_tokenizer import get_tokenizer
from graphrag.utils.storage import (
load_table_from_storage,
storage_has_table,
@ -102,6 +104,9 @@ async def create_community_reports(
summarization_strategy["extraction_prompt"] = summarization_strategy["graph_prompt"]
model_config = LanguageModelConfig(**summarization_strategy["llm"])
tokenizer = get_tokenizer(model_config)
max_input_length = summarization_strategy.get(
"max_input_length", graphrag_config_defaults.community_reports.max_input_length
)
@ -110,6 +115,7 @@ async def create_community_reports(
nodes,
edges,
claims,
tokenizer,
callbacks,
max_input_length,
)
@ -122,6 +128,7 @@ async def create_community_reports(
callbacks,
cache,
summarization_strategy,
tokenizer=tokenizer,
max_input_length=max_input_length,
async_mode=async_mode,
num_threads=num_threads,

View File

@ -12,6 +12,7 @@ from graphrag.callbacks.workflow_callbacks import WorkflowCallbacks
from graphrag.config.defaults import graphrag_config_defaults
from graphrag.config.enums import AsyncType
from graphrag.config.models.graph_rag_config import GraphRagConfig
from graphrag.config.models.language_model_config import LanguageModelConfig
from graphrag.index.operations.finalize_community_reports import (
finalize_community_reports,
)
@ -27,6 +28,7 @@ from graphrag.index.operations.summarize_communities.text_unit_context.context_b
)
from graphrag.index.typing.context import PipelineRunContext
from graphrag.index.typing.workflow import WorkflowFunctionOutput
from graphrag.tokenizer.get_tokenizer import get_tokenizer
from graphrag.utils.storage import load_table_from_storage, write_table_to_storage
logger = logging.getLogger(__name__)
@ -88,8 +90,11 @@ async def create_community_reports_text(
"max_input_length", graphrag_config_defaults.community_reports.max_input_length
)
model_config = LanguageModelConfig(**summarization_strategy["llm"])
tokenizer = get_tokenizer(model_config)
local_contexts = build_local_context(
communities, text_units, nodes, max_input_length
communities, text_units, nodes, tokenizer, max_input_length
)
community_reports = await summarize_communities(
@ -100,6 +105,7 @@ async def create_community_reports_text(
callbacks,
cache,
summarization_strategy,
tokenizer=tokenizer,
max_input_length=max_input_length,
async_mode=async_mode,
num_threads=num_threads,

View File

@ -86,9 +86,10 @@ def _create_base_completions(
msg = "Azure Managed Identity authentication is only supported for Azure models."
raise ValueError(msg)
base_args["azure_scope"] = base_args.pop("audience")
base_args["azure_ad_token_provider"] = get_bearer_token_provider(
DefaultAzureCredential(),
COGNITIVE_SERVICES_AUDIENCE,
model_config.audience or COGNITIVE_SERVICES_AUDIENCE,
)
def _base_completion(**kwargs: Any) -> ModelResponse | CustomStreamWrapper:

View File

@ -72,9 +72,10 @@ def _create_base_embeddings(
msg = "Azure Managed Identity authentication is only supported for Azure models."
raise ValueError(msg)
base_args["azure_scope"] = base_args.pop("audience")
base_args["azure_ad_token_provider"] = get_bearer_token_provider(
DefaultAzureCredential(),
COGNITIVE_SERVICES_AUDIENCE,
model_config.audience or COGNITIVE_SERVICES_AUDIENCE,
)
def _base_embedding(**kwargs: Any) -> EmbeddingResponse:
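
Both LiteLLM factories now pass a caller-configured audience to the bearer-token provider, falling back to the Cognitive Services scope only when none is set. The azure-identity wiring looks roughly like this; the literal scope URL below is the conventional default and is an assumption, since the diff only shows the `COGNITIVE_SERVICES_AUDIENCE` constant:

```python
# Sketch of the managed-identity token provider wiring with a configurable audience.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

DEFAULT_SCOPE = "https://cognitiveservices.azure.com/.default"  # assumed default scope
configured_audience = None  # e.g. model_config.audience

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    configured_audience or DEFAULT_SCOPE,
)
# token_provider() returns a fresh bearer token string when invoked.
```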

View File

@ -39,7 +39,7 @@ def with_retries(
retry_factory = RetryFactory()
retry_service = retry_factory.create(
strategy=model_config.retry_strategy,
max_attempts=model_config.max_retries,
max_retries=model_config.max_retries,
max_retry_wait=model_config.max_retry_wait,
)

View File

@ -21,20 +21,20 @@ class ExponentialRetry(Retry):
def __init__(
self,
*,
max_attempts: int = 5,
max_retries: int = 5,
base_delay: float = 2.0,
jitter: bool = True,
**kwargs: Any,
):
if max_attempts <= 0:
msg = "max_attempts must be greater than 0."
if max_retries <= 0:
msg = "max_retries must be greater than 0."
raise ValueError(msg)
if base_delay <= 1.0:
msg = "base_delay must be greater than 1.0."
raise ValueError(msg)
self._max_attempts = max_attempts
self._max_retries = max_retries
self._base_delay = base_delay
self._jitter = jitter
@ -46,15 +46,15 @@ class ExponentialRetry(Retry):
try:
return func(**kwargs)
except Exception as e:
if retries >= self._max_attempts:
if retries >= self._max_retries:
logger.exception(
f"ExponentialRetry: Max retries exceeded, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"ExponentialRetry: Max retries exceeded, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
raise
retries += 1
delay *= self._base_delay
logger.exception(
f"ExponentialRetry: Request failed, retrying, retries={retries}, delay={delay}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"ExponentialRetry: Request failed, retrying, retries={retries}, delay={delay}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
time.sleep(delay + (self._jitter * random.uniform(0, 1))) # noqa: S311
@ -70,14 +70,14 @@ class ExponentialRetry(Retry):
try:
return await func(**kwargs)
except Exception as e:
if retries >= self._max_attempts:
if retries >= self._max_retries:
logger.exception(
f"ExponentialRetry: Max retries exceeded, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"ExponentialRetry: Max retries exceeded, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
raise
retries += 1
delay *= self._base_delay
logger.exception(
f"ExponentialRetry: Request failed, retrying, retries={retries}, delay={delay}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"ExponentialRetry: Request failed, retrying, retries={retries}, delay={delay}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
await asyncio.sleep(delay + (self._jitter * random.uniform(0, 1))) # noqa: S311
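
Apart from the `max_attempts` to `max_retries` rename, the backoff math is untouched: each failed attempt multiplies the delay by `base_delay` and adds up to a second of jitter. A standalone sketch of the resulting schedule (the starting delay of 1.0 is an assumption; the hunk only shows the multiplication step):

```python
# Standalone sketch of the exponential backoff schedule with optional jitter.
# The starting delay of 1.0 is an assumption; only the doubling step appears above.
import random


def backoff_delays(max_retries: int = 5, base_delay: float = 2.0, jitter: bool = True) -> list[float]:
    delays = []
    delay = 1.0
    for _ in range(max_retries):
        delay *= base_delay
        delays.append(delay + (jitter * random.uniform(0, 1)))
    return delays


print([round(d, 2) for d in backoff_delays()])  # roughly [2.x, 4.x, 8.x, 16.x, 32.x]
```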

View File

@ -21,16 +21,20 @@ class IncrementalWaitRetry(Retry):
self,
*,
max_retry_wait: float,
max_attempts: int = 5,
max_retries: int = 5,
**kwargs: Any,
):
if max_attempts <= 0 or max_retry_wait <= 0:
msg = "max_attempts and max_retry_wait must be greater than 0."
if max_retries <= 0:
msg = "max_retries must be greater than 0."
raise ValueError(msg)
self._max_attempts = max_attempts
if max_retry_wait <= 0:
msg = "max_retry_wait must be greater than 0."
raise ValueError(msg)
self._max_retries = max_retries
self._max_retry_wait = max_retry_wait
self._increment = max_retry_wait / max_attempts
self._increment = max_retry_wait / max_retries
def retry(self, func: Callable[..., Any], **kwargs: Any) -> Any:
"""Retry a synchronous function."""
@ -40,15 +44,15 @@ class IncrementalWaitRetry(Retry):
try:
return func(**kwargs)
except Exception as e:
if retries >= self._max_attempts:
if retries >= self._max_retries:
logger.exception(
f"IncrementalWaitRetry: Max retries exceeded, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"IncrementalWaitRetry: Max retries exceeded, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
raise
retries += 1
delay += self._increment
logger.exception(
f"IncrementalWaitRetry: Request failed, retrying after incremental delay, retries={retries}, delay={delay}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"IncrementalWaitRetry: Request failed, retrying after incremental delay, retries={retries}, delay={delay}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
time.sleep(delay)
@ -64,14 +68,14 @@ class IncrementalWaitRetry(Retry):
try:
return await func(**kwargs)
except Exception as e:
if retries >= self._max_attempts:
if retries >= self._max_retries:
logger.exception(
f"IncrementalWaitRetry: Max retries exceeded, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"IncrementalWaitRetry: Max retries exceeded, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
raise
retries += 1
delay += self._increment
logger.exception(
f"IncrementalWaitRetry: Request failed, retrying after incremental delay, retries={retries}, delay={delay}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"IncrementalWaitRetry: Request failed, retrying after incremental delay, retries={retries}, delay={delay}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
await asyncio.sleep(delay)
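
IncrementalWaitRetry keeps its linear schedule: the per-attempt increment is `max_retry_wait / max_retries`, so the final retry waits the full `max_retry_wait`. A quick worked example:

```python
# Worked example of the linear wait schedule: increment = max_retry_wait / max_retries.
max_retry_wait, max_retries = 10.0, 5
increment = max_retry_wait / max_retries
delays = [increment * attempt for attempt in range(1, max_retries + 1)]
print(delays)  # [2.0, 4.0, 6.0, 8.0, 10.0]
```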

View File

@ -18,14 +18,14 @@ class NativeRetry(Retry):
def __init__(
self,
*,
max_attempts: int = 5,
max_retries: int = 5,
**kwargs: Any,
):
if max_attempts <= 0:
msg = "max_attempts must be greater than 0."
if max_retries <= 0:
msg = "max_retries must be greater than 0."
raise ValueError(msg)
self._max_attempts = max_attempts
self._max_retries = max_retries
def retry(self, func: Callable[..., Any], **kwargs: Any) -> Any:
"""Retry a synchronous function."""
@ -34,14 +34,14 @@ class NativeRetry(Retry):
try:
return func(**kwargs)
except Exception as e:
if retries >= self._max_attempts:
if retries >= self._max_retries:
logger.exception(
f"NativeRetry: Max retries exceeded, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"NativeRetry: Max retries exceeded, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
raise
retries += 1
logger.exception(
f"NativeRetry: Request failed, immediately retrying, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"NativeRetry: Request failed, immediately retrying, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
async def aretry(
@ -55,12 +55,12 @@ class NativeRetry(Retry):
try:
return await func(**kwargs)
except Exception as e:
if retries >= self._max_attempts:
if retries >= self._max_retries:
logger.exception(
f"NativeRetry: Max retries exceeded, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"NativeRetry: Max retries exceeded, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
raise
retries += 1
logger.exception(
f"NativeRetry: Request failed, immediately retrying, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"NativeRetry: Request failed, immediately retrying, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)

View File

@ -22,14 +22,18 @@ class RandomWaitRetry(Retry):
self,
*,
max_retry_wait: float,
max_attempts: int = 5,
max_retries: int = 5,
**kwargs: Any,
):
if max_attempts <= 0 or max_retry_wait <= 0:
msg = "max_attempts and max_retry_wait must be greater than 0."
if max_retries <= 0:
msg = "max_retries must be greater than 0."
raise ValueError(msg)
self._max_attempts = max_attempts
if max_retry_wait <= 0:
msg = "max_retry_wait must be greater than 0."
raise ValueError(msg)
self._max_retries = max_retries
self._max_retry_wait = max_retry_wait
def retry(self, func: Callable[..., Any], **kwargs: Any) -> Any:
@ -39,15 +43,15 @@ class RandomWaitRetry(Retry):
try:
return func(**kwargs)
except Exception as e:
if retries >= self._max_attempts:
if retries >= self._max_retries:
logger.exception(
f"RandomWaitRetry: Max retries exceeded, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"RandomWaitRetry: Max retries exceeded, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
raise
retries += 1
delay = random.uniform(0, self._max_retry_wait) # noqa: S311
logger.exception(
f"RandomWaitRetry: Request failed, retrying after random delay, retries={retries}, delay={delay}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"RandomWaitRetry: Request failed, retrying after random delay, retries={retries}, delay={delay}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
time.sleep(delay)
@ -62,14 +66,14 @@ class RandomWaitRetry(Retry):
try:
return await func(**kwargs)
except Exception as e:
if retries >= self._max_attempts:
if retries >= self._max_retries:
logger.exception(
f"RandomWaitRetry: Max retries exceeded, retries={retries}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"RandomWaitRetry: Max retries exceeded, retries={retries}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
raise
retries += 1
delay = random.uniform(0, self._max_retry_wait) # noqa: S311
logger.exception(
f"RandomWaitRetry: Request failed, retrying after random delay, retries={retries}, delay={delay}, max_retries={self._max_attempts}, exception={e}", # noqa: G004, TRY401
f"RandomWaitRetry: Request failed, retrying after random delay, retries={retries}, delay={delay}, max_retries={self._max_retries}, exception={e}", # noqa: G004, TRY401
)
await asyncio.sleep(delay)

View File

@ -56,9 +56,7 @@ class ProgressTicker:
description=self._description,
)
if p.description:
logger.info(
"%s%s/%s", p.description, str(p.completed_items), str(p.total_items)
)
logger.info("%s%s/%s", p.description, p.completed_items, p.total_items)
self._callback(p)
def done(self) -> None:

View File

@ -9,21 +9,15 @@ import re
from collections.abc import Iterator
from itertools import islice
import tiktoken
from json_repair import repair_json
import graphrag.config.defaults as defs
from graphrag.tokenizer.get_tokenizer import get_tokenizer
from graphrag.tokenizer.tokenizer import Tokenizer
logger = logging.getLogger(__name__)
def num_tokens(text: str, token_encoder: tiktoken.Encoding | None = None) -> int:
"""Return the number of tokens in the given text."""
if token_encoder is None:
token_encoder = tiktoken.get_encoding(defs.ENCODING_MODEL)
return len(token_encoder.encode(text)) # type: ignore
def batched(iterable: Iterator, n: int):
"""
Batch data into tuples of length n. The last batch may be shorter.
@ -39,15 +33,13 @@ def batched(iterable: Iterator, n: int):
yield batch
def chunk_text(
text: str, max_tokens: int, token_encoder: tiktoken.Encoding | None = None
):
def chunk_text(text: str, max_tokens: int, tokenizer: Tokenizer | None = None):
"""Chunk text by token length."""
if token_encoder is None:
token_encoder = tiktoken.get_encoding(defs.ENCODING_MODEL)
tokens = token_encoder.encode(text) # type: ignore
if tokenizer is None:
tokenizer = get_tokenizer(encoding_model=defs.ENCODING_MODEL)
tokens = tokenizer.encode(text) # type: ignore
chunk_iterator = batched(iter(tokens), max_tokens)
yield from (token_encoder.decode(list(chunk)) for chunk in chunk_iterator)
yield from (tokenizer.decode(list(chunk)) for chunk in chunk_iterator)
def try_parse_json_object(input: str, verbose: bool = True) -> tuple[str, dict]:
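
`chunk_text` now goes through the same `Tokenizer` protocol for `encode`/`decode`, slicing the token ids into windows of `max_tokens` and decoding each window back to text. A self-contained sketch using tiktoken directly, which is the kind of encoding the default tokenizer is expected to wrap:

```python
# Sketch of token-window chunking: encode once, slice the ids, decode each slice.
from itertools import islice

import tiktoken


def batched(iterable, n):
    """Batch an iterator into tuples of length n; the last batch may be shorter."""
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch


def chunk_text(text: str, max_tokens: int, encoding_name: str = "cl100k_base"):
    encoding = tiktoken.get_encoding(encoding_name)
    tokens = encoding.encode(text)
    for chunk in batched(tokens, max_tokens):
        yield encoding.decode(list(chunk))


print(list(chunk_text("one two three four five six seven eight", max_tokens=4)))
```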

View File

@ -7,13 +7,13 @@ from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any
import tiktoken
from graphrag.language_model.protocol.base import ChatModel
from graphrag.query.context_builder.builders import (
GlobalContextBuilder,
LocalContextBuilder,
)
from graphrag.tokenizer.get_tokenizer import get_tokenizer
from graphrag.tokenizer.tokenizer import Tokenizer
@dataclass
@ -34,13 +34,13 @@ class BaseQuestionGen(ABC):
self,
model: ChatModel,
context_builder: GlobalContextBuilder | LocalContextBuilder,
token_encoder: tiktoken.Encoding | None = None,
tokenizer: Tokenizer | None = None,
model_params: dict[str, Any] | None = None,
context_builder_params: dict[str, Any] | None = None,
):
self.model = model
self.context_builder = context_builder
self.token_encoder = token_encoder
self.tokenizer = tokenizer or get_tokenizer(model.config)
self.model_params = model_params or {}
self.context_builder_params = context_builder_params or {}
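
Query-side classes now accept an optional `Tokenizer` and fall back to one derived from the model's config (`tokenizer or get_tokenizer(model.config)`). The inject-or-derive pattern in miniature, with stand-in names:

```python
# Miniature of the inject-or-derive pattern used by the query classes.
class WhitespaceTokenizer:
    def num_tokens(self, text: str) -> int:
        return len(text.split())


def default_tokenizer_for(model_config: dict) -> WhitespaceTokenizer:
    """Stand-in for get_tokenizer(model.config)."""
    return WhitespaceTokenizer()


class QuestionGen:
    def __init__(self, model_config: dict, tokenizer: WhitespaceTokenizer | None = None):
        self.tokenizer = tokenizer or default_tokenizer_for(model_config)


gen = QuestionGen({"model": "gpt-4.1"})
print(gen.tokenizer.num_tokens("How many tokens is this prompt?"))
```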

View File

@ -7,8 +7,6 @@ import logging
import time
from typing import Any, cast
import tiktoken
from graphrag.callbacks.llm_callbacks import BaseLLMCallback
from graphrag.language_model.protocol.base import ChatModel
from graphrag.prompts.query.question_gen_system_prompt import QUESTION_SYSTEM_PROMPT
@ -19,8 +17,8 @@ from graphrag.query.context_builder.builders import (
from graphrag.query.context_builder.conversation_history import (
ConversationHistory,
)
from graphrag.query.llm.text_utils import num_tokens
from graphrag.query.question_gen.base import BaseQuestionGen, QuestionResult
from graphrag.tokenizer.tokenizer import Tokenizer
logger = logging.getLogger(__name__)
@ -32,7 +30,7 @@ class LocalQuestionGen(BaseQuestionGen):
self,
model: ChatModel,
context_builder: LocalContextBuilder,
token_encoder: tiktoken.Encoding | None = None,
tokenizer: Tokenizer | None = None,
system_prompt: str = QUESTION_SYSTEM_PROMPT,
callbacks: list[BaseLLMCallback] | None = None,
model_params: dict[str, Any] | None = None,
@ -41,7 +39,7 @@ class LocalQuestionGen(BaseQuestionGen):
super().__init__(
model=model,
context_builder=context_builder,
token_encoder=token_encoder,
tokenizer=tokenizer,
model_params=model_params,
context_builder_params=context_builder_params,
)
@ -118,7 +116,7 @@ class LocalQuestionGen(BaseQuestionGen):
},
completion_time=time.time() - start_time,
llm_calls=1,
prompt_tokens=num_tokens(system_prompt, self.token_encoder),
prompt_tokens=self.tokenizer.num_tokens(system_prompt),
)
except Exception:
@ -128,7 +126,7 @@ class LocalQuestionGen(BaseQuestionGen):
context_data=context_records,
completion_time=time.time() - start_time,
llm_calls=1,
prompt_tokens=num_tokens(system_prompt, self.token_encoder),
prompt_tokens=self.tokenizer.num_tokens(system_prompt),
)
async def generate(
@ -201,7 +199,7 @@ class LocalQuestionGen(BaseQuestionGen):
},
completion_time=time.time() - start_time,
llm_calls=1,
prompt_tokens=num_tokens(system_prompt, self.token_encoder),
prompt_tokens=self.tokenizer.num_tokens(system_prompt),
)
except Exception:
@ -211,5 +209,5 @@ class LocalQuestionGen(BaseQuestionGen):
context_data=context_records,
completion_time=time.time() - start_time,
llm_calls=1,
prompt_tokens=num_tokens(system_prompt, self.token_encoder),
prompt_tokens=self.tokenizer.num_tokens(system_prompt),
)

View File

@ -3,19 +3,15 @@
"""Get Tokenizer."""
from typing import TYPE_CHECKING
from graphrag.config.defaults import ENCODING_MODEL
from graphrag.config.models.language_model_config import LanguageModelConfig
from graphrag.tokenizer.litellm_tokenizer import LitellmTokenizer
from graphrag.tokenizer.tiktoken_tokenizer import TiktokenTokenizer
from graphrag.tokenizer.tokenizer import Tokenizer
if TYPE_CHECKING:
from graphrag.config.models.language_model_config import LanguageModelConfig
def get_tokenizer(
model_config: "LanguageModelConfig | None" = None,
model_config: LanguageModelConfig | None = None,
encoding_model: str = ENCODING_MODEL,
) -> Tokenizer:
"""

View File

@ -130,7 +130,7 @@ def get_embedding_store(
embedding_store = VectorStoreFactory().create_vector_store(
vector_store_type=vector_store_type,
vector_store_schema_config=single_embedding_config,
kwargs={**store},
**store,
)
embedding_store.connect(**store)
# If there is only a single index, return the embedding store directly

View File

@ -53,7 +53,7 @@ class VectorStoreFactory:
cls,
vector_store_type: str,
vector_store_schema_config: VectorStoreSchemaConfig,
kwargs: dict,
**kwargs: dict,
) -> BaseVectorStore:
"""Create a vector store object from the provided type.

View File

@ -22,11 +22,12 @@ theme:
nav:
- Home:
- Welcome: index.md
- Getting Started: get_started.md
- Development Guide: developing.md
- Welcome: "index.md"
- Getting Started: "get_started.md"
- Development Guide: "developing.md"
- Indexing:
- Overview: "index/overview.md"
- Architecture: "index/architecture.md"
- Dataflow: "index/default_dataflow.md"
- Methods: "index/methods.md"
- Inputs: "index/inputs.md"

View File

@ -1,7 +1,7 @@
[project]
name = "graphrag"
# Maintainers: do not change the version here manually, use ./scripts/release.sh
version = "2.6.0"
version = "2.7.0"
description = "GraphRAG: A graph-based retrieval-augmented generation (RAG) system."
authors = [
{name = "Alonso Guevara Fernández", email = "alonsog@microsoft.com"},
@ -69,7 +69,7 @@ dependencies = [
"litellm>=1.77.1",
]
[project.optional-dependencies]
[dependency-groups]
dev = [
"coverage>=7.6.9",
"ipykernel>=6.29.5",
@ -116,7 +116,7 @@ _convert_local_search_nb = 'jupyter nbconvert --output-dir=docsite/posts/query/n
_convert_global_search_nb = 'jupyter nbconvert --output-dir=docsite/posts/query/notebooks/ --output="{notebook_name}_nb" --template=docsite/nbdocsite_template --to markdown examples_notebooks/global_search.ipynb'
_semversioner_release = "semversioner release"
_semversioner_changelog = "semversioner changelog > CHANGELOG.md"
_semversioner_update_toml_version = "update-toml update --path project.version --value $(semversioner current-version)"
_semversioner_update_toml_version = "update-toml update --path project.version --value $(uv run semversioner current-version)"
semversioner_add = "semversioner add-change"
coverage_report = 'coverage report --omit "**/tests/**" --show-missing'
check_format = 'ruff format . --check'
@ -239,6 +239,7 @@ ignore = [
"PERF203", # Needs restructuring of errors, we should bail-out on first error
"C901", # needs refactoring to remove cyclomatic complexity
"B008", # Needs to restructure our cli params with Typer into constants
"ASYNC240"
]
[tool.ruff.lint.per-file-ignores]

View File

@ -8,7 +8,8 @@
"create_base_text_units": {
"max_runtime": 30
},
"extract_covariates": {
"extract_covariates":
{
"max_runtime": 10
},
"extract_graph": {
@ -16,8 +17,8 @@
},
"finalize_graph": {
"row_range": [
1,
500
100,
750
],
"nan_allowed_columns": [
"x",
@ -32,7 +33,7 @@
"create_communities": {
"row_range": [
10,
35
80
],
"max_runtime": 30,
"expected_artifacts": ["communities.parquet"]
@ -40,7 +41,7 @@
"create_community_reports": {
"row_range": [
10,
35
80
],
"nan_allowed_columns": [
"title",

View File

@ -1,185 +0,0 @@
# Operation: Dulce
## Chapter 1
The thrumming of monitors cast a stark contrast to the rigid silence enveloping the group. Agent Alex Mercer, unfailingly determined on paper, seemed dwarfed by the enormity of the sterile briefing room where Paranormal Military Squad's elite convened. With dulled eyes, he scanned the projectors outlining their impending odyssey into Operation: Dulce.
“I assume, Agent Mercer, you're not having second thoughts?” It was Taylor Cruz's voice, laced with an edge that demanded attention.
Alex flickered a strained smile, still thumbing his folder's corner. "Of course not, Agent Cruz. Just trying to soak in all the details." The compliance in his tone was unsettling, even to himself.
Jordan Hayes, perched on the opposite side of the table, narrowed their eyes but offered a supportive nod. "Details are imperative. We'll need your clear-headedness down there, Mercer."
A comfortable silence, the kind that threaded between veterans of shared secrets, lingered briefly before Sam Rivera, never one to submit to quiet, added, "I've combed through the last transmission logs. If anyone can make sense of the anomalies, it's going to be the two of you."
Taylor snorted dismissively. “Focus, people. We have protocols for a reason. Speculation is counter-productive.” The words 'counter-productive' seemed to hang in the air, a tacit reprimand directed at Alex.
Feeling the weight of his compliance conflicting with his natural inclination to leave no stone unturned, Alex straightened in his seat. "I agree, Agent Cruz. Protocol is paramount," he said, meeting Taylor's steely gaze. It was an affirmation, but beneath it lay layers of unspoken complexities that would undoubtedly unwind with time.
Alex's submission, though seemingly complete, didn't escape Jordan, who tilted their head ever so slightly, their eyes revealing a spark of understanding. They knew well enough the struggle of aligning personal convictions with overarching missions. As everyone began to collect their binders and prepare for departure, a quiet resolve took form within Alex, galvanized by the groundwork laid by their interactions. He may have spoken in compliance, but his determination had merely taken a subtler form — one that wouldn't surrender so easily to the forthcoming shadows.
\*
Dr. Jordan Hayes shuffled a stack of papers, their eyes revealing a tinge of skepticism at Taylor Cruz's authoritarian performance. _Protocols_, Jordan thought, _are just the framework, the true challenges we're about to face lie well beyond the boundaries of any protocol._ They cleared their throat before speaking, tone cautious yet firm, "Let's remember, the unknown variables exceed the known. We should remain adaptive."
A murmur of agreement echoed from Sam Rivera, who leaned forward, lacing their fingers together as if weaving a digital framework in the air before them, "Exactly, adaptability could be the key to interpreting the signal distortions and system malfunctions. We shouldn't discount the… erratic."
Their words hung like an electric charge in the room, challenging Taylor's position with an inherent truth. Cruz's jaw tightened almost imperceptibly, but the agent masked it with a small nod, conceding to the omnipresent threat of the unpredictable.
Alex glanced at Jordan, who never looked back, their gaze fixed instead on a distant point, as if envisioning the immense dark corridors they were soon to navigate in Dulce. Jordan was not one to embrace fantastical theories, but the air of cautious calculation betrayed a mind bracing for confrontation with the inexplicable, an internal battle between the evidence of their research and the calculating skepticism that kept them alive in their field.
The meeting adjourned with no further comments, the team members quietly retreading the paths to their personal preparations. Alex, trailing slightly behind, observed the others. _The cautious reserve Jordan wears like armor doesn't fool me_, he thought, _their analytical mind sees the patterns I do. And that's worth more than protocol. That's the connection we need to survive this._
As the agents dispersed into the labyrinth of the facility, lost in their thoughts and preparations, the base's halogen lights flickered, a brief and unnoticed harbingers of the darkness to come.
\*
A deserted corridor inside the facility stretched before Taylor Cruz, each footstep rhythmic and precise. Cruz, ambitious and meticulous, eyed the troops passing by with a sardonic tilt of the lips. Obedience—it was as much a tool as any weapon in the arsenal, and Cruz wielded it masterfully. To them, it was another step toward unfettered power within the dark bowels of the military complex.
Inside a secluded equipment bay, Cruz began checking over gear with mechanical efficiency. They traced fingers over the sleek surface of an encrypted radio transmitter. "If protocols are maintained," said Cruz aloud, rehearsing the speech for their subordinates, "not only will we re-establish a line of communication with Dulce, but we shall also illuminate the darkest secrets it conceals."
Agent Hayes appeared in the doorway, arms crossed and a knowing glint in their eyes. "You do understand," Jordan began, the words measured and probing, "that once we're in the depths, rank gives way to survival instincts. It's not about commands—it's empowerment through trust."
The sentiment snagged on Cruz's armor of confidence, probing at the insecurities festering beneath. Taylor offered a brief nod, perhaps too curt, but enough to acknowledge Jordan's point without yielding ground. "Trust," Cruz mused, "or the illusion thereof, is just as potent."
Silence claimed the space between them, steeped in the reality of the unknown dangers lurking in the shadows of the mission. Cruz diligently returned to the equipment, the act a clear dismissal.
Not much later, Cruz stood alone, the hollow echo of the bay a stark reminder of the isolation that power often wrought. With each checked box, their resolve steeled further, a silent vow to usher their team through the abyss—whatever it might hold—and emerge enshrined in the respect they so deeply craved.
## Chapter 2
Sam Rivera sat alone in a cramped office, the hum of a dozen servers murmuring a digital lullaby in the background. Surrounded by the glow of multiple screens, their eyes danced across lines of code and intercepted comm signals from Dulce — a kaleidoscope of data that their curious and isolated mind hungered to decrypt.
To an outsider, it might have looked like obsession, this fervent quest for answers. But to Sam, it was a dance — a give and take with the mysteries of the universe. Their fingers paused over the keyboard as they leaned back in the chair, whispering to thin air, "What secrets are you hiding from us?"
The stillness of the room broke with the unexpected arrival of Alex Mercer, whose encroaching shadow loomed over Sam's workspace. The cybersecurity expert craned their neck upwards, met by the ever-so-slight furrow in Alex's brow. "Got a minute, Rivera?"
"Always," Sam said, a smile surfacing as they swiveled to face their mentor more directly. _He has that look — like something's not sitting right with him,_ they noted inwardly.
Alex hesitated, weighing his words carefully. "Our tech is top-tier, but the silence from Dulce... It's not just technology that will see us through, it's intuition and... trust." His gaze pierced through the digital haze, trying to instill something more profound than advice.
Sam regarded Alex for a moment, the sincerity in his voice resonating with their own unspoken desire to prove their worth. "Intuition," they mirrored thoughtfully. "I guess sometimes the numbers don't have all the answers."
Their shared silence held a newfound understanding, a recognition that between the ones and zeros, it was their combined human insights that might prevail against the impossible. As Alex turned to leave, Sam's eyes drifted back to the screens, now seeing them not as barriers to isolate behind, but as windows into the vast and enigmatic challenge that awaited their team.
Outside the office, the persistent buzz of activity in the facility belied the unease that gripped its inhabitants. A restlessness that nibbled on the edges of reality, as though forewarning of the threshold they were soon to cross — from the known into the realm of cosmic secrets and silent threats.
\*
Shadows played against the walls of the cramped underground meeting room, where Alex Mercer stood gazing at the concealed elevator that would deliver them into the bowels of Dulce base. The air was thick, every breath laced with the weight of impending confrontation, the kind one feels when stepping into a legend. Though armed with an array of advanced weaponry and gear, there was an unshakeable sense that they were delving into a conflict where the physical might be of little consequence.
"I know what you're thinking," Jordan Hayes remarked, approaching Mercer. Their voice was low, a blend of confidence and hidden apprehension. "This feels like more than a rescue or reconnaissance mission, doesn't it?"
Alex turned, his features a mask of uneasy resolve. "It's like we're being pulled into someone else's game. Not just observers or participants, but... pawns."
Jordan gave a short nod, their analytical mind colliding with the uncertain dynamics of this operation. "I've felt that way since the briefing. Like there's a layer we're not seeing. And yet, we have no choice but to play along." Their eyes locked with Alex's, silently exchanging a vow to remain vigilant.
"You two need to cut the philosophical chatter. We have positions to secure," Taylor Cruz interjected sharply, stepping into their exchange. The authority in Taylor's voice brooked no argument; it was their way of pulling everyone back to the now.
Alex's response was measured, more assertive than moments ago. "Acknowledged, Agent Cruz," he replied, his voice steadier, mirroring the transformation brewing within. He gripped his rifle with a newfound firmness. "Let's proceed."
As they congregated at the elevator, a tension palpable, Sam Rivera piped in with a tone of balanced levity, "Hope everyone's brought their good luck charms. Something tells me we're going to need all the help we can get."
Their laughter served as a brief respite from the gravity of their mission, a shared moment that reinforced their common humanity amidst the unknowable. Then, as one, they stepped into the elevator. The doors closed with a silent hiss, and they descended into the darkness together, aware that when they returned, if they returned, none of them would be the same.
\*
The sense of foreboding hung heavier than the darkness that the artificial lights of the elevator shaft failed to fully penetrate. The team was descending into the earth, carrying with them not only the weight of their equipment but also the silent pressure of the invisible war they were about to fight—a war that seemed to edge away from physicality and into the unnervingly psychological.
As they descended, Dr. Jordan Hayes couldn't help but muse over the layers of data that could wait below, now almost longing for the comfort of empirical evidence. _To think that this reluctance to accept other possibilities may have been my biggest blind spot,_ Jordan contemplated, feeling the hard shell of skepticism begin to crack.
Alex caught Jordan's reflective gaze and leaned in, his voice barely a murmur over the hum of the elevator. "Once we're down there, keep that analytical edge sharp. You see through the mazes of the unexplained better than anyone."
The compliment was unexpected and weighed differently than praise from others. This was an acknowledgment from someone who stood on the front lines of the unknown with eyes wide open. "Thank you, Alex," Jordan said, the words carrying a trace of newfound assertiveness. "You can count on me."
The exchange was cut short by a shudder that ran through the elevator, subtle, but enough to make them instinctively hold their breaths. It wasn't the mechanical stutter of old gears but a vibration that seemed to emanate from the very walls of the shaft—a whisper of something that defied natural explanation.
Cruz was the first to react, all business despite the shadow that crossed their expression. "Systems check. Now," they barked out, masking the moment of disquiet with swift command.
Every agent checked their gear, sending confirmation signals through their comms, creating a chorus of electronic beeps that promised readiness. But there was an unspoken question among them: was their technology, their weaponry, their protocols sufficient for what awaited them or merely a fragile comfort?
Against the gravity of the silence that was once again closing in, Sam's voice crackled through, only half-jest. "I'd laugh if we run into Martians playing poker down there—just to lighten the mood, you know?"
Despite—or perhaps because of—the oddity of the moment, this elicited a round of chuckles, an audible release of tension that ran counterpoint to the undercurrent of anxiety coursing through the team.
As the elevator came to a halting, eerie calm at the sub-level, the group stepped off, finding themselves at the threshold of Dulce's mysterious halls. They stood in a tight pack, sharing a cautious glance before fanning out into the unknown, each one acutely aware that the truth was inevitably intertwined with danger.
Into the depths of Dulce, the team advanced, their silence now a shared testament to the camaraderie born of facing the abyss together—and the steel resolve to uncover whatever horrors lay hidden in its shadows.
\*
The weight of the thick metal door closing behind them reverberated through the concrete hallway, marking the final threshold between the familiar world above and the strangeness that lay beneath. Dulce base, a name that had been whispered in the wind-blown deserts above and in the shadowed corners of conspiracy forums, now a tangible cold reality that they could touch — and that touched them back with a chill.
Like lambs led to an altar of alien deities, so did Agents Alex Mercer, Jordan Hayes, Taylor Cruz, and Sam Rivera proceed, their movements measured, their senses heightened. The air was still, almost respectful of the gravity of their presence. Their torch beams sliced through the darkness, uncovering steel doors with warnings that spoke of top secrets and mortal dangers.
Taylor Cruz, stepping firmly into the role of de facto leader, set a brisk pace. "Eyes sharp, people. Comms check, every thirty seconds," Taylor ordered, their voice echoing slightly before being swallowed by the surrounding silence.
Sam, fiddling with a handheld device aimed at detecting electronic anomalies, offered a murmured "Copy that," their usual buoyancy dimmed by the oppressive atmosphere.
It was Jordan Hayes who paused at an innocuous looking panel, nondescript amongst the gauntlet of secured doorways. "Mercer, Rivera, come see this," Jordan's voice was marked with a rare hint of urgency.
Alex joined Jordan's side, examining the panel which, at a mere glance, seemed just another part of the base's infrastructure. Yet, to the trained eye, it appeared out of place—a facade.
Jordan explained their reasoning as Sam approached, instinctively understanding the significance of what lay beneath, "This panel is a recent addition — covering something they didn't want found."
Before Alex could respond, the soft whir of an approaching drone cut through their muffled exchange. Taylor had looped back upon hearing the commotion. "Explanations later. We can't afford to attract..." Cruz's voice trailed off as the small airborne device came into view, its sensors locked onto the group.
Sam was the first to react, their tech-savvy mind already steps ahead. "I've got this," they declared, fingers flying over the controls of their own gadgetry to ward off the impending threat.
The drone lingered, its scan seeming more curious than hostile. But within moments, courtesy of Sam's interference, the little sentinel drifted away, retreating into the shadows as if accepting a silent truce. The crew exhaled, a moment of collective relief palpable in the air.
Cruz squared their shoulders, clearly ruffled but not conceding any ground. "Move out," they directed, a hint more forceful than before. "And Rivera, keep that trick handy."
The team pressed onward, the quiet now filled with the soft beeps of regular comms checks, their pace undeterred by the confrontation. Yet, every agent held a renewed sense of wariness, their trust in one another deepening with the knowledge that the base—its technology, its secrets—was alive in a way they hadn't fully anticipated.
As they converged upon a central hub, the imposing doors to the mainframe room stood ajar — an invitation or a trap, neither option comforting. Without a word, they fortified their resolve and stepped through the threshold, where the dim glow of operational LED lights and the distant hum of machinery hinted at Dulce's still-beating heart.
Solemnly, yet unmistakably together, they moved deeper into the heart of the enigma, ready to unmask the lifeforce of Dulce base or confront whatever existential threat lay in wait. It was in that unwavering march towards the unknown that their destinies were forever cemented to the legacy of Operation: Dulce.
## Chapter 3
The thrumming of monitors cast a stark contrast to the rigid silence enveloping the group. Agent Alex Mercer, unfailingly determined on paper, seemed dwarfed by the enormity of the sterile briefing room where Paranormal Military Squad's elite convened. With dulled eyes, he scanned the projectors outlining their impending odyssey into Operation: Dulce.
\*
The cooling vents hummed in a monotonous drone, but it was the crackle of the comms system coming to life that cut through the lab's tension. Dr. Jordan Hayes hovered over a table arrayed with alien technology, their fingers delicately probing the enigmatic circuitry retrieved from the crash site. Agent Alex Mercer watched, admiration blooming in silent solidarity for Jordan's deft touch and unspoken drive.
Jordan, always composed, only allowed the faintest furrow of concentration to mar their brow. "What we understand about physics..." they muttered, trailing off as they realigned a translucent component. The device emitted a low pulse, causing Jordan to still. "Could be fundamentally changed by this."
A calculated risk—that's what this was. And for a person of science, a gamble was worth the potential paradigm shift.
"Ive been thinking," Alex started, his eyes still fixed on the immediately tangible mystery before them. "About whats at stake here. Not the mission parameters, but what this means for us—humanity."
Jordan glanced up, meeting his eyes just long enough to convey the shared enormity of their situation; the career-defining glory and existential dread entwined. "The quest for understanding always comes at a price. We're standing on the precipice of knowledge that could either elevate us or condemn us."
The charged air between them spiked as Taylor Cruz's brusque tones sliced through their reverie. "Hayes, Mercer, this isn't philosophy hour. Focus on the task. We need actionable intel, not daydreams."
With a sound of restrained acknowledgment, Jordan returned their gaze to the device, while Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.
Then Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. “If this tech can be understood..." Taylor said, their voice quieter, "It could change the game for us. For all of us.”
The underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.
It was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths and for different reasons. Yet, beneath the veneer of duty, the enticement of the vast unknown pulled them inexorably together, coalescing their distinct desires into a shared pulse of anticipation.
Marshaled back to the moment by the blink of lights and whir of machinery, they refocused their efforts, each movement sharpened by the knowledge that beyond understanding the unearthly artifacts, they might be piecing together the future of their species.
\*
Amidst the sterility of the briefing room, the liminal space between the facts laid out and the hidden truths, sat Sam Rivera, his demeanor an artful balance of focus and a casual disguise of his razor-sharp talent with technology. Across from him, Alex Mercer lingered in thought, the mental cogs turning as each file on Dulce stirred more than curiosity—it beckoned to a past both honored and burdensome.
"You've been quiet, Sam," Alex noted, catching the younger man's contemplative gaze. "Your take on these signal inconsistencies?"
There was respect in Alex's tone, though a careful distance remained—a gulf of experience and a hint of protective mentorship that stood between them. Sam nodded, recognizing the space afforded to him, and he couldn't help but feel the weight of expectation pressing upon his shoulders. It wasn't just the mission that was immense; it was the trust being placed in him.
"The patterns are... off," Sam admitted, hesitant but driven. "If I'm right, what we're looking at isn't random—it's a structured anomaly. We need to be ready for anything."
Alex's eyes brightened with a subtle approval that crossed the distance like a silent nod. "Good. Keen eyes will keep us ahead—or at least not blindsided," he said, affirming the belief that inscribed Sam's role as more than the team's technician—he was to be a guiding intellect in the heart of uncertainty.
Their exchange was cut short by Taylor Cruz's abrupt arrival, his gait brimming with a robust confidence that veiled the sharp undercurrents of his striving nature. "Time to gear up. Dulce waits for no one," Taylor announced, his voice carrying an iron resolve that knew the costs of hesitation—though whether the cost was calculated in human or career terms was an ambiguity he wore like a badge of honor.
As Sam and Alex nodded in unison, the icy chasm of hierarchy and cryptic protocols seemed momentarily bridged by an understanding—this mission was a convergence, a nexus point that would challenge each of their motives and strengths.
They filed out of the briefing room, their footsteps synchronized, a rhythm that spoke volumes of the unknown cadence they would soon march to within the base's veins. For Alex Mercer, the link with Sam Rivera, though distant, was now poised with a mutuality ready to be tested; for Taylor Cruz, the initiative pulsed like a heartbeat, anticipation thinly veiled behind a mask of duty.
In the midst of the descent, they were each alone yet irrevocably joined, stepping closer towards the volatile embrace of Operation: Dulce.

View File

@@ -1,28 +1,32 @@
models:
default_chat_model:
azure_auth_type: api_key
type: ${GRAPHRAG_LLM_TYPE}
type: chat
model_provider: azure
api_key: ${GRAPHRAG_API_KEY}
api_base: ${GRAPHRAG_API_BASE}
api_version: ${GRAPHRAG_API_VERSION}
deployment_name: ${GRAPHRAG_LLM_DEPLOYMENT_NAME}
model: ${GRAPHRAG_LLM_MODEL}
tokens_per_minute: ${GRAPHRAG_LLM_TPM}
requests_per_minute: ${GRAPHRAG_LLM_RPM}
api_version: "2025-04-01-preview"
deployment_name: gpt-4.1
model: gpt-4.1
retry_strategy: exponential_backoff
tokens_per_minute: null
requests_per_minute: null
model_supports_json: true
concurrent_requests: 50
concurrent_requests: 25
async_mode: threaded
default_embedding_model:
azure_auth_type: api_key
type: ${GRAPHRAG_EMBEDDING_TYPE}
type: embedding
model_provider: azure
api_key: ${GRAPHRAG_API_KEY}
api_base: ${GRAPHRAG_API_BASE}
api_version: ${GRAPHRAG_API_VERSION}
deployment_name: ${GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME}
model: ${GRAPHRAG_EMBEDDING_MODEL}
api_version: "2025-04-01-preview"
deployment_name: text-embedding-ada-002
model: text-embedding-ada-002
retry_strategy: exponential_backoff
tokens_per_minute: null
requests_per_minute: null
concurrent_requests: 50
concurrent_requests: 25
async_mode: threaded
vector_store:
@@ -36,9 +40,4 @@ input:
file_type: csv
snapshots:
embeddings: True
drift_search:
n_depth: 1
drift_k_followups: 3
primer_folds: 3
embeddings: true
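
The fixture above replaces the old environment-variable-driven Azure settings with the explicit LiteLLM-style model schema: `type: chat` / `type: embedding` plus a `model_provider`. As a rough illustration of the same shape in code, the sketch below builds an equivalent configuration programmatically with the `create_graphrag_config` helper exercised in the unit tests further down. The import path and the OpenAI provider with a placeholder key are assumptions for the example; they are not part of this diff.

```python
# Minimal sketch: build a GraphRAG config with the new-style model entries.
# Assumptions: the import path below and the OpenAI placeholder values; the
# key set (type, model_provider, api_key, model) mirrors the test fixtures.
from graphrag.config.create_graphrag_config import create_graphrag_config

settings = {
    "models": {
        "default_chat_model": {
            "type": "chat",
            "model_provider": "openai",
            "api_key": "<YOUR_API_KEY>",  # placeholder
            "model": "gpt-4-turbo-preview",
        },
        "default_embedding_model": {
            "type": "embedding",
            "model_provider": "openai",
            "api_key": "<YOUR_API_KEY>",  # placeholder
            "model": "text-embedding-3-small",
        },
    }
}

config = create_graphrag_config(settings)
print(config.models["default_chat_model"].model)  # -> gpt-4-turbo-preview
```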

View File

@@ -13,8 +13,8 @@
},
"finalize_graph": {
"row_range": [
1,
500
10,
200
],
"nan_allowed_columns": [
"x",
@@ -28,7 +28,7 @@
},
"extract_covariates": {
"row_range": [
1,
10,
100
],
"nan_allowed_columns": [
@@ -116,10 +116,6 @@
"query": "What is the major conflict in this story and who are the protagonist and antagonist?",
"method": "global"
},
{
"query": "What is the main theme of the story?",
"method": "drift"
},
{
"query": "Who is Jordan Hayes?",
"method": "basic"

View File

@@ -1,28 +1,32 @@
models:
default_chat_model:
azure_auth_type: api_key
type: ${GRAPHRAG_LLM_TYPE}
type: chat
model_provider: azure
api_key: ${GRAPHRAG_API_KEY}
api_base: ${GRAPHRAG_API_BASE}
api_version: ${GRAPHRAG_API_VERSION}
deployment_name: ${GRAPHRAG_LLM_DEPLOYMENT_NAME}
model: ${GRAPHRAG_LLM_MODEL}
tokens_per_minute: ${GRAPHRAG_LLM_TPM}
requests_per_minute: ${GRAPHRAG_LLM_RPM}
api_version: "2025-04-01-preview"
deployment_name: gpt-4.1
model: gpt-4.1
retry_strategy: exponential_backoff
tokens_per_minute: null
requests_per_minute: null
model_supports_json: true
concurrent_requests: 50
concurrent_requests: 25
async_mode: threaded
default_embedding_model:
azure_auth_type: api_key
type: ${GRAPHRAG_EMBEDDING_TYPE}
type: embedding
model_provider: azure
api_key: ${GRAPHRAG_API_KEY}
api_base: ${GRAPHRAG_API_BASE}
api_version: ${GRAPHRAG_API_VERSION}
deployment_name: ${GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME}
model: ${GRAPHRAG_EMBEDDING_MODEL}
api_version: "2025-04-01-preview"
deployment_name: text-embedding-ada-002
model: text-embedding-ada-002
retry_strategy: exponential_backoff
tokens_per_minute: null
requests_per_minute: null
concurrent_requests: 50
concurrent_requests: 25
async_mode: threaded
vector_store:
@@ -41,9 +45,4 @@ community_reports:
max_input_length: 8000
snapshots:
embeddings: True
drift_search:
n_depth: 1
drift_k_followups: 3
primer_folds: 3
embeddings: true

View File

@@ -81,9 +81,7 @@ def test_register_and_create_custom_vector_store():
)
vector_store = VectorStoreFactory.create_vector_store(
vector_store_type="custom",
vector_store_schema_config=VectorStoreSchemaConfig(),
kwargs={},
vector_store_type="custom", vector_store_schema_config=VectorStoreSchemaConfig()
)
assert custom_vector_store_class.called
@@ -109,7 +107,6 @@ def test_create_unknown_vector_store():
VectorStoreFactory.create_vector_store(
vector_store_type="unknown",
vector_store_schema_config=VectorStoreSchemaConfig(),
kwargs={},
)
@@ -162,7 +159,6 @@ def test_register_class_directly_works():
vector_store = VectorStoreFactory.create_vector_store(
vector_store_type="custom_class",
vector_store_schema_config=VectorStoreSchemaConfig(),
kwargs={"collection_name": "test"},
)
assert isinstance(vector_store, CustomVectorStore)
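
The change above drops the old `kwargs={}` argument from `VectorStoreFactory.create_vector_store` and passes a `VectorStoreSchemaConfig` instead. A minimal sketch of the new call shape follows; the import paths and the prior registration of a `"custom"` store type are assumptions for illustration, and only the `create_vector_store` call itself appears in the diff.

```python
# Minimal sketch of the updated factory call (no kwargs={} any more).
# Assumptions: both import paths, and that a "custom" store class was
# registered earlier (as test_register_and_create_custom_vector_store does).
from graphrag.config.models.vector_store_schema_config import VectorStoreSchemaConfig
from graphrag.vector_stores.factory import VectorStoreFactory

store = VectorStoreFactory.create_vector_store(
    vector_store_type="custom",
    vector_store_schema_config=VectorStoreSchemaConfig(),
)
```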

View File

@@ -142,9 +142,7 @@ class TestIndexer:
]
command = [arg for arg in command if arg]
logger.info("running command ", " ".join(command))
completion = subprocess.run(
command, env={**os.environ, "GRAPHRAG_INPUT_FILE_TYPE": input_file_type}
)
completion = subprocess.run(command, env=os.environ)
assert completion.returncode == 0, (
f"Indexer failed with return code: {completion.returncode}"
)
@@ -224,18 +222,14 @@
os.environ,
{
**os.environ,
"BLOB_STORAGE_CONNECTION_STRING": os.getenv(
"GRAPHRAG_CACHE_CONNECTION_STRING", WELL_KNOWN_AZURITE_CONNECTION_STRING
),
"BLOB_STORAGE_CONNECTION_STRING": WELL_KNOWN_AZURITE_CONNECTION_STRING,
"LOCAL_BLOB_STORAGE_CONNECTION_STRING": WELL_KNOWN_AZURITE_CONNECTION_STRING,
"GRAPHRAG_CHUNK_SIZE": "1200",
"GRAPHRAG_CHUNK_OVERLAP": "0",
"AZURE_AI_SEARCH_URL_ENDPOINT": os.getenv("AZURE_AI_SEARCH_URL_ENDPOINT"),
"AZURE_AI_SEARCH_API_KEY": os.getenv("AZURE_AI_SEARCH_API_KEY"),
},
clear=True,
)
@pytest.mark.timeout(800)
@pytest.mark.timeout(2000)
def test_fixture(
self,
input_path: str,
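
The smoke-test environment is now patched with fixed connection strings rather than values read back out of the environment, and the timeout is raised to 2000 seconds. The patching pattern itself is plain `unittest.mock`; a minimal standalone sketch is below, with illustrative variable names and values that are not part of this diff.

```python
# Minimal sketch of the mock.patch.dict pattern used above: replace the whole
# environment (clear=True) with a merged copy plus fixed test values.
import os
from unittest import mock

FAKE_CONNECTION_STRING = "UseDevelopmentStorage=true"  # illustrative value

with mock.patch.dict(
    os.environ,
    {**os.environ, "BLOB_STORAGE_CONNECTION_STRING": FAKE_CONNECTION_STRING},
    clear=True,
):
    assert os.environ["BLOB_STORAGE_CONNECTION_STRING"] == FAKE_CONNECTION_STRING
```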

View File

@@ -1,9 +1,11 @@
models:
default_chat_model:
api_key: ${CUSTOM_API_KEY}
type: openai_chat
type: chat
model_provider: openai
model: gpt-4-turbo-preview
default_embedding_model:
api_key: ${CUSTOM_API_KEY}
type: openai_embedding
type: embedding
model_provider: openai
model: text-embedding-3-small

View File

@@ -1,9 +1,11 @@
models:
default_chat_model:
api_key: ${SOME_NON_EXISTENT_ENV_VAR}
type: openai_chat
type: chat
model_provider: openai
model: gpt-4-turbo-preview
default_embedding_model:
api_key: ${SOME_NON_EXISTENT_ENV_VAR}
type: openai_embedding
type: embedding
model_provider: openai
model: text-embedding-3-small

View File

@@ -133,19 +133,6 @@ def test_missing_azure_api_version() -> None:
})
def test_missing_azure_deployment_name() -> None:
missing_deployment_name_config = base_azure_model_config.copy()
del missing_deployment_name_config["deployment_name"]
with pytest.raises(ValidationError):
create_graphrag_config({
"models": {
defs.DEFAULT_CHAT_MODEL_ID: missing_deployment_name_config,
defs.DEFAULT_EMBEDDING_MODEL_ID: DEFAULT_EMBEDDING_MODEL_CONFIG,
}
})
def test_default_config() -> None:
expected = get_default_graphrag_config()
actual = create_graphrag_config({"models": DEFAULT_MODEL_CONFIG})

View File

@@ -41,12 +41,14 @@ DEFAULT_CHAT_MODEL_CONFIG = {
"api_key": FAKE_API_KEY,
"type": defs.DEFAULT_CHAT_MODEL_TYPE.value,
"model": defs.DEFAULT_CHAT_MODEL,
"model_provider": defs.DEFAULT_MODEL_PROVIDER,
}
DEFAULT_EMBEDDING_MODEL_CONFIG = {
"api_key": FAKE_API_KEY,
"type": defs.DEFAULT_EMBEDDING_MODEL_TYPE.value,
"model": defs.DEFAULT_EMBEDDING_MODEL,
"model_provider": defs.DEFAULT_MODEL_PROVIDER,
}
DEFAULT_MODEL_CONFIG = {

View File

@@ -6,7 +6,7 @@ import platform
from graphrag.index.operations.summarize_communities.graph_context.sort_context import (
sort_context,
)
from graphrag.query.llm.text_utils import num_tokens
from graphrag.tokenizer.get_tokenizer import get_tokenizer
nan = math.nan
@@ -204,16 +204,18 @@ context: list[dict] = [
def test_sort_context():
ctx = sort_context(context)
tokenizer = get_tokenizer()
ctx = sort_context(context, tokenizer=tokenizer)
assert ctx is not None, "Context is none"
num = num_tokens(ctx)
num = tokenizer.num_tokens(ctx)
assert num == 828 if platform.system() == "Windows" else 826, (
f"num_tokens is not matched for platform (win = 827, else 826): {num}"
)
def test_sort_context_max_tokens():
ctx = sort_context(context, max_context_tokens=800)
tokenizer = get_tokenizer()
ctx = sort_context(context, tokenizer=tokenizer, max_context_tokens=800)
assert ctx is not None, "Context is none"
num = num_tokens(ctx)
num = tokenizer.num_tokens(ctx)
assert num <= 800, f"num_tokens is not less than or equal to 800: {num}"
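
These tests swap the old `num_tokens` text utility for the shared tokenizer object, which is also threaded through `sort_context`. A minimal standalone sketch of that API, taken directly from the calls above (the sample string is the only invented part):

```python
# Minimal sketch of the tokenizer API used by the updated tests.
from graphrag.tokenizer.get_tokenizer import get_tokenizer

tokenizer = get_tokenizer()  # default tokenizer, as in the tests
n = tokenizer.num_tokens("How many tokens does this sentence use?")
print(n)
```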

View File

@@ -15,7 +15,7 @@ retry_factory = RetryFactory()
@pytest.mark.parametrize(
("strategy", "max_attempts", "max_retry_wait", "expected_time"),
("strategy", "max_retries", "max_retry_wait", "expected_time"),
[
(
"native",
@@ -44,7 +44,7 @@ retry_factory = RetryFactory()
],
)
def test_retries(
strategy: str, max_attempts: int, max_retry_wait: int, expected_time: float
strategy: str, max_retries: int, max_retry_wait: int, expected_time: float
) -> None:
"""
Test various retry strategies with various configurations.
@@ -52,12 +52,12 @@ def test_retries(
Args
----
strategy: The retry strategy to use.
max_attempts: The maximum number of retry attempts.
max_retries: The maximum number of retry attempts.
max_retry_wait: The maximum wait time between retries.
"""
retry_service = retry_factory.create(
strategy=strategy,
max_attempts=max_attempts,
max_retries=max_retries,
max_retry_wait=max_retry_wait,
)
@@ -75,16 +75,14 @@ def test_retries(
elapsed_time = time.time() - start_time
# subtract 1 from retries because the first call is not a retry
assert retries - 1 == max_attempts, (
f"Expected {max_attempts} retries, got {retries}"
)
assert retries - 1 == max_retries, f"Expected {max_retries} retries, got {retries}"
assert elapsed_time >= expected_time, (
f"Expected elapsed time >= {expected_time}, got {elapsed_time}"
)
@pytest.mark.parametrize(
("strategy", "max_attempts", "max_retry_wait", "expected_time"),
("strategy", "max_retries", "max_retry_wait", "expected_time"),
[
(
"native",
@@ -113,7 +111,7 @@ def test_retries(
],
)
async def test_retries_async(
strategy: str, max_attempts: int, max_retry_wait: int, expected_time: float
strategy: str, max_retries: int, max_retry_wait: int, expected_time: float
) -> None:
"""
Test various retry strategies with various configurations.
@@ -121,12 +119,12 @@ async def test_retries_async(
Args
----
strategy: The retry strategy to use.
max_attempts: The maximum number of retry attempts.
max_retries: The maximum number of retry attempts.
max_retry_wait: The maximum wait time between retries.
"""
retry_service = retry_factory.create(
strategy=strategy,
max_attempts=max_attempts,
max_retries=max_retries,
max_retry_wait=max_retry_wait,
)
@@ -144,9 +142,7 @@ async def test_retries_async(
elapsed_time = time.time() - start_time
# subtract 1 from retries because the first call is not a retry
assert retries - 1 == max_attempts, (
f"Expected {max_attempts} retries, got {retries}"
)
assert retries - 1 == max_retries, f"Expected {max_retries} retries, got {retries}"
assert elapsed_time >= expected_time, (
f"Expected elapsed time >= {expected_time}, got {elapsed_time}"
)
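
The retry tests now use `max_retries` in place of `max_attempts`, matching the factory's renamed keyword. A minimal sketch of the creation call is below; the import path is a guess (the diff does not show it), while the strategy name and keyword arguments mirror the parametrized cases above.

```python
# Minimal sketch: create a retry service with the renamed keyword.
# Assumption: the RetryFactory import path; adjust to wherever it lives in
# your checkout. The "native" strategy appears in the test cases above.
from graphrag.language_model.providers.litellm.services.retry.retry_factory import RetryFactory

retry_service = RetryFactory().create(
    strategy="native",
    max_retries=3,
    max_retry_wait=10,
)
```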

View File

@@ -17,12 +17,14 @@ DEFAULT_CHAT_MODEL_CONFIG = {
"api_key": FAKE_API_KEY,
"type": defs.DEFAULT_CHAT_MODEL_TYPE.value,
"model": defs.DEFAULT_CHAT_MODEL,
"model_provider": defs.DEFAULT_MODEL_PROVIDER,
}
DEFAULT_EMBEDDING_MODEL_CONFIG = {
"api_key": FAKE_API_KEY,
"type": defs.DEFAULT_EMBEDDING_MODEL_TYPE.value,
"model": defs.DEFAULT_EMBEDDING_MODEL,
"model_provider": defs.DEFAULT_MODEL_PROVIDER,
}
DEFAULT_MODEL_CONFIG = {

View File

@@ -91,7 +91,7 @@ You can host Unified Search datasets locally or in a blob.
# Run the app
Install all the dependencies: `uv sync --extra dev`
Install all the dependencies: `uv sync`
Run the project using streamlit: `uv run poe start`

uv.lock (generated, 593 changed lines)

File diff suppressed because it is too large.