diff --git a/.vscode/settings.json b/.vscode/settings.json
new file mode 100644
index 0000000..4af4e3f
--- /dev/null
+++ b/.vscode/settings.json
@@ -0,0 +1,6 @@
+{
+    "workbench.colorTheme": "Solarized Dark+",
+    "editor.suggestSelection": "first",
+    "vsintellicode.modify.editor.suggestSelection": "automaticallyOverrodeDefaultValue",
+    "commentTranslate.source": "Google"
+}
diff --git a/ch05/01_main-chapter-code/README.md b/ch05/01_main-chapter-code/README.md
new file mode 100644
index 0000000..5e95d9a
--- /dev/null
+++ b/ch05/01_main-chapter-code/README.md
@@ -0,0 +1,7 @@
+# Chapter 5: Pretraining on Unlabeled Data
+
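+- [ch05.ipynb](ch05.ipynb) contains all the code from this chapter
+- [previous_chapters.py](previous_chapters.py) is a Python module containing the `MultiHeadAttention` code from the previous chapters, which we use here when pretraining the model on unlabeled data
+- [train.py](train.py) is a standalone Python script containing the GPT model training code that we implement in [ch05.ipynb](ch05.ipynb)
+- [generate.py](generate.py) is a standalone Python script containing the code for loading and using GPT model weights that we implement in [ch05.ipynb](ch05.ipynb)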
+
diff --git a/ch05/01_main-chapter-code/ch05.ipynb b/ch05/01_main-chapter-code/ch05.ipynb
new file mode 100644
index 0000000..d243b4f
--- /dev/null
+++ b/ch05/01_main-chapter-code/ch05.ipynb
@@ -0,0 +1,2381 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "45398736-7e89-4263-89c8-92153baff553",
+   "metadata": {},
+   "source": [
+    "<font size=\"1\">\n",
+    "Supplementary code for \"Build a Large Language Model From Scratch\": <a href=\"https://www.manning.com/books/build-a-large-language-model-from-scratch\">https://www.manning.com/books/build-a-large-language-model-from-scratch</a> by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
+    "Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
+    "</font>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "66dd524e-864c-4012-b0a2-ccfc56e80024",
+   "metadata": {
+    "id": "66dd524e-864c-4012-b0a2-ccfc56e80024"
+   },
+   "source": [
+    "# Chapter 5: Pretraining on Unlabeled Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "92b989e9-da36-4159-b212-799184764dd9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "matplotlib version: 3.5.2\n",
+      "numpy version: 1.24.4\n",
+      "tiktoken version: 0.6.0\n",
+      "torch version: 2.1.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "from importlib.metadata import version\n",
+    "\n",
+    "pkgs = [\"matplotlib\", \"numpy\", \"tiktoken\", \"torch\"]\n",
+    "for p in pkgs:\n",
+    "    print(f\"{p} version: {version(p)}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0a3bdf9e-2ff0-4a57-abab-ede2d955a237",
+   "metadata": {},
+   "source": [
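+    "- In this chapter, we implement the training loop and basic model evaluation code to pretrain an LLM\n",
+    "- At the end of the chapter, we also load the openly available pretrained weights from OpenAI into our model"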
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "efd27fcc-2886-47cb-b544-046c2c31f02a",
+   "metadata": {},
+   "source": [
+    "<img src=\"images/img-1.webp\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0d214765-7a73-42d5-95e9-302154b29db9",
+   "metadata": {},
+   "source": [
+    "- The topics covered in this chapter are shown in the figure below"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f67711d4-8391-4fee-aeef-07ea53dd5841",
+   "metadata": {},
+   "source": [
+    "<img src=\"images/img-2.webp\" width=400px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0d824183-145c-4865-89e1-1f0d0a338f19",
+   "metadata": {
+    "id": "0d824183-145c-4865-89e1-1f0d0a338f19"
+   },
+   "source": [
+    "## 5.1 Evaluating generative text models"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a3350f8c-5181-4f9b-a789-4523105e98f2",
+   "metadata": {},
+   "source": [
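+    "- We start with a brief recap of initializing a GPT model using the code from the previous chapter\n",
+    "- Then, we discuss basic evaluation metrics for LLMs\n",
+    "- Lastly, in this section, we apply these evaluation metrics to a training and a validation dataset"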
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bdc1cf3f-82d8-46c7-9ecc-58979ce87cdd",
+   "metadata": {
+    "id": "bdc1cf3f-82d8-46c7-9ecc-58979ce87cdd"
+   },
+   "source": [
+    "### 5.1.1 Using GPT to generate text"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5b3415fd-9f4a-4548-908e-9dfa56edc9bc",
+   "metadata": {},
+   "source": [
+    "- We initialize a GPT model using the code from the previous chapter"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "86000d74-624a-48f0-86da-f41926cb9e04",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "86000d74-624a-48f0-86da-f41926cb9e04",
+    "outputId": "ad482cfd-5a62-4f0d-e1e0-008d6457f512"
+   },
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "from previous_chapters import GPTModel\n",
+    "\n",
+    "GPT_CONFIG_124M = {\n",
+    "    \"vocab_size\": 50257,  # Vocabulary size\n",
+    "    \"ctx_len\": 256,       # Shortened context length (orig: 1024)\n",
+    "    \"emb_dim\": 768,       # Embedding dimension\n",
+    "    \"n_heads\": 12,        # Number of attention heads\n",
+    "    \"n_layers\": 12,       # Number of layers\n",
+    "    \"drop_rate\": 0.1,     # Dropout rate\n",
+    "    \"qkv_bias\": False     # Query-key-value bias\n",
+    "}\n",
+    "\n",
+    "torch.manual_seed(123)\n",
+    "model = GPTModel(GPT_CONFIG_124M)\n",
+    "model.eval();  # Disable dropout during inference"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "09c6cf0f-7458-48a2-97fd-aa5068d65e8c",
+   "metadata": {},
+   "source": [
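+    "- We use a dropout rate of 0.1 above, but nowadays LLMs are commonly trained without dropout\n",
+    "- Modern LLMs also don't use bias vectors in the `nn.Linear` layers for the query, key, and value matrices (unlike earlier GPT models), which is achieved by setting `\"qkv_bias\": False`\n",
+    "- We use a context length (`ctx_len`) of only 256 tokens to reduce the computational resources needed to train the model, whereas the original 124-million-parameter GPT-2 model used 1024 tokens\n",
+    "  - This is so that more readers can run the code examples on their laptops\n",
+    "  - However, feel free to increase `ctx_len` to 1024 tokens (this does not require any code changes)\n",
+    "  - Later, we will also load a model with a `ctx_len` of 1024 from pretrained weights"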
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "59f80895-be35-4bb5-81cb-f357ef7367fe",
+   "metadata": {},
+   "source": [
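+    "- Next, we use the `generate_text_simple` function from the previous chapter to generate text\n",
+    "- In addition, we define two convenience functions, `text_to_token_ids` and `token_ids_to_text`, for converting between token and text representations; we use them throughout this chapter"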
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "741881f3-cee0-49ad-b11d-b9df3b3ac234",
+   "metadata": {},
+   "source": [
+    "<img src=\"images/img-3.webp\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "5e062b82-3540-48ce-8eb4-009686d0d16c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Output text:\n",
+      " Every effort moves you rentingetic wasnم refres RexMeCHicular stren\n"
+     ]
+    }
+   ],
+   "source": [
+    "import tiktoken\n",
+    "from previous_chapters import generate_text_simple\n",
+    "\n",
+    "def text_to_token_ids(text, tokenizer):\n",
+    "    encoded = tokenizer.encode(text, allowed_special={'<|endoftext|>'})\n",
+    "    encoded_tensor = torch.tensor(encoded).unsqueeze(0) # add batch dimension\n",
+    "    return encoded_tensor\n",
+    "\n",
+    "def token_ids_to_text(token_ids, tokenizer):\n",
+    "    flat = token_ids.squeeze(0) # remove batch dimension\n",
+    "    return tokenizer.decode(flat.tolist())\n",
+    "\n",
+    "start_context = \"Every effort moves you\"\n",
+    "tokenizer = tiktoken.get_encoding(\"gpt2\")\n",
+    "\n",
+    "token_ids = generate_text_simple(\n",
+    "    model=model,\n",
+    "    idx=text_to_token_ids(start_context, tokenizer),\n",
+    "    max_new_tokens=10,\n",
+    "    context_size=GPT_CONFIG_124M[\"ctx_len\"]\n",
+    ")\n",
+    "\n",
+    "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4d3249b-b2a0-44c4-b589-ae4b403b8305",
+   "metadata": {},
+   "source": [
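+    "- As we can see above, the model does not produce good text because it has not been trained yet\n",
+    "- How do we measure or capture what \"good text\" is in numeric form, so that we can track it during training?\n",
+    "- The next subsection introduces metrics for calculating a loss on the generated outputs, which we can use to measure training progress\n",
+    "- The later chapters on finetuning LLMs will also introduce additional ways to measure model quality"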
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f3d7ea2-637f-4490-bc76-e361fc81ae98",
+   "metadata": {
+    "id": "0f3d7ea2-637f-4490-bc76-e361fc81ae98"
+   },
+   "source": [
+    "### 5.1.2 Calculating the text generation loss: cross entropy and perplexity"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9e1ba8aa-fb03-4d25-957f-fe8778762440",
+   "metadata": {},
+   "source": [
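+    "- Suppose we have an `inputs` tensor containing the token IDs for 2 training examples (rows)\n",
+    "- Corresponding to the `inputs`, the `targets` contain the desired token IDs that we want the model to generate\n",
+    "- Notice that the `targets` are the `inputs` shifted by one position, as explained in chapter 2 when we implemented the data loader"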
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "6b5402f8-ec0c-4a44-9892-18a97779ee4f",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "6b5402f8-ec0c-4a44-9892-18a97779ee4f",
+    "outputId": "8d6fa0ff-7b37-4634-c3f0-2c050cbe81f0"
+   },
+   "outputs": [],
+   "source": [
+    "inputs = torch.tensor([[16833, 3626, 6100],   # [\"every effort moves\",\n",
+    "                       [40,    1107, 588]])   #  \"I really like\"]\n",
+    "\n",
+    "targets = torch.tensor([[3626, 6100, 345  ],  # [\" effort moves you\",\n",
+    "                        [588,  428,  11311]]) #  \" really like chocolate\"]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "33dc0645-ac2c-4973-9b40-6da40515bede",
+   "metadata": {},
+   "source": [
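+    "- Feeding the `inputs` to the model, we obtain the logits vectors for the 2 input examples, which consist of 3 tokens each\n",
+    "- Each token is represented by a 50,257-dimensional vector, corresponding to the size of the vocabulary\n",
+    "- Applying the softmax function, we can turn the logits tensor into a tensor of the same dimension containing probability scores"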
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "e7b6ec51-6f8c-49bd-a349-95ba38b46fb6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "torch.Size([2, 3, 50257])\n"
+     ]
+    }
+   ],
+   "source": [
+    "with torch.no_grad():\n",
+    "    logits = model(inputs)\n",
+    "\n",
+    "probas = torch.softmax(logits, dim=-1) # Probability of each token in vocabulary\n",
+    "print(probas.shape) # Shape: (batch_size, num_tokens, vocab_size)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5c36a382-b5e2-4de6-9e65-0b69b685013b",
+   "metadata": {},
+   "source": [
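+    "- The figure below, which uses a very small vocabulary for illustration purposes, outlines how we convert the probability scores back into text, as discussed at the end of the previous chapter"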
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "384d86a9-0013-476c-bb6b-274fd5f20b29",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/proba-to-text.webp\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e8480efd-d419-4954-9ecc-2876055334bd",
+   "metadata": {},
+   "source": [
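+    "- As discussed in the previous chapter, we can apply the `argmax` function to convert the probability scores into predicted token IDs\n",
+    "- The softmax function above produced a 50,257-dimensional vector for each token; the `argmax` function returns the position of the highest probability score in this vector, which is the predicted token ID for the given token"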
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f3b84c9f-dd08-482e-b903-a86fe44e1144",
+   "metadata": {},
+   "source": [
+    "- Since we have 2 input batches with 3 tokens each, we obtain 2 sets of 3 predicted token IDs:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "34ebd76a-16ec-4c17-8958-8a135735cc1c",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "34ebd76a-16ec-4c17-8958-8a135735cc1c",
+    "outputId": "ed17da47-c3e7-4775-fd00-4ec5bcda3db2"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Token IDs:\n",
+      " tensor([[[16657],\n",
+      "         [  339],\n",
+      "         [42826]],\n",
+      "\n",
+      "        [[49906],\n",
+      "         [29669],\n",
+      "         [41751]]])\n"
+     ]
+    }
+   ],
+   "source": [
+    "token_ids = torch.argmax(probas, dim=-1, keepdim=True)\n",
+    "print(\"Token IDs:\\n\", token_ids)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cee4072c-21ed-4df7-8721-dd2535362573",
+   "metadata": {},
+   "source": [
+    "- If we decode these tokens, we find that they are quite different from the tokens we want the model to predict, namely the target tokens:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "c990ead6-53cd-49a7-a6d1-14d8c1518249",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Targets batch 1:  effort moves you\n",
+      "Outputs batch 1:  Armed heNetflix\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(f\"Targets batch 1: {token_ids_to_text(targets[0], tokenizer)}\")\n",
+    "print(f\"Outputs batch 1: {token_ids_to_text(token_ids[0].flatten(), tokenizer)}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a53eb8a7-070e-46d6-930c-314ba55a6ff2",
+   "metadata": {},
+   "source": [
+    "- That's because the model has not been trained yet\n",
+    "- To train the model, we need to know how far it is from the correct predictions (targets)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad90592f-0d5d-4ec8-9ff5-e7675beab10e",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/proba-index.webp\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c7251bf5-a079-4782-901d-68c9225d3157",
+   "metadata": {},
+   "source": [
+    "- The token probabilities corresponding to the target indices are as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "54aef09c-d6e3-4238-8653-b3a1b0a1077a",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "54aef09c-d6e3-4238-8653-b3a1b0a1077a",
+    "outputId": "41c946a2-c458-433e-a53d-5e7e89d9dddc"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Batch 1: tensor([7.4541e-05, 3.1061e-05, 1.1563e-05])\n",
+      "Batch 2: tensor([3.9836e-05, 1.6783e-05, 4.7559e-06])\n"
+     ]
+    }
+   ],
+   "source": [
+    "batch_idx = 0\n",
+    "target_probas_1 = probas[batch_idx, [0, 1, 2], targets[batch_idx]]\n",
+    "print(\"Batch 1:\", target_probas_1)\n",
+    "\n",
+    "batch_idx = 1\n",
+    "target_probas_2 = probas[1, [0, 1, 2], targets[1]]\n",
+    "print(\"Batch 2:\", target_probas_2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a0e89a19-73c2-4e49-93b4-861f699f1cbf",
+   "metadata": {},
+   "source": [
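+    "- We want to maximize all these values, bringing them close to a probability of 1\n",
+    "- In mathematical optimization, it is easier to maximize the logarithm of the probability score than the probability score itself; this is beyond the scope of this book, but I have recorded a lecture with more details here: [L8.2 Logistic Regression Loss Function](https://www.youtube.com/watch?v=GxJe0DZvydM)"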
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "31402a67-a16e-4aeb-977e-70abb9c9949b",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "31402a67-a16e-4aeb-977e-70abb9c9949b",
+    "outputId": "1bf18e79-1246-4eab-efd8-12b328c78678"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor([ -9.5042, -10.3796, -11.3677, -10.1308, -10.9951, -12.2561])\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Compute logarithm of all token probabilities\n",
+    "log_probas = torch.log(torch.cat((target_probas_1, target_probas_2)))\n",
+    "print(log_probas)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4261441-a511-4633-9c4c-67998af31b84",
+   "metadata": {},
+   "source": [
+    "- Next, we compute the average log probability:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "9b003797-161b-4d98-81dc-e68320e09fec",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "9b003797-161b-4d98-81dc-e68320e09fec",
+    "outputId": "a447fe9c-7e27-40ed-f1fb-51210e3f7cc9"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor(-10.7722)\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Calculate the average log probability over all tokens\n",
+    "avg_log_probas = torch.mean(log_probas)\n",
+    "print(avg_log_probas)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "36d51994-ad17-4ba3-a6ec-f588b4b13585",
+   "metadata": {},
+   "source": [
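+    "- The goal is to make this average log probability as large as possible by optimizing the model weights\n",
+    "- Because the logarithm of a probability is at most 0, the largest possible value is 0, and we are currently far away from 0"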
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3de388a1-8a0a-4c94-8894-9041dc6ad514",
+   "metadata": {},
+   "source": [
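+    "- In deep learning, instead of maximizing the average log probability, it is standard convention to minimize the *negative* average log probability; in our case, instead of maximizing -10.7722 so that it approaches 0, we minimize 10.7722 so that it approaches 0\n",
+    "- The negative of -10.7722, i.e., 10.7722, is also called the cross entropy loss in deep learning"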
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "176ddf35-1c5f-4d7c-bf17-70f3e7069bd4",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor(10.7722)\n"
+     ]
+    }
+   ],
+   "source": [
+    "neg_avg_log_probas = avg_log_probas * -1\n",
+    "print(neg_avg_log_probas)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "84eeb868-abd8-4028-82db-107546bf7c2c",
+   "metadata": {},
+   "source": [
+    "- PyTorch already implements a `cross_entropy` function that carries out the previous steps"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5bd24b7f-b760-47ad-bc84-86d13794aa54",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/cross-entropy.webp\" width=400px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e8aaf9dd-3ee6-42bf-a63f-6e93dbfb989d",
+   "metadata": {},
+   "source": [
+    "- Before we apply the cross entropy function, let's check the shapes of the logits and targets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "695d6f64-5084-4c23-aea4-105c9e38cfe4",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "695d6f64-5084-4c23-aea4-105c9e38cfe4",
+    "outputId": "43fd802a-8136-4b35-df0d-f61a5d4cb561"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Logits shape: torch.Size([2, 3, 50257])\n",
+      "Targets shape: torch.Size([2, 3])\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Logits have shape (batch_size, num_tokens, vocab_size)\n",
+    "print(\"Logits shape:\", logits.shape)\n",
+    "\n",
+    "# Targets have shape (batch_size, num_tokens)\n",
+    "print(\"Targets shape:\", targets.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1d3d65f0-6566-4865-93e4-0c0bcb10cd06",
+   "metadata": {},
+   "source": [
+    "- For the `cross_entropy` function in PyTorch, we want to flatten these tensors by combining them over the batch dimension:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "0e17e027-ab9f-4fb5-ac9b-a009b831c122",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "0e17e027-ab9f-4fb5-ac9b-a009b831c122",
+    "outputId": "0b2b778b-02fb-43b2-c879-adc59055a7d8"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Flattened logits: torch.Size([6, 50257])\n",
+      "Flattened targets: torch.Size([6])\n"
+     ]
+    }
+   ],
+   "source": [
+    "logits_flat = logits.flatten(0, 1)\n",
+    "targets_flat = targets.flatten()\n",
+    "\n",
+    "print(\"Flattened logits:\", logits_flat.shape)\n",
+    "print(\"Flattened targets:\", targets_flat.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4921a57f-3a79-473e-a863-6d63b495010f",
+   "metadata": {},
+   "source": [
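+    "- Note that the targets are the token IDs, which also represent the index positions in the logits tensors that we want to maximize\n",
+    "- The `cross_entropy` function in PyTorch automatically applies the softmax and computes the log probabilities internally for those token indices in the logits that are to be maximized"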
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "62d0816e-b29a-4c8f-a9a5-a167562de978",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "62d0816e-b29a-4c8f-a9a5-a167562de978",
+    "outputId": "c0be634a-2c65-4ff7-a73f-1bfc2e406ba4"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor(10.7722)\n"
+     ]
+    }
+   ],
+   "source": [
+    "loss = torch.nn.functional.cross_entropy(logits_flat, targets_flat)\n",
+    "print(loss)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f15ce17-fd7b-4d8e-99da-b237523a7a80",
+   "metadata": {},
+   "source": [
+    "- A concept related to the cross entropy loss is the perplexity of an LLM\n",
+    "- The perplexity is simply the exponential of the cross entropy loss"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "168952a1-b964-4aa7-8e49-966fa26add54",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "168952a1-b964-4aa7-8e49-966fa26add54",
+    "outputId": "a0a692c1-6412-4068-8aa5-8858548141eb"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor(47678.8633)\n"
+     ]
+    }
+   ],
+   "source": [
+    "perplexity = torch.exp(loss)\n",
+    "print(perplexity)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "71ae26dd-d77e-41fd-b924-6bd103dd4ee7",
+   "metadata": {},
+   "source": [
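+    "- The perplexity is often considered more interpretable because it can be understood as the effective vocabulary size that the model is uncertain about at each step (in the example above, that would be 47,678 words or tokens)\n",
+    "- In other words, perplexity provides a measure of how well the probability distribution predicted by the model matches the actual distribution of the words in the dataset\n",
+    "- Similar to the loss, a lower perplexity indicates that the model predictions are closer to the actual distribution"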
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2ec6c217-e429-40c7-ad71-5d0a9da8e487",
+   "metadata": {
+    "id": "2ec6c217-e429-40c7-ad71-5d0a9da8e487"
+   },
+   "source": [
+    "### 5.1.3 Calculating the training and validation set losses"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "530da89e-2448-436c-8f1b-28e8a31ef85c",
+   "metadata": {},
+   "source": [
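+    "- We use a relatively small dataset for training the LLM (in fact, only one short story)\n",
+    "  - The reasons are:\n",
+    "    - You can run the code examples in a few minutes on a laptop computer without a suitable GPU\n",
+    "    - The training finishes relatively fast (minutes instead of weeks), which is good for educational purposes\n",
+    "    - We use a text from the public domain, which can be included in this GitHub repository without violating any usage rights or bloating the repository size\n",
+    "\n",
+    "- For example, Llama 2 7B required 184,320 GPU hours on A100 GPUs to be trained on 2 trillion tokens\n",
+    "  - At the time of this writing, the hourly cost of an 8xA100 cloud server on AWS is approximately 30 USD\n",
+    "  - So, via a back-of-the-envelope calculation, training this LLM would cost 184,320 / 8 * 30 USD = 691,200 USD, or roughly 690,000 USD\n",
+    "\n",
+    "- Below, we use the same dataset we used in chapter 2"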
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "654fde37-b2a9-4a20-a8d3-0206c056e2ff",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import urllib.request\n",
+    "\n",
+    "file_path = \"the-verdict.txt\"\n",
+    "url = \"https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/ch02/01_main-chapter-code/the-verdict.txt\"\n",
+    "\n",
+    "if not os.path.exists(file_path):\n",
+    "    with urllib.request.urlopen(url) as response:\n",
+    "        text_data = response.read().decode('utf-8')\n",
+    "    with open(file_path, \"w\", encoding=\"utf-8\") as file:\n",
+    "        file.write(text_data)\n",
+    "else:\n",
+    "    with open(file_path, \"r\", encoding=\"utf-8\") as file:\n",
+    "        text_data = file.read()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "379330f1-80f4-4e34-8724-41d892b04cee",
+   "metadata": {},
+   "source": [
+    "- A quick check that the text loaded correctly by printing the first and last 99 characters"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "6kgJbe4ehI4q",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/",
+     "height": 35
+    },
+    "id": "6kgJbe4ehI4q",
+    "outputId": "9ff31e88-ee37-47e9-ee64-da6eb552f46f"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "I HAD always thought Jack Gisburn rather a cheap genius--though a good fellow enough--so it was no \n"
+     ]
+    }
+   ],
+   "source": [
+    "# First 99 characters\n",
+    "print(text_data[:99])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "j2XPde_ThM_e",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/",
+     "height": 35
+    },
+    "id": "j2XPde_ThM_e",
+    "outputId": "a900c1b9-9a87-4078-968b-a5721deda5cb"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "it for me! The Strouds stand alone, and happen once--but there's no exterminating our kind of art.\"\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Last 99 characters\n",
+    "print(text_data[-99:])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "6b46a952-d50a-4837-af09-4095698f7fd1",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "6b46a952-d50a-4837-af09-4095698f7fd1",
+    "outputId": "c2a25334-21ca-486e-8226-0296e5fc6486"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Characters: 20479\n",
+      "Tokens: 5145\n"
+     ]
+    }
+   ],
+   "source": [
+    "total_char = len(text_data)\n",
+    "total_tokens = len(tokenizer.encode(text_data))\n",
+    "\n",
+    "print(\"Characters:\", total_char)\n",
+    "print(\"Tokens:\", total_tokens)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a8830cb9-90f6-4e7c-8620-beeabc2d39f7",
+   "metadata": {},
+   "source": [
+    "- With 5,145 tokens, the text is very short for training an LLM, but again, it is for educational purposes (we will also load pretrained weights later)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bedcad87-a0e8-4b9d-ac43-4e927ccbb50f",
+   "metadata": {},
+   "source": [
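+    "- Next, we divide the dataset into a training and a validation set and use the data loaders from chapter 2 to prepare the batches for LLM training\n",
+    "- For visualization purposes, the figure below assumes `max_length=6`, but for the training loader, we set `max_length` equal to the context length that the LLM supports\n",
+    "- The figure below only shows the input tokens for simplicity\n",
+    "  - Since we train the LLM to predict the next word in the text, the targets look the same as these inputs, except that the targets are shifted by one position"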
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "46bdaa07-ba96-4ac1-9d71-b3cc153910d9",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/batching.webp\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "0959c855-f860-4358-8b98-bc654f047578",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from previous_chapters import create_dataloader_v1\n",
+    "\n",
+    "# Train/validation ratio\n",
+    "train_ratio = 0.90\n",
+    "split_idx = int(train_ratio * len(text_data))\n",
+    "train_data = text_data[:split_idx]\n",
+    "val_data = text_data[split_idx:]\n",
+    "\n",
+    "\n",
+    "torch.manual_seed(123)\n",
+    "\n",
+    "train_loader = create_dataloader_v1(\n",
+    "    train_data,\n",
+    "    batch_size=2,\n",
+    "    max_length=GPT_CONFIG_124M[\"ctx_len\"],\n",
+    "    stride=GPT_CONFIG_124M[\"ctx_len\"],\n",
+    "    drop_last=True,\n",
+    "    shuffle=True\n",
+    ")\n",
+    "\n",
+    "val_loader = create_dataloader_v1(\n",
+    "    val_data,\n",
+    "    batch_size=2,\n",
+    "    max_length=GPT_CONFIG_124M[\"ctx_len\"],\n",
+    "    stride=GPT_CONFIG_124M[\"ctx_len\"],\n",
+    "    drop_last=False,\n",
+    "    shuffle=False\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "f37b3eb0-854e-4895-9898-fa7d1e67566e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Sanity check\n",
+    "\n",
+    "if total_tokens * (train_ratio) < GPT_CONFIG_124M[\"ctx_len\"]:\n",
+    "    print(\"Not enough tokens for the training loader. \"\n",
+    "          \"Try to lower the `GPT_CONFIG_124M['ctx_len']` or \"\n",
+    "          \"increase the `train_ratio`\")\n",
+    "\n",
+    "if total_tokens * (1-train_ratio) < GPT_CONFIG_124M[\"ctx_len\"]:\n",
+    "    print(\"Not enough tokens for the validation loader. \"\n",
+    "          \"Try to lower the `GPT_CONFIG_124M['ctx_len']` or \"\n",
+    "          \"decrease the `train_ratio`\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e7ac3296-a4d1-4303-9ac5-376518960c33",
+   "metadata": {},
+   "source": [
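+    "- We use a relatively small batch size to reduce the computational resource demand, and because the dataset is very small to begin with\n",
+    "- Llama 2 7B, for example, was trained with a batch size of 1024"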
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a8e0514d-b990-4dc0-9afb-7721993284a0",
+   "metadata": {},
+   "source": [
+    "- An optional check that the data was loaded correctly:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "ca0116d0-d229-472c-9fbf-ebc229331c3e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Train loader:\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n",
+      "\n",
+      "Validation loader:\n",
+      "torch.Size([2, 256]) torch.Size([2, 256])\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"Train loader:\")\n",
+    "for x, y in train_loader:\n",
+    "    print(x.shape, y.shape)\n",
+    "\n",
+    "print(\"\\nValidation loader:\")\n",
+    "for x, y in val_loader:\n",
+    "    print(x.shape, y.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7b9b1a4-863d-456f-a8dd-c07fb5c024ed",
+   "metadata": {},
+   "source": [
+    "- Another optional check that the token counts are in the expected range:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "eb860488-5453-41d7-9870-23b723f742a0",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "eb860488-5453-41d7-9870-23b723f742a0",
+    "outputId": "96b9451a-9557-4126-d1c8-51610a1995ab"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Training tokens: 4608\n",
+      "Validation tokens: 512\n",
+      "All tokens: 5120\n"
+     ]
+    }
+   ],
+   "source": [
+    "train_tokens = 0\n",
+    "for input_batch, target_batch in train_loader:\n",
+    "    train_tokens += input_batch.numel()\n",
+    "\n",
+    "val_tokens = 0\n",
+    "for input_batch, target_batch in val_loader:\n",
+    "    val_tokens += input_batch.numel()\n",
+    "\n",
+    "print(\"Training tokens:\", train_tokens)\n",
+    "print(\"Validation tokens:\", val_tokens)\n",
+    "print(\"All tokens:\", train_tokens + val_tokens)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5c3085e8-665e-48eb-bb41-cdde61537e06",
+   "metadata": {},
+   "source": [
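+    "- Next, we implement a utility function to calculate the cross entropy loss of a given batch\n",
+    "- In addition, we implement a second utility function to compute the loss for a user-specified number of batches in a data loader"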
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "7b9de31e-4096-47b3-976d-b6d2fdce04bc",
+   "metadata": {
+    "id": "7b9de31e-4096-47b3-976d-b6d2fdce04bc"
+   },
+   "outputs": [],
+   "source": [
+    "def calc_loss_batch(input_batch, target_batch, model, device):\n",
+    "    input_batch, target_batch = input_batch.to(device), target_batch.to(device)\n",
+    "\n",
+    "    logits = model(input_batch)\n",
+    "    logits = logits.flatten(0, 1)\n",
+    "    loss = torch.nn.functional.cross_entropy(logits, target_batch.flatten())\n",
+    "    return loss\n",
+    "\n",
+    "\n",
+    "def calc_loss_loader(data_loader, model, device, num_batches=None):\n",
+    "    total_loss = 0.\n",
+    "    if num_batches is None:\n",
+    "        num_batches = len(data_loader)\n",
+    "    else:\n",
+    "        # Reduce the number of batches to match the total number of batches in the data loader\n",
+    "        # if num_batches exceeds the number of batches in the data loader\n",
+    "        num_batches = min(num_batches, len(data_loader))\n",
+    "    for i, (input_batch, target_batch) in enumerate(data_loader):\n",
+    "        if i < num_batches:\n",
+    "            loss = calc_loss_batch(input_batch, target_batch, model, device)\n",
+    "            total_loss += loss.item()\n",
+    "        else:\n",
+    "            break\n",
+    "    return total_loss / num_batches"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f0691332-84d0-48b3-b462-a885ddeb4fca",
+   "metadata": {},
+   "source": [
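+    "- If you have a machine with a CUDA-supported GPU, the LLM will train on the GPU without any changes to the code\n",
+    "- Via the `device` setting, we ensure that the data is loaded onto the same device as the LLM"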
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "56f5b0c9-1065-4d67-98b9-010e42fc1e2a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Training loss: 10.98758347829183\n",
+      "Validation loss: 10.98110580444336\n"
+     ]
+    }
+   ],
+   "source": [
+    "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+    "model.to(device) # no assignment model = model.to(device) necessary for nn.Module classes\n",
+    "\n",
+    "\n",
+    "torch.manual_seed(123) # For reproducibility due to the shuffling in the data loader\n",
+    "train_loss = calc_loss_loader(train_loader, model, device)\n",
+    "val_loss = calc_loss_loader(val_loader, model, device)\n",
+    "\n",
+    "print(\"Training loss:\", train_loss)\n",
+    "print(\"Validation loss:\", val_loss)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43875e95-190f-4b17-8f9a-35034ba649ec",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/mental-model-1.webp\" width=400px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b9339f8d-00cb-4206-af67-58c32bd72055",
+   "metadata": {
+    "id": "b9339f8d-00cb-4206-af67-58c32bd72055"
+   },
+   "source": [
+    "## 5.2 Training an LLM"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "652a4cf4-e98f-46d9-bdec-60e7ccb8d6bd",
+   "metadata": {},
+   "source": [
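+    "- In this section, we finally implement the code for training the LLM.\n",
+    "- We focus on a simple training function (if you are interested in augmenting this training function with more advanced techniques, such as learning rate warmup, cosine annealing, and gradient clipping, please refer to [Appendix D](../../appendix-D/03_main-chapter-code))\n",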
+    "\n",
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/train-steps.webp\" width=300px>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "id": "Mtp4gY0ZO-qq",
+   "metadata": {
+    "id": "Mtp4gY0ZO-qq"
+   },
+   "outputs": [],
+   "source": [
+    "def train_model_simple(model, train_loader, val_loader, optimizer, device, num_epochs,\n",
+    "                       eval_freq, eval_iter, start_context):\n",
+    "    # Initialize lists to track losses and tokens seen\n",
+    "    train_losses, val_losses, track_tokens_seen = [], [], []\n",
+    "    tokens_seen, global_step = 0, -1\n",
+    "\n",
+    "    # Main training loop\n",
+    "    for epoch in range(num_epochs):\n",
+    "        model.train()  # Set model to training mode\n",
+    "        \n",
+    "        for input_batch, target_batch in train_loader:\n",
+    "            optimizer.zero_grad() # Reset loss gradients from previous batch iteration\n",
+    "            loss = calc_loss_batch(input_batch, target_batch, model, device)\n",
+    "            loss.backward() # Calculate loss gradients\n",
+    "            optimizer.step() # Update model weights using loss gradients\n",
+    "            tokens_seen += input_batch.numel()\n",
+    "            global_step += 1\n",
+    "\n",
+    "            # Optional evaluation step\n",
+    "            if global_step % eval_freq == 0:\n",
+    "                train_loss, val_loss = evaluate_model(\n",
+    "                    model, train_loader, val_loader, device, eval_iter)\n",
+    "                train_losses.append(train_loss)\n",
+    "                val_losses.append(val_loss)\n",
+    "                track_tokens_seen.append(tokens_seen)\n",
+    "                print(f\"Ep {epoch+1} (Step {global_step:06d}): \"\n",
+    "                      f\"Train loss {train_loss:.3f}, Val loss {val_loss:.3f}\")\n",
+    "\n",
+    "        # Print a sample text after each epoch\n",
+    "        generate_and_print_sample(\n",
+    "            model, train_loader.dataset.tokenizer, device, start_context\n",
+    "        )\n",
+    "\n",
+    "    return train_losses, val_losses, track_tokens_seen\n",
+    "\n",
+    "\n",
+    "def evaluate_model(model, train_loader, val_loader, device, eval_iter):\n",
+    "    model.eval()\n",
+    "    with torch.no_grad():\n",
+    "        train_loss = calc_loss_loader(train_loader, model, device, num_batches=eval_iter)\n",
+    "        val_loss = calc_loss_loader(val_loader, model, device, num_batches=eval_iter)\n",
+    "    model.train()\n",
+    "    return train_loss, val_loss\n",
+    "\n",
+    "\n",
+    "def generate_and_print_sample(model, tokenizer, device, start_context):\n",
+    "    model.eval()\n",
+    "    context_size = model.pos_emb.weight.shape[0]\n",
+    "    encoded = text_to_token_ids(start_context, tokenizer).to(device)\n",
+    "    with torch.no_grad():\n",
+    "        token_ids = generate_text_simple(\n",
+    "            model=model, idx=encoded,\n",
+    "            max_new_tokens=50, context_size=context_size\n",
+    "        )\n",
+    "        decoded_text = token_ids_to_text(token_ids, tokenizer)\n",
+    "        print(decoded_text.replace(\"\\n\", \" \"))  # Compact print format\n",
+    "    model.train()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a301b333-b9d4-4eeb-a212-3a9874e3ac47",
+   "metadata": {},
+   "source": [
+    "- Now, let's train the LLM using the training function defined above:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "3422000b-7aa2-485b-92df-99372cd22311",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "3422000b-7aa2-485b-92df-99372cd22311",
+    "outputId": "0e046603-908d-4093-8ae5-ef2f632639fb"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Ep 1 (Step 000000): Train loss 9.558, Val loss 9.856\n",
+      "Ep 1 (Step 000005): Train loss 7.651, Val loss 8.051\n",
+      "Every effort moves you,,,,,,,,,,,,.                                     \n",
+      "Ep 2 (Step 000010): Train loss 6.421, Val loss 6.812\n",
+      "Ep 2 (Step 000015): Train loss 5.913, Val loss 6.559\n",
+      "Every effort moves you, and, and,, and,,,,, and, and,,,, and,,,, and, and,, and,,,, and,, and,,,,, and,,,,,,\n",
+      "Ep 3 (Step 000020): Train loss 5.680, Val loss 6.490\n",
+      "Ep 3 (Step 000025): Train loss 5.557, Val loss 6.602\n",
+      "Every effort moves you, and, and the picture.                             \", and, and the, and the, and, and,\n",
+      "Ep 4 (Step 000030): Train loss 5.204, Val loss 6.508\n",
+      "Ep 4 (Step 000035): Train loss 4.865, Val loss 6.420\n",
+      "Every effort moves you, and I had a a a--I to the picture. \"I. I had the picture. \"I had the picture. I had the the picture. I had the the picture. \"I had the picture. \"\n",
+      "Ep 5 (Step 000040): Train loss 4.332, Val loss 6.328\n",
+      "Every effort moves you, I was a and I was a little to the picture.                                     \n",
+      "Ep 6 (Step 000045): Train loss 4.295, Val loss 6.221\n",
+      "Ep 6 (Step 000050): Train loss 3.268, Val loss 6.184\n",
+      "Every effort moves you know to see the end of the Riv I felt--the a little of the last: \"                               \n",
+      "Ep 7 (Step 000055): Train loss 2.880, Val loss 6.129\n",
+      "Ep 7 (Step 000060): Train loss 2.820, Val loss 6.194\n",
+      "Every effort moves you know the fact of a little a--I was his painting.                                     \n",
+      "Ep 8 (Step 000065): Train loss 2.260, Val loss 6.236\n",
+      "Ep 8 (Step 000070): Train loss 1.754, Val loss 6.260\n",
+      "Every effort moves you know,\" was one of the picture for nothing--I turned Mrs.                                    \n",
+      "Ep 9 (Step 000075): Train loss 1.447, Val loss 6.319\n",
+      "Ep 9 (Step 000080): Train loss 1.120, Val loss 6.310\n",
+      "Every effort moves you?\"               \"I looked--and me.\"         He placed them at my elbow and I looked up his pictures--because he's I had\n",
+      "Ep 10 (Step 000085): Train loss 0.762, Val loss 6.372\n",
+      "Every effort moves you?\"  \"Yes--quite insensible to the irony. She wanted him vindicated--and by me!\"  He laughed again, and threw back his head to look up at the sketch of the donkey. \"There were days when I\n"
+     ]
+    }
+   ],
+   "source": [
+    "torch.manual_seed(123)\n",
+    "model = GPTModel(GPT_CONFIG_124M)\n",
+    "model.to(device)\n",
+    "optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)\n",
+    "\n",
+    "num_epochs = 10\n",
+    "train_losses, val_losses, tokens_seen = train_model_simple(\n",
+    "    model, train_loader, val_loader, optimizer, device,\n",
+    "    num_epochs=num_epochs, eval_freq=5, eval_iter=5,\n",
+    "    start_context=\"Every effort moves you\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "0WSRu2i0iHJE",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/",
+     "height": 487
+    },
+    "id": "0WSRu2i0iHJE",
+    "outputId": "9d36c61b-517d-4f07-a7e8-4563aff78b11"
+   },
+   "outputs": [
+    {
+     "data": {
+      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAnYAAAHWCAYAAAD6oMSKAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAABw6ElEQVR4nO3dd1gUVxsF8LOFpfeOVBVFVLBiEFuiEXvXxBijppjYjYnRFI2mGY0xRk1MV7+o0WjUGHuJFXsBUREbCChFpXfYne+PgQUUjSAwy3J+zzMPuzOzs+8yoId7596RCYIggIiIiIhqPbnUBRARERFR1WCwIyIiItITDHZEREREeoLBjoiIiEhPMNgRERER6QkGOyIiIiI9wWBHREREpCcY7IiIiIj0BIMdERERkZ5gsCMivRUdHQ2ZTIbQ0FCpSyEiqhEMdkSk02Qy2WOXOXPmSF0iEZHOUEpdABHR48THx2sfr1+/HrNnz0ZkZKR2nZmZmRRlERHpJLbYEZFOc3Jy0i6WlpaQyWTa5w4ODli0aBFcXV1haGiIFi1aYNeuXY88llqtxquvvgofHx/ExMQAAP7++2+0atUKRkZGqF+/PubOnYvCwkLta2QyGX755RcMHDgQJiYm8Pb2xtatW7XbU1JSMGLECNjb28PY2Bje3t5YsWLFI2vYuHEjmjdvDmNjY9ja2qJbt27IysrSbv/ll1/QpEkTGBkZwcfHB99//32Z18fGxmLYsGGwsrKCjY0N+vfvj+joaO320aNHY8CAAVi4cCGcnZ1ha2uLCRMmoKCg4Im/50RUezHYEVGt9e233+Lrr7/GwoULceHCBQQHB6Nfv364du3aQ/vm5eVh6NChCA0NxZEjR+Du7o4jR47glVdewZQpU3D58mX8+OOPWLlyJT7//PMyr507dy6GDRuGCxcuoFevXhgxYgSSk5MBALNmzcLly5exc+dOREREYPny5bCzsyu33vj4eAwfPhyvvvoqIiIicPDgQQwaNAiCIAAA1qxZg9mzZ+Pzzz9HREQEvvjiC8yaNQurVq0CABQUFCA4OBjm5uY4cuQIQkJCYGZmhh49eiA/P1/7PgcOHMCNGzdw4MABrFq1CitXrsTKlSur4ltORLpOICKqJVasWCFYWlpqn7u4uAiff/55mX3atm0rjB8/XhAEQYiKihIACEeOHBG6du0qdOjQQUhNTdXu27VrV+GLL74o8/rff/9dcHZ21j4HIHz00Ufa55mZmQIAYefOnYIgCELfvn2FMWPGPFH9Z8+eFQAI0dHR5W5v0KCBsHbt2jLrPv30UyEwMFBbW+PGjQWNRqPdnpeXJxgbGwu7d+8WBEEQRo0aJXh4eAiFhYXafYYOHSq88MILT1QjEdVuvMaOiGql9PR03LlzB0FBQWXWBwUFISwsrMy64cOHw9XVFf/++y+MjY2168PCwhASElKmhU6tViM3NxfZ2dkwMTEBAPj5+Wm3m5qawsLCAklJSQCAcePGYfDgwTh37hy6d++OAQMGoH379uXW7O/vj65du6J58+YIDg5G9+7dMWTIEFhbWyMrKws3btzAa6+9hjfeeEP7msLCQlhaWmrrvX79OszNzcscNzc3Fzdu3NA+b9q0KRQKhfa5s7MzwsPDH/PdJCJ9wWBHRHqvV69eWL16NY4fP47nnntOuz4zMxNz587FoEGDHnqNkZGR9rGBgUGZbTKZDBqNBgDQs2dP3Lp1Czt27MDevXvRtWtXTJgwAQsXLnzomAqFAnv37sWxY8ewZ88eLF26FB9++CFOnjypDZE///wz2rVr99Driutt3bo11qxZ89Cx7e3tn6heItJvDHZEVCtZWFjAxcUFISEh6Ny5s3Z9SEgIAgICyuw7btw4NGvWDP369cP27du1+7dq1QqRkZFo2LDhU9Vib2+PUaNGYdSoUejYsSOmT59ebrADxJAVFBSEoKAgzJ49Gx4eHti8eTOmTZsGFxcX3Lx
5EyNGjCj3ta1atcL69evh4OAACwuLp6qZiPQTgx0R1VrTp0/Hxx9/jAYNGqBFixZYsWIFQkNDy23RmjRpEtRqNfr06YOdO3eiQ4cOmD17Nvr06QN3d3cMGTIEcrkcYWFhuHjxIj777LMnqmH27Nlo3bo1mjZtiry8PGzbtg1NmjQpd9+TJ09i//796N69OxwcHHDy5EncvXtXu//cuXMxefJkWFpaokePHsjLy8OZM2eQkpKCadOmYcSIEfjqq6/Qv39/fPLJJ3B1dcWtW7ewadMmvPfee3B1da38N5OI9AKDHRHVWpMnT0ZaWhreeecdJCUlwdfXF1u3boW3t3e5+0+dOhUajQa9evXCrl27EBwcjG3btuGTTz7B/PnzYWBgAB8fH7z++utPXINKpcL777+P6OhoGBsbo2PHjli3bl25+1pYWODw4cNYvHgx0tPT4eHhga+//ho9e/YEALz++uswMTHBV199henTp8PU1BTNmzfH1KlTAQAmJiY4fPgwZsyYgUGDBiEjIwP16tVD165d2YJHRAAAmSAUjbMnIiIiolqN89gRERER6QkGOyIiIiI9wWBHREREpCcY7IiIiIj0BIMdERERkZ5gsCMiIiLSEwx2FfDdd9/B09MTRkZGaNeuHU6dOiV1SXXW4cOH0bdvX7i4uEAmk2HLli1ltguCgNmzZ8PZ2RnGxsbo1q0brl27Vmaf5ORkjBgxAhYWFrCyssJrr72GzMzMMvtcuHABHTt2hJGREdzc3LBgwYKHatmwYQN8fHxgZGSE5s2bY8eOHVX+eeuCefPmoW3btjA3N4eDgwMGDBiAyMjIMvvk5uZiwoQJsLW1hZmZGQYPHozExMQy+8TExKB3794wMTGBg4MDpk+fjsLCwjL7HDx4EK1atYKhoSEaNmyIlStXPlQPf9+rxvLly+Hn5wcLCwtYWFggMDAQO3fu1G7nOa39vvzyS8hkMu18iwDPq6QEeiLr1q0TVCqV8NtvvwmXLl0S3njjDcHKykpITEyUurQ6aceOHcKHH34obNq0SQAgbN68ucz2L7/8UrC0tBS2bNkihIWFCf369RO8vLyEnJwc7T49evQQ/P39hRMnTghHjhwRGjZsKAwfPly7PS0tTXB0dBRGjBghXLx4Ufjjjz8EY2Nj4ccff9TuExISIigUCmHBggXC5cuXhY8++kgwMDAQwsPDq/17oG+Cg4OFFStWCBcvXhRCQ0OFXr16Ce7u7kJmZqZ2n7feektwc3MT9u/fL5w5c0Z45plnhPbt22u3FxYWCs2aNRO6desmnD9/XtixY4dgZ2cnvP/++9p9bt68KZiYmAjTpk0TLl++LCxdulRQKBTCrl27tPvw973qbN26Vdi+fbtw9epVITIyUvjggw8EAwMD4eLFi4Ig8JzWdqdOnRI8PT0FPz8/YcqUKdr1PK/SYbB7QgEBAcKECRO0z9VqteDi4iLMmzdPwqpIEISHgp1GoxGcnJyEr776SrsuNTVVMDQ0FP744w9BEATh8uXLAgDh9OnT2n127twpyGQy4fbt24IgCML3338vWFtbC3l5edp9ZsyYITRu3Fj7fNiwYULv3r3L1NOuXTvhzTffrNLPWBclJSUJAIRDhw4JgiCeQwMDA2HDhg3afSIiIgQAwvHjxwVBEAO/XC4XEhIStPssX75csLCw0J7H9957T2jatGmZ93rhhReE4OBg7XP+vlcva2tr4ZdffuE5reUyMjIEb29vYe/evULnzp21wY7nVVrsin0C+fn5OHv2LLp166ZdJ5fL0a1bNxw/flzCyqg8UVFRSEhIKHO+LC0t0a5dO+35On78OKysrNCmTRvtPt26dYNcLsfJkye1+3Tq1AkqlUq7T3BwMCIjI5GSkqLdp/T7FO/Dn4unl5aWBgCwsbEBAJw9exYFBQVlvt8+Pj5wd3cvc16bN28OR0dH7T7BwcFIT0/HpUuXtPs87pzx9736qNVqrFu3DllZWQgMDOQ5reUmTJi
A3r17P/S953mVFu8V+wTu3bsHtVpd5gcQABwdHXHlyhWJqqJHSUhIAIByz1fxtoSEBDg4OJTZrlQqYWNjU2YfLy+vh45RvM3a2hoJCQmPfR+qHI1Gg6lTpyIoKAjNmjUDIH7PVSoVrKysyuz74Hkt73wUb3vcPunp6cjJyUFKSgp/36tYeHg4AgMDkZubCzMzM2zevBm+vr4IDQ3lOa2l1q1bh3PnzuH06dMPbePvqrQY7IhI50yYMAEXL17E0aNHpS6FqkDjxo0RGhqKtLQ0bNy4EaNGjcKhQ4ekLosqKTY2FlOmTMHevXthZGQkdTn0AHbFPgE7OzsoFIqHRvQkJibCyclJoqroUYrPyePOl5OTE5KSkspsLywsRHJycpl9yjtG6fd41D78uai8iRMnYtu2bThw4ABcXV21652cnJCfn4/U1NQy+z94Xit7ziwsLGBsbMzf92qgUqnQsGFDtG7dGvPmzYO/vz++/fZbntNa6uzZs0hKSkKrVq2gVCqhVCpx6NAhLFmyBEqlEo6OjjyvEmKwewIqlQqtW7fG/v37tes0Gg3279+PwMBACSuj8nh5ecHJyanM+UpPT8fJkye15yswMBCpqak4e/asdp9///0XGo0G7dq10+5z+PBhFBQUaPfZu3cvGjduDGtra+0+pd+neB/+XFScIAiYOHEiNm/ejH///fehbvDWrVvDwMCgzPc7MjISMTExZc5reHh4mdC+d+9eWFhYwNfXV7vP484Zf9+rn0ajQV5eHs9pLdW1a1eEh4cjNDRUu7Rp0wYjRozQPuZ5lZDUozdqi3Xr1gmGhobCypUrhcuXLwtjx44VrKysyozooZqTkZEhnD9/Xjh//rwAQFi0aJFw/vx54datW4IgiNOdWFlZCX///bdw4cIFoX///uVOd9KyZUvh5MmTwtGjRwVvb+8y052kpqYKjo6OwsiRI4WLFy8K69atE0xMTB6a7kSpVAoLFy4UIiIihI8//pjTnVTSuHHjBEtLS+HgwYNCfHy8dsnOztbu89Zbbwnu7u7Cv//+K5w5c0YIDAwUAgMDtduLp1Do3r27EBoaKuzatUuwt7cvdwqF6dOnCxEREcJ3331X7hQK/H2vGjNnzhQOHTokREVFCRcuXBBmzpwpyGQyYc+ePYIg8Jzqi9KjYgWB51VKDHYVsHTpUsHd3V1QqVRCQECAcOLECalLqrMOHDggAHhoGTVqlCAI4pQns2bNEhwdHQVDQ0Oha9euQmRkZJlj3L9/Xxg+fLhgZmYmWFhYCGPGjBEyMjLK7BMWFiZ06NBBMDQ0FOrVqyd8+eWXD9Xy559/Co0aNRJUKpXQtGlTYfv27dX2ufVZeecTgLBixQrtPjk5OcL48eMFa2trwcTERBg4cKAQHx9f5jjR0dFCz549BWNjY8HOzk545513hIKCgjL7HDhwQGjRooWgUqmE+vXrl3mPYvx9rxqvvvqq4OHhIahUKsHe3l7o2rWrNtQJAs+pvngw2PG8SkcmCIIgTVshEREREVUlXmNHREREpCcY7IiIiIj0BIMdERERkZ5gsCMiIiLSEwx2RERERHqCwY6IiIhITzDYVUBeXh7mzJmDvLw8qUuhKsTzqp94XvUPz6l+4nmtWpzHrgLS09NhaWmJtLQ0WFhYSF0OVRGeV/3E86p/eE71E89r1WKLHREREZGeYLAjIiIi0hNKqQuoboWFhTh//jwcHR0hlz9djs3IyAAA3L59G+np6VVRHukAnlf9xPOqf3hO9RPP63/TaDRITExEy5YtoVQ+Prrp/TV2p0+fRkBAgNRlEBERET2VU6dOoW3bto/dR+9b7BwdHQGI3wxnZ2eJqyEiIiKqmPj4eAQEBGgzzePofbAr7n51dnaGq6urxNUQERERVc6TXFLGwRNEREREeoLBjoiIiEhPMNgRERER6Qm9v8aOiIiouqjVahQUFEhdBtVyBgYGUCgUVXIsSYPd4cOH8dVXX+Hs2bOIj4/H5s2bMWD
AAO12QRDw8ccf4+eff0ZqaiqCgoKwfPlyeHt7S1c0ERHVeYIgICEhAampqVKXQnrCysoKTk5OkMlkT3UcSYNdVlYW/P398eqrr2LQoEEPbV+wYAGWLFmCVatWwcvLC7NmzUJwcDAuX74MIyMjCSomIiKCNtQ5ODjAxMTkqf8zprpLEARkZ2cjKSkJAJ56ajZJg13Pnj3Rs2fPcrcJgoDFixfjo48+Qv/+/QEA//vf/+Do6IgtW7bgxRdfrMlSiYiIAIjdr8WhztbWVupySA8YGxsDAJKSkuDg4PBU3bI6O3giKioKCQkJ6Natm3adpaUl2rVrh+PHj0tYGRER1WXF19SZmJhIXAnpk+Kfp6e9ZlNnB08kJCQAwEOzLDs6Omq3lScvLw95eXna58X3oCMiIqpK7H6lqlRVP08622JXWfPmzYOlpaV28fX1lbokIiIiohqhs8HOyckJAJCYmFhmfWJionZbed5//32kpaVpl8uXL1drnURERHWZp6cnFi9e/MT7Hzx4EDKZrNpHFK9cuRJWVlbV+h66SGeDnZeXF5ycnLB//37tuvT0dJw8eRKBgYGPfJ2hoSEsLCy0i7m5eU2US0REpNNkMtljlzlz5lTquKdPn8bYsWOfeP/27dsjPj4elpaWlXo/ejxJr7HLzMzE9evXtc+joqIQGhoKGxsbuLu7Y+rUqfjss8/g7e2tne7ExcWlzFx3RERE9N/i4+O1j9evX4/Zs2cjMjJSu87MzEz7WBAEqNVqKJX/HRPs7e0rVIdKpXpszxs9HUlb7M6cOYOWLVuiZcuWAIBp06ahZcuWmD17NgDgvffew6RJkzB27Fi0bdsWmZmZ2LVrl27OYZeRANziaF0iItJNTk5O2sXS0hIymUz7/MqVKzA3N8fOnTvRunVrGBoa4ujRo7hx4wb69+8PR0dHmJmZoW3btti3b1+Z4z7YFSuTyfDLL79g4MCBMDExgbe3N7Zu3ard/mBXbHGX6e7du9GkSROYmZmhR48eZYJoYWEhJk+eDCsrK9ja2mLGjBkYNWpUhRt6li9fjgYNGkClUqFx48b4/ffftdsEQcCcOXPg7u4OQ0NDuLi4YPLkydrt33//Pby9vWFkZARHR0cMGTKkQu9dUyQNdl26dIEgCA8tK1euBCD+cHzyySdISEhAbm4u9u3bh0aNGklZcvmiQ4DFzYG/XgfUvLUMEVFdIwgCsvMLJVkEQaiyzzFz5kx8+eWXiIiIgJ+fHzIzM9GrVy/s378f58+fR48ePdC3b1/ExMQ89jhz587FsGHDcOHCBfTq1QsjRoxAcnLyI/fPzs7GwoUL8fvvv+Pw4cOIiYnBu+++q90+f/58rFmzBitWrEBISAjS09OxZcuWCn22zZs3Y8qUKXjnnXdw8eJFvPnmmxgzZgwOHDgAAPjrr7/wzTff4Mcff8S1a9ewZcsWNG/eHIDYEDV58mR88skniIyMxK5du9CpU6cKvX9N0dnpTmqVeq0BIysgPQ64tAXwGyp1RUREVINyCtTwnb1bkve+/EkwTFRV89/5J598gueff1773MbGBv7+/trnn376KTZv3oytW7di4sSJjzzO6NGjMXz4cADAF198gSVLluDUqVPo0aNHufsXFBTghx9+QIMGDQAAEydOxCeffKLdvnTpUrz//vsYOHAgAGDZsmXYsWNHhT7bwoULMXr0aIwfPx6A2Et44sQJLFy4EM8++yxiYmLg5OSEbt26wcDAAO7u7ggICAAAxMTEwNTUFH369IG5uTk8PDy0vY26RmcHT9QqBkZAu6ILR499C1ThX09EREQ1pU2bNmWeZ2Zm4t1330WTJk1gZWUFMzMzRERE/GeLnZ+fn/axqakpLCwstLfMKo+JiYk21AHibbWK909LS0NiYqI2ZAGAQqFA69atK/TZIiIiEBQUVGZdUFAQIiIiAABDhw5FTk4O6tevjzfeeAObN29GYWEhAOD555+Hh4cH6tevj5EjR2LNmjXIzs6
u0PvXFLbYVZU2rwFHFgEJ4UDUIaB+F6krIiKiGmJsoMDlT4Ile++qYmpqWub5u+++i71792LhwoVo2LAhjI2NMWTIEOTn5z/2OAYGBmWey2QyaDSaCu1flV3MT8LNzQ2RkZHYt28f9u7di/Hjx+Orr77CoUOHYG5ujnPnzuHgwYPYs2cPZs+ejTlz5uD06dM6N6UKW+yqiokN0HKk+DhkibS1EBFRjZLJZDBRKSVZqvMOGCEhIRg9ejQGDhyI5s2bw8nJCdHR0dX2fuWxtLSEo6MjTp8+rV2nVqtx7ty5Ch2nSZMmCAkJKbMuJCSkzI0MjI2N0bdvXyxZsgQHDx7E8ePHER4eDgBQKpXo1q0bFixYgAsXLiA6Ohr//vvvU3yy6sEWu6oUOB44/TNwYz+QcBFwaiZ1RURERJXm7e2NTZs2oW/fvpDJZJg1a9ZjW96qy6RJkzBv3jw0bNgQPj4+WLp0KVJSUioUaqdPn45hw4ahZcuW6NatG/755x9s2rRJO8p35cqVUKvVaNeuHUxMTLB69WoYGxvDw8MD27Ztw82bN9GpUydYW1tjx44d0Gg0aNy4cXV95Epji11VsvYEfPuLj48vk7QUIiKip7Vo0SJYW1ujffv26Nu3L4KDg9GqVasar2PGjBkYPnw4XnnlFQQGBsLMzAzBwcEVmv5swIAB+Pbbb7Fw4UI0bdoUP/74I1asWIEuXboAAKysrPDzzz8jKCgIfn5+2LdvH/755x/Y2trCysoKmzZtwnPPPYcmTZrghx9+wB9//IGmTZtW0yeuPJlQ053YNSwuLg5ubm6IjY2Fq6tr9b/h7XPAz88CciUw5QJgWa/635OIiGpMbm4uoqKi4OXlpZvzqtYBGo0GTZo0wbBhw/Dpp59KXU6VeNzPVUWyDFvsqlq9VoBHB0BTCJz8QepqiIiIar1bt27h559/xtWrVxEeHo5x48YhKioKL730ktSl6RwGuyqg0Qg4EJmE+5l54oqgopmqz64EctMlq4uIiEgfyOVyrFy5Em3btkVQUBDCw8Oxb98+NGnSROrSdA4HT1SBSevOY/uFeLzbvREmPucNNHwesGsM3IsEzq0C2k+SukQiIqJay83N7aERrVQ+tthVgW5NHAAAv5+4hQK1BpDLS8Lc9X2PeSURERFR1WGwqwK9mjvDzswQiel52HUxQVzpNwwYvh54ebO0xREREVGdwWBXBQyVCoxo5w4AWBESJa5UGgKNe4itd0REREQ1gKmjiox4xh0GChnOxaQiLDa17MaCHCAtTpK6iIiIqO5gsKsiDuZG6OPnAgBYdSy6ZMP1/cA3zYC/J0pTGBEREdUZDHZVaHR7TwDAPxfuICkjV1xp2xDISQHu3wByUiWrjYiIiPQfg10V8nezQit3KxSoBaw9GSOutPYAxuwAJp8HjK0krY+IiOhpdenSBVOnTtU+9/T0xOLFix/7GplMhi1btjz1e1fVcR5nzpw5aNGiRbW+R3VisKtio4O8AACrT8Qgr1AtrnR/BlBwykAiIpJO37590aNHj3K3HTlyBDKZDBcuXKjwcU+fPo2xY8c+bXllPCpcxcfHo2fPnlX6XvqGwa6K9WzmBEcLQ9zLzMOO8PiyG9UFwJ1QSeoiIqK67bXXXsPevXsRF/fwYL4VK1agTZs28PPzq/Bx7e3tYWJiUhUl/icnJycYGhrWyHvVVgx2VcxAIcfIZzwAACtCoiEIgrghNQb41h9Y2ZvX2hERUY3r06cP7O3tsXLlyjLrMzMzsWHDBrz22mu4f/8+hg8fjnr16sHExATNmzfHH3/88djjPtgVe+3aNXTq1AlGRkbw9fXF3r17H3rNjBkz0KhRI5iYmKB+/fqYNWsWCgoKAAArV67E3LlzERYWBplMBplMpq35wa7Y8PBwPPfcczA2NoatrS3Gjh2LzMxM7fbRo0djwIABWLhwIZydnWFra4sJEyZo3+tJaDQafPLJJ3B1dYWhoSFatGiBXbt2abfn5+d
j4sSJcHZ2hpGRETw8PDBv3jwAgCAImDNnDtzd3WFoaAgXFxdMnjz5id+7Mtg/WA2GB7hjyb/XcSEuDediUtHawxqwdAMMzYH02+I9ZDtMlbpMIiKqavlZFX+NwrDkch11IaDOA2RywMD4v4+rMn3it1EqlXjllVewcuVKfPjhh5DJZACADRs2QK1WY/jw4cjMzETr1q0xY8YMWFhYYPv27Rg5ciQaNGiAgICA/3wPjUaDQYMGwdHRESdPnkRaWlqZ6/GKmZubY+XKlXBxcUF4eDjeeOMNmJub47333sMLL7yAixcvYteuXdi3T7x7k6Wl5UPHyMrKQnBwMAIDA3H69GkkJSXh9ddfx8SJE8uE1wMHDsDZ2RkHDhzA9evX8cILL6BFixZ44403nuj79u233+Lrr7/Gjz/+iJYtW+K3335Dv379cOnSJXh7e2PJkiXYunUr/vzzT7i7uyM2NhaxsbEAgL/++gvffPMN1q1bh6ZNmyIhIQFhYWFP9L6VxWBXDWzNDNHf3wUbzsZh5bFoMdjJZOJtxv6eAJz8AXhmPKBUSV0qERFVpS9cKv6aoSuBpgPFx1f+ATaMBjw6AGO2l+yzuDmQff/h185Jq9Bbvfrqq/jqq69w6NAhdOnSBYDYDTt48GBYWlrC0tIS7777rnb/SZMmYffu3fjzzz+fKNjt27cPV65cwe7du+HiIn4vvvjii4eui/voo4+0jz09PfHuu+9i3bp1eO+992BsbAwzMzMolUo4OTk98r3Wrl2L3Nxc/O9//4OpqRhwly1bhr59+2L+/PlwdHQEAFhbW2PZsmVQKBTw8fFB7969sX///icOdgsXLsSMGTPw4osvAgDmz5+PAwcOYPHixfjuu+8QExMDb29vdOjQATKZDB4eHtrXxsTEwMnJCd26dYOBgQHc3d2f6Pv4NNgVW01GFU19sjM8HglpRVOfNB8KmDkBGfHAxY3SFUdERHWSj48P2rdvj99++w0AcP36dRw5cgSvvfYaAECtVuPTTz9F8+bNYWNjAzMzM+zevRsxMTFPdPyIiAi4ublpQx0ABAYGPrTf+vXrERQUBCcnJ5iZmeGjjz564vco/V7+/v7aUAcAQUFB0Gg0iIyM1K5r2rQpFAqF9rmzszOSkpKe6D3S09Nx584dBAUFlVkfFBSEiIgIAGJ3b2hoKBo3bozJkydjz5492v2GDh2KnJwc1K9fH2+88QY2b96MwsLCCn3OimKLXTVpVs8SAZ42OBWdjNUnbuHd4MbibcbavQnsnwscWwr4Dxdb8oiISD98cKfir1GUGgzg01c8huyBdpep4U9XVymvvfYaJk2ahO+++w4rVqxAgwYN0LlzZwDAV199hW+//RaLFy9G8+bNYWpqiqlTpyI/P7/K3v/48eMYMWIE5s6di+DgYFhaWmLdunX4+uuvq+w9SjMwMCjzXCaTQaPRVNnxW7VqhaioKOzcuRP79u3DsGHD0K1bN2zcuBFubm6IjIzEvn37sHfvXowfP17bYvpgXVWFLXbVaEyQJwBg7akY5BYUTX3S5lVAZQYkXRbvSkFERPpDZVrxpfR0WAqluK709XWPO24lDBs2DHK5HGvXrsX//vc/vPrqq9rr7UJCQtC/f3+8/PLL8Pf3R/369XH16tUnPnaTJk0QGxuL+PiSWSFOnDhRZp9jx47Bw8MDH374Idq0aQNvb2/cunWr7MdVqaBWq//zvcLCwpCVVXL9YUhICORyORo3bvzENT+OhYUFXFxcEBISUmZ9SEgIfH19y+z3wgsv4Oeff8b69evx119/ITk5GQBgbGyMvn37YsmSJTh48CCOHz+O8PCqC+oPYrCrRs/7OsLF0gjJWfn4J6zorzhjK6DVKPHxsSWS1UZERHWTmZkZXnjhBbz//vuIj4/H6NGjtdu8vb2xd+9eHDt2DBEREXjzzTeRmJj4xMfu1q0bGjVqhFGjRiEsLAxHjhzBhx9+WGYfb29vxMTEYN26dbhx4waWLFm
CzZs3l9nH09MTUVFRCA0Nxb1795CXl/fQe40YMQJGRkYYNWoULl68iAMHDmDSpEkYOXKk9vq6qjB9+nTMnz8f69evR2RkJGbOnInQ0FBMmTIFALBo0SL88ccfuHLlCq5evYoNGzbAyckJVlZWWLlyJX799VdcvHgRN2/exOrVq2FsbFzmOryqxmBXjZQKOUYGegJ4YOqTZ94CZAog6hDntSMiohr32muvISUlBcHBwWWuh/voo4/QqlUrBAcHo0uXLnBycsKAAQOe+LhyuRybN29GTk4OAgIC8Prrr+Pzzz8vs0+/fv3w9ttvY+LEiWjRogWOHTuGWbNmldln8ODB6NGjB5599lnY29uXO+WKiYkJdu/ejeTkZLRt2xZDhgxB165dsWzZsop9M/7D5MmTMW3aNLzzzjto3rw5du3aha1bt8Lb2xuAOMJ3wYIFaNOmDdq2bYvo6Gjs2LEDcrkcVlZW+PnnnxEUFAQ/Pz/s27cP//zzD2xtbau0xtJkgjZt6Ke4uDi4ubkhNjYWrq6uNf7+qdn5eGbefuQWaPDnm4EI8LIRN/z1OhC+QRxQMfiXGq+LiIgqJzc3F1FRUfDy8oKRkZHU5ZCeeNzPVUWyDFvsqpmViQoDW9YDAKw8FlWyof0k8evFTeLkxURERERPicGuBhRPfbL7UiJup+aIK539Aa/OgKAGTvwgXXFERESkNxjsaoCPkwXaN7CFWiPg9+OlRv60L7qtyMWN4n1kiYiIiJ4Cg10NGV3UarfudAxy8ouGcDfsCvReBEw4CSiqZz4bIiIiqjsY7GpI1yaOcLMxRmp2AbaE3hZXymRA29cAY2tpiyMiIiK9wGBXQxRyGUYVTX2ysvTUJ8UEAchOrvnCiIioUqry7gVEVfXzxFuK1aChbdzw9Z6riEzMwPGb99G+gZ244d41YMs4IC8TGH+ctxkjItJhKpUKcrkcd+7cgb29PVQqlfbODUQVJQgC8vPzcffuXcjlcqhUqqc6HoNdDbI0NsDg1vWw+kQMVoRElwQ7Mwcg6QqgzgOSIgBH38cfiIiIJCOXy+Hl5YX4+HjcuVOJe8MSlcPExATu7u6Qy5+uM5XBroaNbu+J1SdisC8iEbHJ2XCzMQGMLIGhKwGn5oB51d0GhYiIqodKpYK7uzsKCwv/856mRP9FoVBAqVRWScsvg10Na+hgjo7edjhy7R7+dzwaH/Yuap3z7iZtYUREVCEymQwGBgYwMOCsBqQ7OHhCAq8GeQEA1p2ORVZe4cM78E4UREREVAkMdhLo3MgenrYmyMgtxKbzt0s2FOQCq/oB37YAUm498vVERERE5WGwk4BcLtPeZmxlSFTJ1CcGRoBMXnSbse+lK5CIiIhqJQY7iQxp7QozQyVu3M3CkWv3Sja0nyR+Pfc757UjIiKiCmGwk4i5kQGGtHYFAKw8Fl2yocFzgGMzoCALOPObNMURERFRrcRgJ6FR7T0hkwH/XklC1L0scaVMVtJqd/JHoDBPugKJiIioVmGwk5CXnSmebewAAFhVutWu2WDA3AXISgIurJemOCIiIqp1GOwkNrpoEMXGs3HIyC0QVyoMgGfGiY+PLQN4P0IiIiJ6Agx2EuvobYcG9qbIzCvExrNxJRtajwYMLYB7kcC1PZLVR0RERLUHg53EZDIZRhdNWLzqWDQ0mqKpT4wsxHAHAMeWSlMcERER1SoMdjpgUMt6MDdSIvp+Ng5dvVuyod1bgFwJ3DoK3D4rXYFERERUKzDY6QBTQyVeaOMGAPgtJKpkg2U9oPlQ8TFb7YiIiOg/MNjpiOKpT45cu4frSRklGwInil8vbwUyEqQpjoiIiGoFBjsd4WZjgm5NHAEAq46Vuk+sUzOg21xg7EHA3Ema4oiIiKhWYLDTIWOKpj7561wc0nIKSjZ0mAo4+0lSExEREdUeDHY6JLCBLRo7miM7X40NZ2LL36kwv2aLIiIiolqDwU6HiFOfeAIAVh2Phrp46hMAyEkB/p4ALGk
JFORKUyARERHpNAY7HTOgRT1YmRggNjkH+yMSSzaozIAbB4H0OODabsnqIyIiIt3FYKdjjFUKvNjWHQCwsvT9YxUGQO+FwKt7AN/+0hRHREREOo3BTgeNDPSAXAYcu3EfkQmlpj5p3BNwbyddYURERKTTGOx0UD0rYwQ3Fac2WXksqvydctNqsCIiIiKqDRjsdNSYovvHbj5/GylZD4yE3fMR8LUPEHdGgsqIiIhIVzHY6ai2ntbwdbZAboEG6x+c+iQ7GSjIBo4tkaY4IiIi0kkMdjpKJpNhTNHUJ/87Fo1CtaZkY/tJ4teIf4DkmzVfHBEREekknQ52arUas2bNgpeXF4yNjdGgQQN8+umnEAThv1+sB/r6u8DGVIU7abnYe7nU1CcOTYCGzwOCBjj+vXQFEhERkU7R6WA3f/58LF++HMuWLUNERATmz5+PBQsWYOnSpVKXViOMDBR4KUCc+mRF6alPACBosvj1/Gog637NFkZEREQ6SaeD3bFjx9C/f3/07t0bnp6eGDJkCLp3745Tp05JXVqNefkZDyjlMpyKSsalO6VGwnp2BJz9gcIc4ORy6QokIiIinaHTwa59+/bYv38/rl69CgAICwvD0aNH0bNnT4krqzlOlkbo2dwZALAyJLpkg0wGdJgmPg75FkiKqPniiIiISKfodLCbOXMmXnzxRfj4+MDAwAAtW7bE1KlTMWLEiEe+Ji8vD+np6dolIyPjkfvWFqPbewIA/g67g/uZeSUbfPsDjXoA6nxgy3hAXShNgURERKQTdDrY/fnnn1izZg3Wrl2Lc+fOYdWqVVi4cCFWrVr1yNfMmzcPlpaW2sXX17cGK64erdyt4O9qifxCDf44FVOyQSYD+nwDGFoCd84Bx+vGtYdERERUPpmgw0NM3dzcMHPmTEyYMEG77rPPPsPq1atx5cqVcl+Tl5eHvLySVq3bt2/D19cXsbGxcHV1rfaaq8vm83F4e30YHC0McXTGczBQlMrk59cAf48HFCrgzSOAg490hRIREVGViouLg5ub2xNlGZ1uscvOzoZcXrZEhUIBjUbziFcAhoaGsLCw0C7m5ubVXWaN6NXcGXZmhkhMz8POiwllN7Z4SZz+RJ0vBjx2yRIREdVJOh3s+vbti88//xzbt29HdHQ0Nm/ejEWLFmHgwIFSl1bjDJUKvPyMOPXJypAH7h8rkwF9vwUMLQCZHMhJkaBCIiIikppS6gIeZ+nSpZg1axbGjx+PpKQkuLi44M0338Ts2bOlLk0SL7Vzx3cHruNcTCrCYlPh72ZVstGyHvDqbsC+MSBXSFYjERERSUenW+zMzc2xePFi3Lp1Czk5Obhx4wY+++wzqFQqqUuThIO5Efr4uQAAVj44YTEAOPoy1BEREdVhOh3s6GHFU59su3AHSRm55e9UkAPs+Yi3GyMiIqpjGOxqGX83K7Ryt0KBWsDakzHl7xTxD3BsKbB/LpCRWP4+REREpHcY7Gqh0UFeAIDVJ2KQV6h+eIfmQwG/F4GhKwFzx5otjoiIiCTDYFcL9WzmBEcLQ9zLzMOO8PiHd5DJgEE/Ao3rzq3XiIiIiMGuVjJQyDHyGQ8AwIqQaPznHNPp8UDqI7ptiYiISG8w2NVSwwPcoVLKcSEuDediUh+94/X9wPftgE1vAo+Z2JmIiIhqPwa7WsrWzBD9/cWpTxbvu/roVjvbhuKdKGKOAad/rsEKiYiIqKYx2NVi459tCEOlHEeu3cP607Hl72TtAXT/RHy8bw6QfLPG6iMiIqKaxWBXi3nZmWJ6cGMAwGfbI3A7Naf8HVu/Cnh2BAqyga2T2SVLRESkpxjsarkxQV5o42GNzLxCzPzrQvldsnI50G8pYGACRB8Bzvxa84USERFRtWOwq+UUchkWDPHTdsmue1SXrI0X0G2u+Hjvx0BKdI3VSERERDWDwU4P1Lc303bJfv64Ltm2rwMeQUBBFvD3RHbJEhER6RkGOz1Rukt2xsbHdMn2XwYojcU
u2bMrar5QIiIiqjYMdnpCIZfhq6H+MDKQ4+j1e1h76hETEtvUB7rNER/vnQ2k3KqxGomIiKh6MdjpEXGUrA8A4IvtEYhNzi5/x4CxgHt7ID8T+Gcy8F93riAiIqJagcFOz4xp74m2ntbIyldj5qb/6pI1Am4eBG78W+N1EhERUdVjsNMzcrkMXw0Ru2RDrt/HmpOP6JK1bQD0XAAMXQU07FqzRRIREVG1YLDTQ552pnivqEt23o7HdMm2HgU0HVBzhREREVG1YrDTU6PbeyLA0wZZ+WrM+OsCNJr/uI4u867YLUtERES1FoOdnpIXTVxsZCDHsRv3seZRo2QB4O5V4LsAYP1IIC2u5ookIiKiKsVgp8c87Uwxo8cTdMnaNhAXK3cgP6sGKyQiIqKqxGCn50YFeiLAywbZ+Wq8t/ERXbJyBfDCauCNA4B945ovkoiIiKoEg52eE0fJ+sHYQIHjN+9jzclHTEhs7gQoVSXPObcdERFRrcNgVwd42JpiRg+xJW7eziuP7pIFAHUBcHA+sP5lhjsiIqJahsGujngl0BPtirpkp28Me/Qo2ZRo4MhC4Mo24ML6Gq2RiIiIng6DXR1RPHGxsYECJ24mY/WjumTtvIEuM8XHO98DMhJqrkgiIiJ6Kgx2dYi7rQlm9iweJXsFMfcf0SXbfgrg3ALITQP+mcouWSIiolqCwa6OGfmMB56pb4Ocgsd0ySqUwIDlgNwAuLoTCN9Q84USERFRhTHY1TFyuQwLBvvDRKXAyahk/H7iEV2yjr5Alxni4x3TgYzEmiuSiIiIKoXBrg5ytzXB+0Vdsl/uvIJb9x8xKXHQVMDZH8hNBbZPY5csERGRjmOwq6NGtPNAYH3boi7ZR0xcrDAA+n8vdsle2QZc/KvmCyUiIqInxmBXRxXfS9ZEpcCpqGSsOh5d/o5OzYBO08XHO6YDmUk1ViMRERFVDINdHeZmY4L3ezUBAMzfdQXR9x7RJdtxGuDUHMhJZpcsERGRDmOwq+NGBLijfQNb5BZoHn0vWW2XrBKI+Ae4tLnmCyUiIqL/xGBXx8nlMswf7AdTlQKnopOx8lh0+Ts6+wEd3wXsGgNWHjVaIxERET0ZBjsq0yW7YPcVRD2yS/Yd4M3DgGvrGqyOiIiInhSDHQEARrRzR1DD4i7ZR0xcrFQBBkYlz+POAJl3a65IIiIieiwGOwIAyGQlXbKno1Ow4lFdsgCgLgSOLgZW9AJ+H1BDFRIREdF/YbAjLVdrE3zQW+yS/epxXbIAIJMDDk2Axj1L1uVnA8s7AHs+Am4dBzTqaq6YiIiISlNKXQDplpcC3LEzPAFHr9/D9A1hWP9mIBRyWdmdFEogaLK4aDQl628eBBLDxeXYUsDEFmjUQwx/DZ4DVKY1+lmIiIjqGrbYURkymQxfDm4OM0MlztxKwYqQqMe/QF7qR8irIzB0JeD3AmBkBWTfB0LXAOtfBhbUB9a+AJxdyfvOEhERVROZIOj3bLNxcXFwc3NDbGwsXF1dpS6n1lh7MgYfbA6HoVKOnVM6or69WcUOoC4AYk4AkTuAK9uB1Ftlt9drA/j0Anz6APaNq65wIiIiPVORLMMWOyrX8AA3dPS2Q16hBtM3XoC6vFGyj6MwEFvweswDpoQB444Dz80C6hVNlXL7DLD/E2DXzLKv43V5RERElcZgR+USu2T9YGaoxNkn6ZJ9/MEAR1+g07vAG/8C70QCfRYD3sGA74CS/dLvAF81BDaPY8AjIiKqBA6eoEeqZ2WMD3s3wfubwvHV7kg86+OABhXtki2PuRPQZoy4lHZtj3g/2vvXAbmiZP2V7YBLK8DC+enfuzY4vxpICAcCxgK2DcR1cWeA078AFvUAy3qApVvRY1fAyELaeomISGcw2NFjvdjWDTvC43HkmjhKdsNb7R8eJVtVWrwM2HoDmsKSdTkpwPqRgKAGXFoCjXuJX03tAFN7wMSu7KTJuiwnFUiJApKjSn2NFj/vq7tK9ju7Cog
7Bbi1Kwl2CReAsD/KP66hhRjwioOeZT3AwhWwcgM8O1T3pyIiIh3CYEePVdwlG/zNYZyLScWvR29ibKcG1fNmCiXgGVR2XUaieF1e3GngznlxeZDKvCTomdqJS9c5gKmtuD05CsjPFEOPsXX11F4sNw1IuFhOgIsSQ2p5ZHKgMF+8swcANBsMuD8D2NQv2ce1LdD1YyAtDki/DaTdBtJigdxUIC8dSLosLqWZOgDTr5U83/4OkJkIdJgG1GtVUm9+NmDmWHaEMxER1UoMdvSf6lkZ46PeTTBzUzgW7rmK53wc0dChCrpkn4SDD/D6XiAzCbi6C7i6Wxxhm3VPXDQFQH6GuKSUug6w29ySx8eWAmd+BTq9Bzz3obguJRrY/JY4156pfaml1HMTO8DEpmy3cGmX/xZH/jYdCLgFiOuiDovTuzyKqQNg4wVYe5X9KisVqp556+HXOTUXlwflZRYFvbhSoa/osZFl2X1vHACSbwDtSh3/4iZg21RAbiB2dVu4lmr1qwdYuIiLuYv4PWH4IyLSaQx29EReaOuGHRcTcPjqXUzfGIaN1dklWx4zB6DVK+JSTBDEFqese0D2PSDrbtFyX5xHr5jSSAwlZg4l69LjgZjjT/DGsqLwZweo84GJZ0qC3uWtwMWN4jWDxcHOpj5g7flwcLP2EtcbVnEgNjQTp4t5kiljenwpBlp7n5J1ualiqNQUAKkx4vIociVg5gQ4+wHDS3UL3zgAKFRi8OT1fkREkuI8dvTE7qTmIPibw8jIK8T7PX3wZudq6pKtCVn3gOgjJS1/WXeLwmFxQLwnDuR40JQLgLWH+Dh8o9g13LjXw13ItYm6EMhMeLjVL/2OuGTEi124QtFdRpxbAG8eKnn9kpZA8k1g9I6S78OlLeIgEAtnsbVP2/LnLH41thZHSxMR0X+qSJZhix09MRcrY3zUpwlm/BWOr/deRdcmDmjoYC51WZVjaid2oT6OulC8e0Zxa6DSSLwWrVjzIeJS2ymURd2vj/nHQl0ohruM+JKAV8ymKOBb1itZlxAOXN/76OMpjUpCnrmzGADtGgOtRpbsIwgMf0REFcQWO6oQQRAwesVpHLp6F3ZmKnzU2xf9W7hAxv+AqbTES8Dts2KXd8Yd8Wv6HfFx9v3yX+PSChh7oOT50jZAQTYwfJ3Y/QuIA1PuXy8ZBWzm8OhrIImI9ARb7KjayGQyzB/sh5d/PYnrSZmYuj4UG87G4tP+zSp+2zHSX45NxaU8hXliy1/prt70ePFaxWKCIF7vp84re93epU3Aka9LnsuVRS1+RQM9iqd6Kf2Ygz6IqCoU5gHZyeIfpznJJY89OwL2jaSuTovBjirMydIIOyZ3xE+Hb2Dpv9cRcv0+eiw+gnFdGmBclwYwMmALCj2G0rBogInn4/ebEiYGP4tSf52aOwOuASWBUFMoTvuSFvvo47gGiCOrix39BjAwFbvRTWye5pMQUW0kCGJvQHayOA1VcY8AAFzYIE6v5du/5JrhqMPA2heBgqzyj9dnMYMd1X4qpRwTn/NGX38XzPr7Eg5fvYtv91/D36G38emAZujobS91iVSbyWRF0688cLeRgDfEBSi57i/9DpAeJ87tV+bxbSAjATAvdV2kIAAH5oktgY26lwS7g/PFCaCtPQArj6Lg6QFYeYqPTWx4vR9RseIZCXJSxJarnBQgO6Vkrs52Y0v2Pb9GHIzl21+cvgoA7t8Qp66SK8RR+XIFIFM88LWc9Y2CS34Pk66I721Tv6S1P+s+cOfcA61q94sCXHELW9E6dV5JjR/dLZlH9NoeIPzPognei4KdyrQk1MkU4uAvExtxxgRjG7GHQIcw2NFT8bA1xaoxbbEjPAFz/7mE6PvZGPnrKfTzd8FHfZrAwbyW3BWCah+Fsuj2avUAtC1/H3WB+Jd5scI88VZ2aXHiaN1iyTfFeRBLz4VYmsqsbOCz9hSnd/FoX0UfhkhiiZfEP5ScW5T8wRN1GAhdW9KyVRz
iclLFuwGVx8SubLALXQPcChFbtIqDXXwYsPv9itf4cWrJ44NfiHOJ9lpY8sfenfPAmgoMaFOoxGCWlwEoiya0b9xTDHX12pTs59AUmHRO/L4YWur8pR0MdvTUZDIZevs5o1MjO3y95yr+dzwaW8Pu4EBkEt7r4YOXAtxrds47omIKA0BRaqJmAyOg5/yH9+s2RxyRm3JLnAA75ZY451/qLbHLNz8TSLokLsWa9C0JdoIArOwjthz0Xlhyh5PcdMDARAyhRFUhNx0ozBUnIFcaiusyEsQ/TgrzxEVd9DUvoyiIFS3agJYCGFsBr+0pOe6mN4HEcODlv4CG3cR1KbcefStDQPzZNrYRf96NrcTgY/pAb03jXoCdtziXZzFLV/EOOxq1GBA1mqKv6lJfNWWf44FR8ubOgG3DsncTMncCnPzEljQTG7G2Mo9tyq5XmT7cEt9skLiU+ZxGJbd3rAU4Kpaq3IW4VHy4+SLCb6cBAPzdrPD5gGZoVs/yP15JpIMKcsVr+FKiS8Jeyi0x1D0zTtwnIxH4upHYffRhYkm3zqY3gfAN4n9kxS19Vg98NbVjN29tpS4UQ39+pvi89JRB4RvFAOU3rOQuMFd3i3fQKQ5ghbnixOeFueJtBct7btsQeG13yXGXtBLvIDNmF+ARKK47sRzYNbNitZvYAu/dLHm+YQxwNxII/hxo8Ky4LilCrNmkOLxZlwpy1rXnPt16gKNiSVJ+rlbYMiEIq0/cwle7IxEWm4p+y45iTJAX3n6+EcwM+WNHtYiBkdjiYOf96H1UpsDQVeKch8WhDhCv8xPUYhhMvSV2bT1IaSyO/FWZFg3qGAx0eFvclp8F7J0ttox0m1MytUvMCXESbZWJ2E2sMhX3UZmJ6wxMGBbLo9GId1kpbukSBHGi8rxMoGHXkvVXdgDRR8VbFeYVBbe8zIefF+aUHNujAzBme8nzne8VjZjsUBLs7pwHzvxWsZofvL91cY2lrxEzsRMDoMJQ3F68GJgCJg8EsuKQZmJb9rhDVzz83g5NxIVqFf4PS9VCIZdhVHtP9GjmhE+2Xcb2C/H49WgUtl+Ix5x+vghu6sS570h/GJoBTQc8vP6VreJdPbRdvNFlu3nT74jhILNUQPDqWPI4Nw04/Ys4rcvzn5SsP7YUuLLtMQXJSoU9UzHwNe4BPPeRuFmjAXa8KwaOju+U3Oru/g0xtBhbi7flMzTX7YCYlyHeR7r4gvjse+LXrHtF6x54npcO+PQGXlxTcozfB4qjq9++XDLJdvRR4MR3T16H3ODh75N3dzGYFwcxQAx5Xd4X15UJYUbi9V5Ko7LBTGEonr/S3jws/jyUfj+/oeJCBAY7qmaOFkb47qVWGNo6CbP/voSY5Gy8tfocuvo4YE6/pnCzMZG6RKLqI5eX3E6tuNustMI8MdzlZ4ohID9TnJOvmIEx0HmmGDxK/0du21CcxiU/Sxytl1+0aAeKCCVdhMUzNDj7l7y+IBs486v4uNO7JeuPLAJCV5c8lynE8Fd8DZWRVamvRevsfQDv50tek34HMLQo//qlxynMLxvODC2Aeq1Kvk+bxorrR2wQvy8AsGP6468BK09eRqnPJxMHCwiasndU8eokXhepMhdDr8qs6OsjnpcOb8UG/vDwOs8O4vI0FAZP93rSewx2VCO6NHbAnrdtsezf6/jx8A3sv5KEkBv3MKVrI7ze0QsGCt0eZURULZSGgI3Xo7cbWwPPljN68Pm55e+v0YihrUzgyxYDnplDyX4yOdB5hhhyDEr9caUyEW+bl5MqdvUJ6qKRkOXcN7mYT5+SYCcIwOLmD7eAnfwRuLa3KBRaijVl3y8V5Ipa00pr0hd4oShkKlTAle1iN2r2/ZJr2UxsxZBlYiN2R5rYioup3cPrTGzF9zd84DaIb+x/+DM17iEuRLUQB09QjbuelIEPN1/EySjxP4tGjmb4fGBztPXkZLFEOqMgRwx4uaklU1zkpopfc1JKHru0BALHi6/
JzwK+9BAD2Ad3SroRt4wXp734LzJFSQir37nsCOazq8TjNQouCWcajc5PPUFUFSqSZXQ+2N2+fRszZszAzp07kZ2djYYNG2LFihVo06bNf78YDHa6ShAE/HXuNr7YEYHkrHwAwAtt3DCzpw+sTVX/8Woi0lnFs/qXHsARdwa4e6UoHKaJXammxa1pRV9NbWvFHGFEUtCbUbEpKSkICgrCs88+i507d8Le3h7Xrl2DtbX1f7+YdJpMJsOQ1q7o6uOA+buuYN3pWKw/E4s9lxPwQa8mGNLalYMriGojmezhC/5d24gLEVU7nW6xmzlzJkJCQnDkyJFKH4MtdrXDmehkfLj5IiITxQubA7xs8MXAZmjoYP4fryQiItJvFckyOt3mvXXrVrRp0wZDhw6Fg4MDWrZsiZ9//vmxr8nLy0N6erp2ycjIeOz+pBvaeNpg2+QOmNnTB0YGcpyKSkbPb4/gq91XkJP/iFvXEBERURk6Hexu3ryJ5cuXw9vbG7t378a4ceMwefJkrFq16pGvmTdvHiwtLbWLr69vDVZMT8NAIcdbnRtg79ud0dXHAQVqAd8duIHuiw/hQGSS1OURERHpPJ3uilWpVGjTpg2OHTumXTd58mScPn0ax48fL/c1eXl5yMsrmZH79u3b8PX1ZVdsLSMIAnZfSsTcfy4hPi0XANCruRNm92kKJ0vexoaIiOoOvemKdXZ2fqjFrUmTJoiJiXnkawwNDWFhYaFdzM15jVZtJJPJ0KOZE/ZO64zXO3hBIZdhR3gCui06hAW7ruBMdDIK1Zr/PhAREVEdotOjYoOCghAZGVlm3dWrV+Hh4SFRRVTTzAyV+KiPLwa2qocPNl9EWGwqvj94A98fvAFLYwN08LZDl0b26NzIHg4WbMkjIqK6TaeD3dtvv4327dvjiy++wLBhw3Dq1Cn89NNP+Omnn6QujWpYUxdLbBrXHtvD47HnUgKOXLuHtJwCbL8Qj+0X4gEAvs4W6NJYDHmtPKx5NwsiIqpzdPoaOwDYtm0b3n//fVy7dg1eXl6YNm0a3njjjSd+Pac70U+Fag3C4tJwKDIJB6/exYW4tDLbzQ2V6OBth86N7NG5sT2cLY0lqpSIiOjp6NWdJ54Wg13dcC8zD0eu3cXByLs4fPUuUrILymz3cTLXhrw2HjZQKdmaR0REtQODXSkMdnWPWiMg/HYaDkYm4WDkXYTFpaL0T7mpSoGghnbo3NgeXRo7oJ4VW/OIiEh36c0txYgqQyGXoYWbFVq4WWFqt0ZIycrH4Wt3cSjyLg5fu4t7mfnYczkRey4nAgC8HczQuZEY8tp6WcNQqZD4ExAREVUOgx3pPWtTFfq3qIf+LepBoxFw6U46DkYm4dDVuzgXk4JrSZm4lpSJX45GwdhAgfYNbNGlqDXPzcZE6vKJiIieGLtiqU5Lyy7Aketia96hq3eRlJFXZnt9O1Ntl20bD2uYGvJvISIiqlnsiiV6QpYmBujj54I+fi4QBAGX49NxsCjknb2Vgpv3snDzXhZWhEQDAJwsjFDf3hReduLSwN4MXnamcLU2hpLTqxARkcQY7IiKyGQyNHWxRFMXS0x4tiHScwsQcu0eDl0Vg158Wi4S0sXl2I37ZV5roJDB3cYEXnZmqG9vivpFwc/L3hT2ZoaQyWQSfSoiIqpLGOyIHsHCyAA9mzujZ3NnAEBqdj5u3stC1N0s3LyXiah7Wbh5NwtR97KQV6jBjbtZuHE3C4goexxzQyW8SrXy1bc3Q307U3jamcKMXbtERFSFKvW/SmxsLGQymbaf99SpU1i7di18fX0xduzYKi2QSFdYmajQyl2FVu7WZdZrNALi03MRdTcLUfcycaMo7EXdy0JcSjYy8gpxIS7toUmUAcDRwrAo8Ilhr7ib183GhHfOICKiCqtUsHvppZcwduxYjBw5EgkJCXj++efRtGlTrFmzBgkJCZg9e3ZV10mks+RyGepZGaOelTE6eNuV2ZZXqEbM/Wyxpe9
eFm7ezdSGvnuZ+UhMz0Nieh5O3Ewu8zqlvLhr1xT+blZ4sa0b74VLRET/qVLB7uLFiwgICAAA/Pnnn2jWrBlCQkKwZ88evPXWWwx2REUMlQp4O5rD29H8oW1p2QWIui+28kXdzcKNom7eqHtZyClQawdu7L+ShCX7r6G3nzNGt/dEywdaDImIiIpVKtgVFBTA0NAQALBv3z7069cPAODj44P4+Piqq45Ij1maGKCFiTiRcmmCICChqGv3+t1MbAuLx6noZPwdegd/h96Bv5sVxrT3RK/mzrw1GhERlVGp/xWaNm2KH374AUeOHMHevXvRo0cPAMCdO3dga2tbpQUS1TUymQzOlsZo39AOrwR64s+3ArFtUgcMbe0KlVKOsNhUTF0fiqD5/2LxvqtIysiVumQiItIRlZqg+ODBgxg4cCDS09MxatQo/PbbbwCADz74AFeuXMGmTZuqvNDK4gTFpE/uZ+Zh3elY/H78FhLSxUBnoJChj58LRrf3hP8DrX9ERFT7VSTLVPrOE2q1Gunp6bC2LrneJzo6GiYmJnBwcKjMIasFgx3powK1BrsvJWBlSDTO3ErRrm/pboXR7T3Rsxm7aYmI9EW133kiJycHgiBoQ92tW7ewefNmNGnSBMHBwZU5JBFVgIFCrr1jRnhcGlYei8Y/YXdwPiYV52NC8bl5BEa088BL7dxhb24odblERFRDKtVi1717dwwaNAhvvfUWUlNT4ePjAwMDA9y7dw+LFi3CuHHjqqPWSmGLHdUV9zLz8MfJGPx+4pb2nrcqhRx9/JwxOsgTfq5W0hZIRESVUpEsU6m+mnPnzqFjx44AgI0bN8LR0RG3bt3C//73PyxZsqQyhySip2RnZohJXb1xdMZzWDK8JVq5WyFfrcGm87fRb1kIBn0fgq1hd1Cg1khdKhERVZNKdcVmZ2fD3Fycl2vPnj0YNGgQ5HI5nnnmGdy6datKCySiilEp5ejn74J+/i4Ii03FqmPR+OfCHZyLScW5mPNwtDDEy+08MLydO+zM2E1LRKRPKtVi17BhQ2zZsgWxsbHYvXs3unfvDgBISkqChYVFlRZIRJXn72aFRS+0QMjM5/B2t0awNzdEYnoevt57Fe3n/Yt3/gxDeDm3OiMiotqpUsFu9uzZePfdd+Hp6YmAgAAEBgYCEFvvWrZsWaUFEtHTczA3wpRu3giZ8Ry+fbEFWriJ3bR/nYtD32VHMWT5MWy7wG5aIqLartLTnSQkJCA+Ph7+/v6Qy8V8eOrUKVhYWMDHx6dKi3waHDxBVL7zMSlYdSwa28PjUaAW/xlwsjDCy8+4Y3iAO2zZTUtEpBNqZB670m8GQGdDE4Md0eMlpedizckYrDkZg3uZRaNplXL09XPBK4EenPSYiEhi1T4qVqPR4JNPPoGlpSU8PDzg4eEBKysrfPrpp9Bo2JVDVJs4WBjh7ecbIWTms1j8Qgv4u1oiv1Dspu3/XQj6Lj2KP0/HIidfLXWpRET0Hyo1KvbDDz/Er7/+ii+//BJBQUEAgKNHj2LOnDnIzc3F559/XqVFElH1M1QqMKBlPQxoWQ/nYlKw+vgtbLsQj/DbaXjvrwv4bPtlDG7tipef8UADezOpyyUionJUqivWxcUFP/zwA/r161dm/d9//43x48fj9u3bVVbg02JXLFHlJWflY8OZWKw5GYOY5Gzt+vYNbPHyMx543tcRBgreuoyIqDpV+y3FkpOTyx0g4ePjg+Tk5Mockoh0kI2pCm92boA3OtbH4Wt3sfpEDP69kohjN+7j2I37cDA3xIsB7hge4AZnS2OpyyUiqvMq9ae2v78/li1b9tD6ZcuWwc/P76mLIiLdIpfL0KWxA34Z1QZHZjyHic82hJ2ZCkkZeViy/xo6zD+AN38/gyPX7kKjearxWERE9BQq1RV76NAh9O7dG+7u7to57I4fP47Y2Fjs2LFDe7sxXcCuWKLqkV+owe5LCVh94hZORpW01HvZmWJEO3cMae0KKxO
VhBUSEemHah8V27lzZ1y9ehUDBw5EamoqUlNTMWjQIFy6dAm///57pYomotpFpZSjr78L1r8ZiD1vd8KoQA+YGSoRdS8Ln22PQLsv9uPdDWEIjU3FU86qRERET+ip57ErLSwsDK1atYJarTvTIrDFjqjmZOUV4u/QO1h94hYux6dr1zerZ4GRz3ign389GKsUElZIRFT7VHuLHRFReUwNlXipnTu2T+6Av8a1x6CW9aBSynHxdjpm/BWOgC/2Ye4/l3A9KVPqUomI9FKlRsUSET2OTCZDaw9rtPawxkd9fMtMmbIiJBorQqIRWN8WIwM5ZQoRUVVisCOialV6ypQj1+/h9+O38O+VRBy/eR/HbxZNmdLWDcPbuXPKFCKip1ShYDdo0KDHbk9NTX2aWohIj8nlMnRuZI/OjexxOzUHf5yMwbrTseKUKf9ex3cHb6CrjwNeCfREUENbyGQyqUsmIqp1KhTsLC0t/3P7K6+88lQFEZH+q2dljHeDG2NyV+8yU6bsuZyIPZcT8byvIz4b0AyOFkZSl0pEVKtU6ahYXcRRsUS1w9XEDKw+cQtrT8agUCPA3EiJD3s1wQtt3dh6R0R1GkfFElGt08jRHJ/0b4ZtkzvA39USGbmFmLkpHCN+OYlb97OkLo+IqFZgsCMineLjZIFN44PwUe8mMDKQ49iN+whefBi/HLkJNW9XRkT0WAx2RKRzFHIZXu9YH7undkJgfVvkFmjw2fYIDFp+DJEJGVKXR0SksxjsiEhnediaYu0b7fDloOYwN1QiLDYVfZYewTd7ryKvUHfucENEpCsY7IhIp8lkMrwY4I690zqjWxNHFKgFfLv/GvouPYrzMSlSl0dEpFMY7IioVnCyNMLPr7TGspdawtZUhauJmRi0/Bg+3XYZ2fmFUpdHRKQTGOyIqNaQyWTo4+eCfdM6Y1DLehAE4NejUQhefBhHr92TujwiIskx2BFRrWNtqsKiF1pg5Zi2qGdljNjkHLz860m8tzEMadkFUpdHRCQZBjsiqrW6NHbA7rc7YVSgBwDgzzNx6PbNIey6mCBxZURE0mCwI6JazcxQibn9m2HDW4Gob2+Kuxl5eGv1WYxfcxZJGblSl0dEVKMY7IhIL7T1tMGOyR0xvksDKOQy7AhPwPOLDmPj2Tjo+Z0TiYi0GOyISG8YGSjwXg8f/D0hCE1dLJCWU4B3N4Thld9OITY5W+ryiIiqHYMdEemdZvUssWVCEGb08IFKKceRa/cQvPgwVoZE8bZkRKTXGOyISC8ZKOQY16UBdk7piABPG2TnqzHnn8sY+sMxXE/ibcmISD8x2BGRXmtgb4Z1Y5/BpwOawVSlwLmYVPT69iiW7r+GArVG6vKIiKoUgx0R6T25XIaRz3hgz7TO6NLYHvlqDb7eexV9lx7FhbhUqcsjIqoyDHZEVGfUszLGitFtsfiFFrA2McCVhAwM+C4E83ZEICdfLXV5RERPTSl1AURENUkmk2FAy3ro4G2Huf9cxj9hd/Dj4ZvYdiEe3o5mMFUpYaxSwFSlgImhEiYGRV9VCpioFDBVFT1+YJ2xSgFDpRwymUzqj0hEdRiDHRHVSXZmhlg6vCX6+bvgoy3huJ2ag9upOU91TIVcVhQEFaUCYtFXQwVMikOhqiQU2psbokczJxgqFVX0yYioLmOwI6I67XlfRzxT3wYh1+8hPbcQ2XmFyC5QIztPjex8NbLzC7Vfs/LURdsKy2zLKxQHYag1AjLyCpGRVwgg74lr6NLYHr+OaguFnK19RPR0GOyIqM4zNzJAj2bOlX59oVqD7AI1cvLVyNKGvpLgl5VXiJwCNbLy1MjJL0RW0fasvELsuZyAg5F3sWhvJKYH+1ThpyKiuojBjojoKSkVclgo5LAwMqjwa/8OvY0p60Lx3YEbaOZiiZ7NKx8wiYg4KpaISEL9W9TD6x28AADvbAhDZAInTyaiymOwIyKS2MyePghqaIvsfDXG/n4GadkFUpdERLUUgx0
RkcSUCjmWDm+FelbGuHU/G1PWn+c9bYmoUhjsiIh0gI2pCj+90hpGBnLtYAoiooqqVcHuyy+/hEwmw9SpU6UuhYioyjV1scT8wX4AgO8O3MDO8HiJKyKi2qbWBLvTp0/jxx9/hJ+fn9SlEBFVGw6mIKKnUSuCXWZmJkaMGIGff/4Z1tbWUpdDRFStOJiCiCqrVgS7CRMmoHfv3ujWrdt/7puXl4f09HTtkpHBv3aJqHbhYAoiqiydD3br1q3DuXPnMG/evCfaf968ebC0tNQuvr6+1VwhEVHV42AKIqoMnQ52sbGxmDJlCtasWQMjI6Mnes3777+PtLQ07XL58uVqrpKIqHpwMAURVZROB7uzZ88iKSkJrVq1glKphFKpxKFDh7BkyRIolUqo1eqHXmNoaAgLCwvtYm5uLkHlRERVg4MpiKgidDrYde3aFeHh4QgNDdUubdq0wYgRIxAaGgqFQiF1iURE1Y6DKYjoSel0sDM3N0ezZs3KLKamprC1tUWzZs2kLo+IqEZwMAURPSmdDnZERCSyMVXhx5EcTEFEj6eUuoCKOnjwoNQlEBFJolk9cTDFlHWh+O7ADTRzsUTP5s5Sl0VEOoQtdkREtciDgymuJnIwBRGVYLAjIqplygym+B8HUxBRCQY7IqJapvRgimgOpiCiUhjsiIhqIQ6mIKLyMNgREdVSxYMpAN6ZgohEDHZERLUYB1MQUWkMdkREtRwHUxBRMQY7IqJajoMpiKgYgx0RkR7gYAoiAhjsiIj0BgdTEBGDHRGRHuFgCqK6jcGOiEjPzOzpg/YNOJiCqC5isCMi0jNKhRzLXuJgCqK6iMGOiEgPcTAFUd3EYEdEpKc4mIKo7mGwIyLSYxxMQVS3MNgREek5DqYgqjsY7IiI9BwHUxDVHQx2RER1AAdTENUNDHZERHXEg4MpvjtwHVl5hRJXRURVicGOiKgOKT2Y4qvdkWj/5b9YtPcqkrPyJa6MiKoCgx0RUR3zQa8mWDDYD152pkjLKcCS/dcQ9OW/mPvPJdxJzZG6PCJ6Cgx2RER1jFwuw7C2btg3rTO+H9EKzepZIKdAjRUh0ei04ADe3RCG60mZUpdJRJWglLoAIiKShkIuQ6/mzujZzAlHr9/D9wdu4PjN+9h4Ng5/nYtDd19HjO/SEP5uVlKXSkRPiMGOiKiOk8lk6Ohtj47e9jgfk4LlB29gz+VE7L4kLkENbTGuc0MENbSFTCaTulwiegwGOyIi0mrpbo2fXmmDa4kZ+OHQTfwdehsh1+8j5Pp9+LlaYlznBghu6gS5nAGPSBfxGjsiInqIt6M5vh7mj4PTu2B0e08YGchxIS4N49acQ7dvDuHP07HIL9RIXSYRPUAmCIJeTz8eFxcHNzc3xMbGwtXVVepyiIhqpfuZeVh5LBqrjkUjPVec+87Jwgivd/TC8AB3mBqyA4ioulQkyzDYERHRE8vMK8Tak7fwy5EoJGXkAQCsTAwwur0nRgV6wtpUJXGFRPqHwa4UBjsioqqXV6jGpnO38eOhG4i+nw0AMFEpMDzAHa939IKzpbHEFRLpj4pkGV5jR0REFWaoFEPc/ne6YNlLLeHrbIHsfDV+PRqFTgsO4L2NYbhxl3PhEdU0XhRBRESVppDL0MfPBb2bO+PwtXv4/sB1nIxKxp9n4rDhbBx6NHXCuC4N4OdqJXWpRHUCgx0RET01mUyGzo3s0bmRPc7eEufC2xeRiJ0XE7DzYgI6NLTD+C4NENiAc+ERVScGOyIiqlKtPazxy6g2iEzIwI+HbuDvsDs4ev0ejl6/B383K3RsaAcrEwNYmahgXfTVysQA1iYqWBobQME58ogqjcGOiIiqRWMncyx6oQXefr4RfjlyE+tOxyIsNhVhsamPfI1MBlgYGZQJfsWBz9pEBWvToiBY9NzKxADWpiqYqhRsCSQCgx0REVUzNxsTzO3fDJO6emPTuTjcTslBSnYBUnMKkJqdj5TsfKRmFSAjrxCCAKTlFCA
tpwC3ikbbPgkDhQyWxqWCoInBA49VaN/AFh62ptX4SYmkx2BHREQ1ws7MEGM7NXjk9gK1BmnasFeA1OwCMfRpn+eXWid+TckuQH6hBgVqAfcy83AvM++Rx1cp5Xi3eyO81qE+u3tJbzHYERGRTjBQyGFnZgg7M8MKvS4nX10U8sTA93AgLMDNe5k4H5OKL3Zcwd7LiVg41J+td6SXGOyIiKhWM1YpYKwyhovVoydFFgQB60/H4tNtl3E6OgU9Fh/BB72b4OV27rw2j/QKJygmIiK9J5PJ8GKAO3ZN7YRn6tsgp0CNWVsu4pXfTiE+LUfq8oiqDIMdERHVGW42Jlj7+jOY3ccXhko5jly7h+7fHMamc3HQ8ztsUh3BYEdERHWKXC7Dqx28sGNKR7Rws0JGbiGm/RmGN38/+9jBF0S1AYMdERHVSQ3szbDxrUBMD24MA4UMey4novs3h7HrYrzUpRFVGoMdERHVWUqFHBOebYi/J3SAj5M5krPy8dbqc5i67jzSsgukLo+owhjsiIiozvN1scDfE4Mw4dkGkMuALaF30H3xIRyMTJK6NKIKYbAjIiICYKhUYHqwD/4a1x717UyRmJ6H0StO44PN4cjMK5S6PKInwmBHRERUSkt3a2yf3BFjgjwBAGtPxqDnt4dx8uZ9aQsjegIMdkRERA8wVinwcd+mWPtGO9SzMkZscg5e/PkEPt12GbkFaqnLI3okBjsiIqJHaN/ADrumdsQLbdwgCMCvR6PQe8kRhMWmSl0aUbkY7IiIiB7D3MgA84f44bfRbWBvbogbd7MwaPkxLNoTifxCjdTlEZXBYEdERPQEnvNxxJ6pndDP3wVqjYAl/17HwO9DcCUhXerSiLQY7IiIiJ6QtakKS4a3xHcvtYK1iQEu3UlHv6UhWH7wBtQa3pKMpMdgR0REVEG9/Zyx++1O6OrjgHy1BvN3XcHQH44h6l6W1KVRHcdgR0REVAkO5kb4ZVQbfDXED+aGSpyLSUXPbw9j1bFoaNh6RxJhsCMiIqokmUyGoW3csOvtTmjfwBa5BRp8vPUSXv71JG6n5khdHtVBDHZERERPqZ6VMVa/1g6f9G8KIwM5jt24jx7fHMafZ2IhCGy9o5rDYEdERFQF5HIZXgn0xM4pndDK3QoZeYV4b+MFvLbqDOJSsqUuj+oIBjsiIqIq5GVnig1vtceMHj5QKeT490oSnl90GD8cuoECNee9o+rFYEdERFTFFHIZxnVpgO2TOyDAywY5BWp8ufMKei85glNRyVKXR3qMwY6IiKiaeDuaY/3YZ7BwqD9sTFW4mpiJYT8ex3sbw5CclS91eaSHGOyIiIiqkUwmw5DWrtg/rTOGB7gBAP48E4euXx/En6djOTUKVSkGOyIiohpgbarCvEF++GtcIHyczJGSXYD3/rqAF346jsiEDKnLIz2h08Fu3rx5aNu2LczNzeHg4IABAwYgMjJS6rKIiIgqrbWHDf6Z1AEf9moCE5UCp6NT0HvJEczbGYHs/EKpy6NaTqeD3aFDhzBhwgScOHECe/fuRUFBAbp3746sLN6yhYiIai8DhRxvdKqPvdM6o7uvIwo1An48dBPPLzqMfZcTpS6PajGZUItmTrx79y4cHBxw6NAhdOrU6YleExcXBzc3N8TGxsLV1bWaKyQiIqq4fZcT8fHWS9q7VXT3dcTH/ZqinpWxxJWRLqhIltHpFrsHpaWlAQBsbGwkroSIiKjqdPN1xN5pnfBW5wZQymXYczkRzy86hJ8Oc+47qphaE+w0Gg2mTp2KoKAgNGvW7JH75eXlIT09XbtkZPCCVCIi0n0mKiVm9vTB9skd0dbTGtn5anyx4wr6Lj2Ks7c49x09mVoT7CZMmICLFy9i3bp1j91v3rx5sLS01C6+vr41VCEREdHTa+xkjvVjA7FgiB+sTQxwJSEDg5cfx8y/LiCFc9/Rf6gV19hNnDgRf//9Nw4fPgwvL6/H7puXl4e8vDzt89u3b8PX15f
X2BERUa2TnJWPL3dG4M8zcQAAG1MVPujVBINb1YNMJpO4OqopenONnSAImDhxIjZv3ox///33P0MdABgaGsLCwkK7mJub10ClREREVc/GVIUFQ/yx4a1ANHI0Q3JWPt7dEIYXfzqBa4m81IgeptPBbsKECVi9ejXWrl0Lc3NzJCQkICEhATk5OVKXRkREVGPaetpg++SOmNnTB8YGCpyMSkbPb49gwa4ryMlXS10e6RCd7op9VDPzihUrMHr06Cc6Bqc7ISIifRKXko05Wy9hX0QSAMDV2hif9G+K53wcJa6MqktFsoyyhmqqFB3OnERERJJwtTbBL6PaYs+lBMzZeglxKTl4deUZ9GjqhI/7+cLZknPf1WU63RVLRERE5eve1Al7p3XGm53qQyGXYdelBHT7+hB+OXIThZz7rs5isCMiIqqlTA2VeL9XE2yb1AGtPayRla/GZ9sj0HdZCE5FJbPnqw5isCMiIqrlmjhbYMObgfhyUHNYGhsgIj4dw348ju7fHMaPh24gKT1X6hKphuj04ImqwMETRERUl9zPzMNXuyOx+fxt5BWKXbJyGdDR2x6DW7uiu68jjAwUEldJFVGRLMNgR0REpIfScgqwIzweG8/G4eytFO16cyMl+vi5YEhrV7Ryt+JEx7UAg10pDHZERFTXRd3LwqZzcfjrbBzupJV0y3rZmWJwq3oY2MoV9aw4mlZXMdiVwmBHREQk0mgEnLh5HxvPxWFneAJyCsTJjWUyoH0DWwxu5YoezZxgotLp2dDqHAa7UhjsiIiIHpaZV4id4fH461wcTtxM1q43VSnQq7kzBrd2RYCnDeRydtVKTW8mKCYiIqLqYWaoxNA2bhjaxg2xydnYdO42/joXh5jkbGw4G4cNZ+PgZmOMQS1dMbiVK9xtTaQumZ4AW+yIiIgIgHjHp9PRKfjrbBy2h8cjM69Quy3AywZDWrmil58zzAzZLlST2BVbCoMdERFRxeXkq7H7UgL+OheHo9fvoTgtGBnI0bOZMwa3ckVgA1so2FVb7dgVS0RERE/FWKXAgJb1MKBlPdxJzcHm82JX7c27Wdh8/jY2n78NZ0sjDGpVD4NbuaK+vZnUJRPYYkdERERPSBAEnI9NxV9n4/BP2B2k55Z01bZ0t8KQ1q7o4+cCS2MDCavUP+yKLYXBjoiIqOrlFqixLyIRf52Nw6Grd6EpShOGSjl6N3fG8HbuaONhzQmQqwC7YomIiKhaGRko0MfPBX38XJCUnostobfx19nbiEzMwKbzt7Hp/G00dDDD8AB3DGpZD9amKqlLrhPYYkdERERVQhAEhMWl4Y+TMdgadkc7AbJKKUfPZk4YHuCOdl42bMWrIHbFlsJgR0REVPMycgvwd+gdrD0Zg8vx6dr19e1NMbytOwa3doUNW/GeCINdKQx2RERE0hEEAeG30/DHqRj8HXoH2flFrXgKOYKbOWF4gBsC69uyFe8xGOxKYbAjIiLSDZl5hfgn7A7+OBWDC3Fp2vVedqZ4sa0bBrd2hZ2ZoYQV6iYGu1IY7IiIiHTPxVKteMV3uDBQyNC9qRNeCnBHYH1b3qe2CINdKQx2REREuisrrxDbLtzB2lOxCItN1a53tzHBiwFuGNLaFQ7mRtIVqAMY7EphsCMiIqodLt9Jx7rTMdh87jYyilrxlHIZnvd1xPAAd3RoaFcnW/EY7EphsCMiIqpdsvMLsf1CPP44FYNzMana9a7Wxhge4I6hrV3hYFF3WvEY7EphsCMiIqq9riSkY92pWPx1Lg4ZRbcwU8hl6OrjgOHt3NHJ2x4KPW/FY7ArhcGOiIio9svJV2NHuNiKd+ZWinZ9PStjvNDWDcPauMHJUj9b8RjsSmGwIyIi0i/XEjPwR1ErXlpOAQCxFa9HMye8GuSJVu76dY9aBrtSGOyIiIj0U26BGrsuJmDtyRicik7Wrvd3tcSYIC/0au4MlVIuYYVVg8GuFAY7IiIi/RcRn44VIVHYEnoH+YUaAIC
DuSFGPuOBl9q5w7YWT3zMYFcKgx0REVHdcT8zD3+cisH/jt9CUkYeAECllGNACxeMCfJCE2cLiSusOAa7UhjsiIiI6p78Qg12XozHr0ejyty+LLC+LV7t4IXnfBxqzWjaimQZZQ3VRERERFRjVEo5+reoh37+LjgXk4LfQqKx62ICjt+8j+M378PdxgSj23tiaBtXmBsZSF1ulWGwIyIiIr0lk8nQ2sMGrT1scCc1B/87fgt/nIpBTHI2Ptl2GYv2XsXQNq4Y3d4THramUpf71NgVS0RERHVKdn4hNp+/jRUh0bielAkAkMmArj4OeDXIC4ENbHVquhR2xRIRERE9golKiRHtPPBSgDuOXLuH30KicDDyLvZFJGFfRBJ8nMwxJsgT/VvUg5GBQupyK4QtdkRERFTn3bibiZUh0dh4Ng45BWoAgI2pCi8FuGNkoAccJbw3LUfFlsJgR0RERE8qLbsA68/EYNWxW7idmgMAUMpl6O3njDFBXmjhZlXjNTHYlcJgR0RERBVVqNZg7+VErAiJLnNXi1buVhgT5IUezZxgoKiZu1rwGjsiIiKip6BUyNGzuTN6NnfGxdtp+C0kCv+E3cG5mFScizkPZ0sjjAz0wPC27rA2VUldrlbtv4EaERERUTVqVs8Si4a1QMjM5zClqzfszFSIT8vFgl2RCPxyP346fEPqErUY7IiIiIiegIO5Ed5+vhFCZj6HhUP94etsgdwCjaQDKx7ErlgiIiKiCjBUKjCktSsGt6qH09EpkgyoeBQGOyIiIqJKkMlkCPCykbqMMtgVS0RERKQnGOyIiIiI9ASDHREREZGeYLAjIiIi0hMMdkRERER6gsGOiIiISE8w2BERERHpCQY7IiIiIj3BYEdERESkJxjsiIiIiPQEgx0RERGRnmCwIyIiItITDHZEREREekIpdQHVTaPRAADi4+MlroSIiIio4oozTHGmeRy9D3aJiYkAgICAAIkrISIiIqq8xMREuLu7P3YfmSAIQg3VI4nCwkKcP38ejo6OkMurp+c5IyMDvr6+uHz5MszNzavlPahyeG50E8+L7uK50U08L7qpps6LRqNBYmIiWrZsCaXy8W1yeh/sakJ6ejosLS2RlpYGCwsLqcuhUnhudBPPi+7iudFNPC+6SRfPCwdPEBEREekJBjsiIiIiPcFgVwUMDQ3x8ccfw9DQUOpS6AE8N7qJ50V38dzoJp4X3aSL54XX2BERERHpCbbYEREREekJBjsiIiIiPcFgR0RERKQnGOyqwHfffQdPT08YGRmhXbt2OHXqlNQl1Xnz5s1D27ZtYW5uDgcHBwwYMACRkZFSl0UP+PLLLyGTyTB16lSpS6nzbt++jZdffhm2trYwNjZG8+bNcebMGanLqvPUajVmzZoFLy8vGBsbo0GDBvj000/By+Nr1uHDh9G3b1+4uLhAJpNhy5YtZbYLgoDZs2fD2dkZxsbG6NatG65duyZJrQx2T2n9+vWYNm0aPv74Y5w7dw7+/v4IDg5GUlKS1KXVaYcOHcKECRNw4sQJ7N27FwUFBejevTuysrKkLo2KnD59Gj/++CP8/PykLqXOS0lJQVBQEAwMDLBz505cvnwZX3/9NaytraUurc6bP38+li9fjmXLliEiIgLz58/HggULsHTpUqlLq1OysrLg7++P7777rtztCxYswJIlS/DDDz/g5MmTMDU1RXBwMHJzc2u4Uo6KfWrt2rVD27ZtsWzZMgDibT/c3NwwadIkzJw5U+LqqNjdu3fh4OCAQ4cOoVOnTlKXU+dlZmaiVatW+P777/HZZ5+hRYsWWLx4sdRl1VkzZ85ESEgIjhw5InUp9IA+ffrA0dERv/76q3bd4MGDYWxsjNWrV0tYWd0lk8mwefNmDBgwAIDYWufi4oJ33nkH7777LgAgLS0Njo6OWLlyJV588cUarY8tdk8hPz8fZ8+eRbdu3bTr5HI5unXrhuPHj0tYGT0oLS0NAGBjYyNxJQQAEyZ
MQO/evcv87pB0tm7dijZt2mDo0KFwcHBAy5Yt8fPPP0tdFgFo37499u/fj6tXrwIAwsLCcPToUfTs2VPiyqhYVFQUEhISyvx7ZmlpiXbt2kmSBR5/J1l6rHv37kGtVsPR0bHMekdHR1y5ckWiquhBGo0GU6dORVBQEJo1ayZ1OXXeunXrcO7cOZw+fVrqUqjIzZs3sXz5ckybNg0ffPABTp8+jcmTJ0OlUmHUqFFSl1enzZw5E+np6fDx8YFCoYBarcbnn3+OESNGSF0aFUlISACAcrNA8baaxGBHem/ChAm4ePEijh49KnUpdV5sbCymTJmCvXv3wsjISOpyqIhGo0GbNm3wxRdfAABatmyJixcv4ocffmCwk9iff/6JNWvWYO3atWjatClCQ0MxdepUuLi48NxQudgV+xTs7OygUCiQmJhYZn1iYiKcnJwkqopKmzhxIrZt24YDBw7A1dVV6nLqvLNnzyIpKQmtWrWCUqmEUqnEoUOHsGTJEiiVSqjVaqlLrJOcnZ3h6+tbZl2TJk0QExMjUUVUbPr06Zg5cyZefPFFNG/eHCNHjsTbb7+NefPmSV0aFSn+/15XsgCD3VNQqVRo3bo19u/fr12n0Wiwf/9+BAYGSlgZCYKAiRMnYvPmzfj333/h5eUldUkEoGvXrggPD0doaKh2adOmDUaMGIHQ0FAoFAqpS6yTgoKCHpoO6OrVq/Dw8JCoIiqWnZ0Nubzsf9UKhQIajUaiiuhBXl5ecHJyKpMF0tPTcfLkSUmyALtin9K0adMwatQotGnTBgEBAVi8eDGysrIwZswYqUur0yZMmIC1a9fi77//hrm5ufY6B0tLSxgbG0tcXd1lbm7+0HWOpqamsLW15fWPEnr77bfRvn17fPHFFxg2bBhOnTqFn376CT/99JPUpdV5ffv2xeeffw53d3c0bdoU58+fx6JFi/Dqq69KXVqdkpmZievXr2ufR0VFITQ0FDY2NnB3d8fUqVPx2WefwdvbG15eXpg1axZcXFy0I2drlEBPbenSpYK7u7ugUqmEgIAA4cSJE1KXVOcBKHdZsWKF1KXRAzp37ixMmTJF6jLqvH/++Udo1qyZYGhoKPj4+Ag//fST1CWRIAjp6enClClTBHd3d8HIyEioX7++8OGHHwp5eXlSl1anHDhwoNz/U0aNGiUIgiBoNBph1qxZgqOjo2BoaCh07dpViIyMlKRWzmNHREREpCd4jR0RERGRnmCwIyIiItITDHZEREREeoLBjoiIiEhPMNgRERER6QkGOyIiIiI9wWBHREREpCcY7IiIiIj0BIMdEVENkclk2LJli9RlEJEeY7Ajojph9OjRkMlkDy09evSQujQioiqjlLoAIqKa0qNHD6xYsaLMOkNDQ4mqISKqemyxI6I6w9DQEE5OTmUWa2trAGI36fLly9GzZ08YGxujfv362LhxY5nXh4eH47nnnoOxsTFsbW0xduxYZGZmltnnt99+Q9OmTWFoaAhnZ2dMnDixzPZ79+5h4MCBMDExgbe3N7Zu3ardlpKSghEjRsDe3h7Gxsbw9vZ+KIgSET0Ogx0RUZFZs2Zh8ODBCAsLw4gRI/Diiy8iIiICAJCVlYXg4GBYW1vj9OnT2LBhA/bt21cmuC1fvhwTJkzA2LFjER4ejq1bt6Jhw4Zl3mPu3LkYNmwYLly4gF69emHEiBFITk7Wvv/ly5exc+dOREREYPny5bCzs6u5bwAR1X4CEVEdMGrUKEGhUAimpqZlls8//1wQBEEAILz11ltlXtOuXTth3LhxgiAIwk8//SRYW1sLmZmZ2u3bt28X5HK5kJCQIAiCILi4uAgffvjhI2sAIHz00Ufa55mZmQIAYefOnYIgCELfvn2FMWPGVM0HJqI6idfYEVGd8eyzz2L58uVl1tnY2GgfBwYGltkWGBiI0NBQAEBERAT8/f1hamqq3R4UFASNRoPIyEjIZDLcuXMHXbt2fWwNfn5+2sempqawsLBAUlI
SAGDcuHEYPHgwzp07h+7du2PAgAFo3759pT4rEdVNDHZEVGeYmpo+1DVaVYyNjZ9oPwMDgzLPZTIZNBoNAKBnz564desWduzYgb1796Jr166YMGECFi5cWOX1EpF+4jV2RERFTpw48dDzJk2aAACaNGmCsLAwZGVlabeHhIRALpejcePGMDc3h6enJ/bv3/9UNdjb22PUqFFYvXo1Fi9ejJ9++umpjkdEdQtb7IiozsjLy0NCQkKZdUqlUjtAYcOGDWjTpg06dOiANWvW4NSpU/j1118BACNGjMDHH3+MUaNGYc6cObh79y4mTZqEkSNHwtHREQAwZ84cvPXWW3BwcEDPnj2RkZGBkJAQTJo06Ynqmz17Nlq3bo2mTZsiLy8P27Zt0wZLIqInwWBHRHXGrl274OzsXGZd48aNceXKFQDiiNV169Zh/PjxcHZ2xh9//AFfX18AgImJCXbv3o0pU6agbdu2MDExweDBg7Fo0SLtsUaNGoXc3Fx88803ePfdd2FnZ4chQ4Y8cX0qlQrvv/8+oqOjYWxsjI4dO2LdunVV8MmJqK6QCYIgSF0EEZHUZDIZNm/ejAEDBkhdChFRpfEaOyIiIiI9wWBHREREpCd4jR0REQBelUJE+oAtdkRERER6gsGOiIiISE8w2BERERHpCQY7IiIiIj3BYEdERESkJxjsiIiIiPQEgx0RERGRnmCwIyIiItITDHZEREREeuL/3w0XRBJ53asAAAAASUVORK5CYII=",
+      "text/plain": [
+       "<Figure size 640x480 with 2 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):\n",
+    "    fig, ax1 = plt.subplots()\n",
+    "\n",
+    "    # Plot training and validation loss against epochs\n",
+    "    ax1.plot(epochs_seen, train_losses, label=\"Training loss\")\n",
+    "    ax1.plot(epochs_seen, val_losses, linestyle=\"-.\", label=\"Validation loss\")\n",
+    "    ax1.set_xlabel(\"Epochs\")\n",
+    "    ax1.set_ylabel(\"Loss\")\n",
+    "    ax1.legend(loc=\"upper right\")\n",
+    "\n",
+    "    # Create a second x-axis for tokens seen\n",
+    "    ax2 = ax1.twiny()  # Create a second x-axis that shares the same y-axis\n",
+    "    ax2.plot(tokens_seen, train_losses, alpha=0)  # Invisible plot for aligning ticks\n",
+    "    ax2.set_xlabel(\"Tokens seen\")\n",
+    "\n",
+    "    fig.tight_layout()  # Adjust layout to make room\n",
+    "    plt.savefig(\"loss-plot.pdf\")\n",
+    "    plt.show()\n",
+    "\n",
+    "epochs_tensor = torch.linspace(0, num_epochs, len(train_losses))\n",
+    "plot_losses(epochs_tensor, tokens_seen, train_losses, val_losses)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8bc83ded-5f80-4e1c-bf4d-ccb59999d995",
+   "metadata": {},
+   "source": [
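+    "- Looking at the results above, the model starts out generating incomprehensible strings of words, whereas towards the end of training it produces sentences that are more or less grammatically correct.\n",
+    "- However, the training and validation losses show that the model starts to overfit.\n",
+    "- If we check a few passages it writes towards the end of training, we find that they match the training set almost verbatim; the model has simply memorized the training data.\n",
+    "- Later, we will cover decoding strategies that can mitigate this memorization to some degree.\n",
+    "- Note that the overfitting occurs here because we have a very, very small training set and iterate over it many times.\n",
+    "  - The LLM training here serves primarily educational purposes; we mainly want to see that the model can learn to produce coherent text.\n",
+    "  - Instead of spending weeks or months training this model on expensive hardware, we will load pretrained weights later."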
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eb380c42-b31c-4ee1-b8b9-244094537272",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/mental-model-2.webp\" width=350px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "de713235-1561-467f-bf63-bf11ade383f0",
+   "metadata": {},
+   "source": [
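+    "**If you are interested in augmenting this training function with more advanced techniques, such as learning rate warmup, cosine annealing, and gradient clipping, please refer to [Appendix D](../../appendix-D/03_main-chapter-code).**"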
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6d5cdf2f-09a5-4eb0-a20a-d7aac5c14c2c",
+   "metadata": {},
+   "source": [
+    "**If you are interested in a larger training dataset and a longer training run, see [../03_bonus_pretraining_on_gutenberg](../03_bonus_pretraining_on_gutenberg).**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "699f45fc-bf78-42f2-bd24-2355db41b28f",
+   "metadata": {
+    "id": "699f45fc-bf78-42f2-bd24-2355db41b28f"
+   },
+   "source": [
+    "## 5.3 Decoding strategies to control randomness"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6be9086e-2c27-41da-97d0-49137d0ba3c7",
+   "metadata": {},
+   "source": [
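+    "- Inference is relatively cheap with a relatively small LLM such as the GPT model we trained above, so there is no need to use a GPU for it, even if you used a GPU for training.\n",
+    "- Using the `generate_text_simple` function (from the previous chapter) that we called earlier inside the simple training function, we can generate new text one word (or token) at a time.\n",
+    "- As explained in section 5.1.2, the next generated token is the token with the largest probability score among all tokens in the vocabulary."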
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "id": "2734cee0-f6f9-42d5-b71c-fa7e0ef28b6d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Output text:\n",
+      " Every effort moves you?\"\n",
+      "\n",
+      "\"Yes--quite insensible to the irony. She wanted him vindicated--and by me!\"\n",
+      "\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "model.to(\"cpu\")\n",
+    "model.eval()\n",
+    "\n",
+    "tokenizer = tiktoken.get_encoding(\"gpt2\")\n",
+    "\n",
+    "token_ids = generate_text_simple(\n",
+    "    model=model,\n",
+    "    idx=text_to_token_ids(\"Every effort moves you\", tokenizer),\n",
+    "    max_new_tokens=25,\n",
+    "    context_size=GPT_CONFIG_124M[\"ctx_len\"]\n",
+    ")\n",
+    "\n",
+    "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d25dbe31-bb7c-4893-b25b-47d0492d4aa4",
+   "metadata": {},
+   "source": [
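+    "- Even if we execute the `generate_text_simple` function above multiple times, the LLM will always generate the same output.\n",
+    "- We now introduce two concepts, so-called decoding strategies, to modify `generate_text_simple`: *temperature scaling* and *top-k* sampling.\n",
+    "- These will allow us to control the randomness and diversity of the generated text."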
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4bb6f380-a798-4fd9-825c-17b7cd29a994",
+   "metadata": {},
+   "source": [
+    "### 5.3.1 Temperature scaling"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7f4f53c-0612-43d3-aa82-52447eac50fa",
+   "metadata": {},
+   "source": [
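+    "- Previously, we always used `torch.argmax` to select the token with the highest probability as the next token.\n",
+    "- To add variety, we can instead sample the next token from the probability distribution using `torch.multinomial(probs, num_samples=1)`.\n",
+    "- Here, each index is picked with a chance that corresponds to its probability in the input tensor."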
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e7531bae-d5de-44c0-bc78-78fed077e22a",
+   "metadata": {},
+   "source": [
+    "- Here is a short recap of generating the next token, assuming a very small vocabulary for illustration purposes:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "01a5ce39-3dc8-4c35-96bc-6410a1e42412",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "forward\n"
+     ]
+    }
+   ],
+   "source": [
+    "vocab = { \n",
+    "    \"closer\": 0,\n",
+    "    \"every\": 1, \n",
+    "    \"effort\": 2, \n",
+    "    \"forward\": 3,\n",
+    "    \"inches\": 4,\n",
+    "    \"moves\": 5, \n",
+    "    \"pizza\": 6,\n",
+    "    \"toward\": 7,\n",
+    "    \"you\": 8,\n",
+    "} \n",
+    "\n",
+    "inverse_vocab = {v: k for k, v in vocab.items()}\n",
+    "\n",
+    "# Suppose input is \"every effort moves you\", and the LLM\n",
+    "# returns the following logits for the next token:\n",
+    "next_token_logits = torch.tensor(\n",
+    "    [4.51, 0.89, -1.90, 6.75, 1.63, -1.62, -1.89, 6.28, 1.79]\n",
+    ")\n",
+    "\n",
+    "probas = torch.softmax(next_token_logits, dim=0)\n",
+    "next_token_id = torch.argmax(probas).item()\n",
+    "\n",
+    "# The next generated token is then as follows:\n",
+    "print(inverse_vocab[next_token_id])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "id": "6400572f-b3c8-49e2-95bc-433e55c5b3a1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "forward\n"
+     ]
+    }
+   ],
+   "source": [
+    "torch.manual_seed(123)\n",
+    "next_token_id = torch.multinomial(probas, num_samples=1).item()\n",
+    "print(inverse_vocab[next_token_id])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "id": "b23b863e-252a-403c-b5b1-62bc0a42319f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "73 x closer\n",
+      "0 x every\n",
+      "0 x effort\n",
+      "582 x forward\n",
+      "2 x inches\n",
+      "0 x moves\n",
+      "0 x pizza\n",
+      "343 x toward\n"
+     ]
+    }
+   ],
+   "source": [
+    "def print_sampled_tokens(probas):\n",
+    "    torch.manual_seed(123) # Manual seed for reproducibility\n",
+    "    sample = [torch.multinomial(probas, num_samples=1).item() for i in range(1_000)]\n",
+    "    sampled_ids = torch.bincount(torch.tensor(sample))\n",
+    "    for i, freq in enumerate(sampled_ids):\n",
+    "        print(f\"{freq} x {inverse_vocab[i]}\")\n",
+    "\n",
+    "print_sampled_tokens(probas)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c63d0a27-830b-42b5-9986-6d1a7de04dd9",
+   "metadata": {},
+   "source": [
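+    "- Instead of determining the most likely token via `torch.argmax`, the cells above use `torch.multinomial(probas, num_samples=1)` to sample the next token from the softmax distribution.\n",
+    "- To illustrate, `print_sampled_tokens` samples the next token 1,000 times from the original softmax probabilities; in the counts printed above, `forward` is chosen most often (582 times), but `toward`, `closer`, and `inches` are sampled as well."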
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32e7d9cf-a26d-4d9a-8664-4af1efa73832",
+   "metadata": {},
+   "source": [
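+    "- We can control the distribution and selection process via a concept called temperature scaling.\n",
+    "- \"Temperature scaling\" is just a fancy term for dividing the logits by a number greater than 0.\n",
+    "- Temperatures greater than 1 result in more uniformly distributed token probabilities after applying the softmax.\n",
+    "- Temperatures smaller than 1 result in more confident (sharper or more peaked) distributions after applying the softmax."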
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "id": "0759e4c8-5362-467c-bec6-b0a19d1ba43d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def softmax_with_temperature(logits, temperature):\n",
+    "    scaled_logits = logits / temperature\n",
+    "    return torch.softmax(scaled_logits, dim=0)\n",
+    "\n",
+    "# Temperature values\n",
+    "temperatures = [1, 0.1, 5]  # Original, higher confidence, and lower confidence\n",
+    "\n",
+    "# Calculate scaled probabilities\n",
+    "scaled_probas = [softmax_with_temperature(next_token_logits, T) for T in temperatures]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 47,
+   "id": "2e66e613-4aca-4296-a984-ddd0d80c6578",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAnYAAAHWCAYAAAD6oMSKAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAABVBElEQVR4nO3deVxUdcM+/mtAGEA2jU0UBdQSElnVsBQtCtTbJW+XUFMReR5LXCA0LRaFBG9LU79imFvuS4Zm7koirrkvJUKICLcB7hAgssz5/eHPeRpBZZnhwOF6v17zivnMOTPXAJMXn7PJBEEQQERERESNnpbYAYiIiIhIPVjsiIiIiCSCxY6IiIhIIljsiIiIiCSCxY6IiIhIIljsiIiIiCSCxY6IiIhIIljsiIiIiCSimdgB6ptCocBff/0FIyMjyGQyseMQERERvZQgCPj7779hbW0NLa2Xz8k1uWL3119/wcbGRuwYRERERDWSnZ2NNm3avHSZJlfsjIyMADz95hgbG4uchoiIiOjlCgoKYGNjo+wwL9Pkit2zza/GxsYsdkRERNRoVGcXMh48QURERCQRLHZEREREEsFiR0RERCQRTW4fOyIikhaFQoHS0lKxYxDVmo6ODrS1tdXyXCx2RETUaJWWluLmzZtQKBRiRyGqE1NTU1hZWdX5HLssdkRE1CgJgoCcnBxoa2vDxsbmlSduJWqIBEFAcXEx7ty5AwBo1apVnZ6PxY6IiBql8vJyFBcXw9raGgYGBmLHIao1fX19AMCdO3dgYWFRp82y/POGiIgapYqKCgCArq6uyEmI6u7ZHydlZWV1eh5Ri11ycjIGDBgAa2tryGQy7Ny585XrJCUlwc3NDXK5HB06dMAPP/yg8ZxERNRw8brfJAXq+j0WtdgVFRXB2dkZcXFx1Vr+5s2b6N+/P/r06YNLly5h2rRpmDBhAg4cOKDhpEREREQNn6jFrm/fvvjqq6/w4YcfVmv5+Ph42NnZYcGCBXBwcEBQUBCGDh2Kb7/9VsNJiYiI6k4mk730Nnv2bLEjqp2trS0WLVokdow6mTJlCtzd3SGXy+Hi4iJ2nJdqVAdPnDp1Ct7e3ipjPj4+mDZt2gvXefLkCZ48eaK8X1BQoKl4RETUANjO3FOvr5c5r3+1l83JyVF+vXXrVkRERCA1NVU5ZmhoqNZsmiIIAioqKtCsWf3ViNLSUlH3pxw/fjx+++03XLlyRbQM1dGoDp7Izc2FpaWlypilpSUKCgrw+PHjKteJjY2FiYmJ8mZjY1MfUYmIiCqxsrJS3kxMTCCTyVTGtmzZAgcHB+jp6aFTp05YtmyZct3MzEzIZDJs27YNPXv2hL6+Prp27Yq0tDScPXsWHh4eMDQ0RN++fXH37l3leuPGjcPgwYMxZ84cmJubw9jYGBMnTlQ5qbNCoUBsbCzs7Oygr68PZ2dnbN++Xfl4UlISZDIZ9u3bp5y5On78OG7cuIFBgwbB0tIShoaG6Nq1Kw4fPqxcr3fv3rh16xaCg4OVs5IAMHv27EozX4sWLYKtrW2l3HPnzoW1tTXeeOMNAEB2djaGDx8OU1NTtGzZEoMGDUJmZqY6fjwvtGTJEkyaNAn29vYafR11aFTFrjZmzZqF/Px85S07O1vsSERERJVs3LgRERERmDt3LlJSUhATE4Pw8HCsXbtWZbnIyEiEhYXhwoULaNasGUaOHIkZM2Zg8eLFOHbsGNLT0xEREaGyTmJiIlJSUpCUlITNmzcjISEBc+bMUT4eGxuLdevWIT4+Hn/88QeCg4MxevRoHD16VOV5Zs6ciXnz5iElJQVdunRBYWEh+vXrh8TERFy8eBG+vr4YMGAAsrKyAAAJCQlo06YNoqKikJOTozJjWR2JiYlITU3FoUOHsHv3bpSVlcHHxwdGRkY4duwYTpw4AUNDQ/j6+r706iOGhoYvvU2cOLFGuRqyRrUp1srKCnl5eSpjeXl5MDY2Vp4
D5nlyuRxyubw+4hEREdVaZGQkFixYgCFDhgAA7OzscO3aNSxfvhxjx45VLhcaGgofHx8AwNSpU+Hn54fExES8/fbbAICAgIBKZ4zQ1dXF6tWrYWBggDfffBNRUVGYPn06oqOjUVZWhpiYGBw+fBienp4AAHt7exw/fhzLly+Hl5eX8nmioqLw/vvvK++3bNkSzs7OyvvR0dHYsWMHdu3ahaCgILRs2RLa2towMjKClZVVjb8nzZs3x8qVK5WbYDds2ACFQoGVK1cqZ//WrFkDU1NTJCUl4YMPPqjyeS5duvTS1zE2Nq5xtoaqURU7T09P7N27V2Xs0KFDyl9EIhLZbJNqLJOv+RxEjUxRURFu3LiBgIAABAYGKsfLy8thYqL6uerSpYvy62e7Jzk5OamMPbuKwTPOzs4qJ3H29PREYWEhsrOzUVhYiOLiYpXCBjzdp83V1VVlzMPDQ+V+YWEhZs+ejT179iAnJwfl5eV4/PixcsaurpycnFT2q7t8+TLS09NhZGSkslxJSQlu3Ljxwufp0KGDWvI0BqIWu8LCQqSnpyvv37x5E5cuXULLli3Rtm1bzJo1C7dv38a6desAABMnTsTSpUsxY8YMjB8/Hr/++iu2bduGPXvqd0dZIiIidSosLAQArFixAt27d1d57PmrEOjo6Ci/fjZr9fxYTa6d++y19+zZg9atW6s89vwWr+bNm6vcDw0NxaFDh/DNN9+gQ4cO0NfXx9ChQ1+6WRQAtLS0IAiCylhVJ+Z9/vUKCwvh7u6OjRs3VlrW3Nz8ha/3qoNSRo8ejfj4+Jcu01iIWuzOnTuHPn36KO+HhIQAAMaOHYsffvgBOTk5Kq3fzs4Oe/bsQXBwMBYvXow2bdpg5cqVyilpIiKixsjS0hLW1tbIyMjAqFGj1P78ly9fxuPHj5W7LZ0+fRqGhoawsbFBy5YtIZfLkZWVpbLZtTpOnDiBcePGKU9bVlhYWOlABl1dXeVVQp4xNzdHbm4uBEFQltNXbS4FADc3N2zduhUWFhY12nzKTbH1pHfv3pUa+z9VdVWJ3r174+LFixpMRUREVP/mzJmDKVOmwMTEBL6+vnjy5AnOnTuHhw8fKic+aqu0tBQBAQEICwtDZmYmIiMjERQUBC0tLRgZGSE0NBTBwcFQKBR45513kJ+fjxMnTsDY2Fhl/77ndezYEQkJCRgwYABkMhnCw8MrzRba2toiOTkZH330EeRyOczMzNC7d2/cvXsX8+fPx9ChQ7F//37s27fvlQVr1KhR+PrrrzFo0CBERUWhTZs2uHXrFhISEjBjxgy0adOmyvXquik2PT0dhYWFyM3NxePHj5VF0dHRscFd0k7yR8USERE1BhMmTMDKlSuxZs0aODk5wcvLCz/88APs7Ozq/NzvvfceOnbsiF69emHEiBEYOHCgysmQo6OjER4ejtjYWDg4OMDX1xd79ux55WsvXLgQLVq0QI8ePTBgwAD4+PjAzc1NZZmoqChkZmaiffv2ys2lDg4OWLZsGeLi4uDs7IwzZ84gNDT0le/DwMAAycnJaNu2LYYMGQIHBwcEBASgpKREo7NuEyZMgKurK5YvX460tDS4urrC1dUVf/31l8Zes7ZkwsumzCSooKAAJiYmyM/Pl9TUK1GDwIMnqB6VlJTg5s2bsLOzg56ennK8IZ+gWAzjxo3Do0ePqnU9dhLPi36fgZp1l0Z1VCwREdGrNPSiRaRJ3BRLREREJBGcsSMiIpKwqg5EJOnijB0RERGRRLDYEREREUkEix0RERGRRLDYEREREUkEix0RERGRRLDYEREREUkEix0RERGRRLDYERER1ROZTPbS2z+v3yoVtra2WLRokdgx6iQrKwv9+/eHgYEBLCwsMH36dJSXl790nblz56JHjx4wMDCAqalp/QQFT1BMRERSU51rFqv19ap//eOcnBzl11u3bkVERARSU1OVY4aGhmqNpimCIKCiogLNmtVfjSgtLYWurm69vd4zFRU
V6N+/P6ysrHDy5Enk5ORgzJgx0NHRQUxMzAvXKy0txbBhw+Dp6YlVq1bVW17O2BEREdUTKysr5c3ExAQymUxlbMuWLXBwcICenh46deqEZcuWKdfNzMyETCbDtm3b0LNnT+jr66Nr165IS0vD2bNn4eHhAUNDQ/Tt2xd3795Vrjdu3DgMHjwYc+bMgbm5OYyNjTFx4kSUlpYql1EoFIiNjYWdnR309fXh7OyM7du3Kx9PSkqCTCbDvn374O7uDrlcjuPHj+PGjRsYNGgQLC0tYWhoiK5du+Lw4cPK9Xr37o1bt24hODhYOSsJALNnz4aLi4vK92bRokWwtbWtlHvu3LmwtrbGG2+8AQDIzs7G8OHDYWpqipYtW2LQoEHIzMxUx4+nSgcPHsS1a9ewYcMGuLi4oG/fvoiOjkZcXJzK9/B5c+bMQXBwMJycnDSWrSosdkRERA3Axo0bERERgblz5yIlJQUxMTEIDw/H2rVrVZaLjIxEWFgYLly4gGbNmmHkyJGYMWMGFi9ejGPHjiE9PR0REREq6yQmJiIlJQVJSUnYvHkzEhISMGfOHOXjsbGxWLduHeLj4/HHH38gODgYo0ePxtGjR1WeZ+bMmZg3bx5SUlLQpUsXFBYWol+/fkhMTMTFixfh6+uLAQMGICsrCwCQkJCANm3aICoqCjk5OSozltWRmJiI1NRUHDp0CLt370ZZWRl8fHxgZGSEY8eO4cSJEzA0NISvr+9LS5ahoeFLbxMnTnzhuqdOnYKTkxMsLS2VYz4+PigoKMAff/xRo/dTH7gploiIqAGIjIzEggULMGTIEACAnZ0drl27huXLl2Ps2LHK5UJDQ+Hj4wMAmDp1Kvz8/JCYmIi3334bABAQEFDp+rC6urpYvXo1DAwM8OabbyIqKgrTp09HdHQ0ysrKEBMTg8OHD8PT0xMAYG9vj+PHj2P58uXw8vJSPk9UVBTef/995f2WLVvC2dlZeT86Oho7duzArl27EBQUhJYtW0JbWxtGRkawsrKq8fekefPmWLlypXIT7IYNG6BQKLBy5Url7N+aNWtgamqKpKQkfPDBB1U+z6VLl176OsbGxi98LDc3V6XUAVDez83Nre5bqTcsdkRERCIrKirCjRs3EBAQgMDAQOV4eXk5TExU9xns0qWL8utnBeOfm/ssLS1x584dlXWcnZ1hYGCgvO/p6YnCwkJkZ2ejsLAQxcXFKoUNeLqPmKurq8qYh4eHyv3CwkLMnj0be/bsQU5ODsrLy/H48WPljF1dOTk5qexXd/nyZaSnp8PIyEhluZKSEty4ceOFz9OhQwe15GkMWOyIiIhEVlhYCABYsWIFunfvrvKYtra2yn0dHR3l189mrZ4fUygUNX7tPXv2oHXr1iqPyeVylfvNmzdXuR8aGopDhw7hm2++QYcOHaCvr4+hQ4e+dLMoAGhpaUEQBJWxsrKySss9/3qFhYVwd3fHxo0bKy1rbm7+wtd71UEpo0ePRnx8fJWPWVlZ4cyZMypjeXl5yscaGhY7IiIikVlaWsLa2hoZGRkYNWqU2p//8uXLePz4MfT19QEAp0+fhqGhIWxsbNCyZUvI5XJkZWWpbHatjhMnTmDcuHH48MMPATwtXs8fyKCrq4uKigqVMXNzc+Tm5kIQBGU5fdXmUgBwc3PD1q1bYWFh8dLNp8+ry6ZYT09PzJ07F3fu3IGFhQUA4NChQzA2Noajo2O1M9QXFjsiIqIGYM6cOZgyZQpMTEzg6+uLJ0+e4Ny5c3j48CFCQkLq9NylpaUICAhAWFgYMjMzERkZiaCgIGhpacHIyAihoaEIDg6GQqHAO++8g/z8fJw4cQLGxsYq+/c9r2PHjkhISMCAAQMgk8kQHh5eabbQ1tYWycnJ+OijjyCXy2FmZobevXvj7t27mD9/PoYOHYr9+/dj3759ryxro0aNwtdff41BgwYhKioKbdq0wa1bt5CQkIAZM2agTZs2Va5Xl02xH3zwARwdHfHxxx9
j/vz5yM3NRVhYGCZNmqSc0Txz5gzGjBmDxMRE5axnVlYWHjx4gKysLFRUVCjLZYcOHTR6WhseFUtERNQATJgwAStXrsSaNWvg5OQELy8v/PDDD7Czs6vzc7/33nvo2LEjevXqhREjRmDgwIEqJ0OOjo5GeHg4YmNj4eDgAF9fX+zZs+eVr71w4UK0aNECPXr0wIABA+Dj4wM3NzeVZaKiopCZmYn27dsrN5c6ODhg2bJliIuLg7OzM86cOYPQ0NBXvg8DAwMkJyejbdu2GDJkCBwcHBAQEICSkpIazeDVhLa2Nnbv3g1tbW14enpi9OjRGDNmDKKiopTLFBcXIzU1VWVzckREBFxdXREZGYnCwkK4urrC1dUV586d00jOZ2TC8xu5Ja6goAAmJibIz8/X2C8BUZNVnRPD1uBkrkQvU1JSgps3b8LOzg56enpix2mwxo0bh0ePHmHnzp1iR6GXeNnvc026C2fsiIiIiCSCxY6IiIhIInjwBBERkYQ9f7JikjbO2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERFRPZHJZC+9/fP6rVJha2uLRYsWiR2jTqr6WW3ZskXsWFXiCYqJiEhSnNY61evrXR17tdrL5uTkKL/eunUrIiIikJqaqhwzNDRUazZNEQQBFRUVaNas/mpEaWkpdHV16+31nrdmzRr4+voq75uamoqW5WU4Y0dERFRPrKyslDcTExPIZDKVsS1btsDBwQF6enro1KkTli1bplw3MzMTMpkM27ZtQ8+ePaGvr4+uXbsiLS0NZ8+ehYeHBwwNDdG3b1/cvXtXud64ceMwePBgzJkzB+bm5jA2NsbEiRNRWlqqXEahUCA2NhZ2dnbQ19eHs7Mztm/frnw8KSkJMpkM+/btg7u7O+RyOY4fP44bN25g0KBBsLS0hKGhIbp27YrDhw8r1+vduzdu3bqF4OBg5UwXAMyePRsuLi4q35tFixbB1ta2Uu65c+fC2toab7zxBgAgOzsbw4cPh6mpKVq2bIlBgwYhMzNTHT+elzI1NVX5Wenp6Wn8NWuDxY6IiKgB2LhxIyIiIjB37lykpKQgJiYG4eHhWLt2rcpykZGRCAsLw4ULF9CsWTOMHDkSM2bMwOLFi3Hs2DGkp6cjIiJCZZ3ExESkpKQgKSkJmzdvRkJCAubMmaN8PDY2FuvWrUN8fDz++OMPBAcHY/To0Th69KjK88ycORPz5s1DSkoKunTpgsLCQvTr1w+JiYm4ePEifH19MWDAAGRlZQEAEhIS0KZNG0RFRSEnJ0dlxrI6EhMTkZqaikOHDmH37t0oKyuDj48PjIyMcOzYMZw4cQKGhobw9fVVKarPMzQ0fOlt4sSJr8wyadIkmJmZoVu3bli9ejUEQajRe6kv3BRLRETUAERGRmLBggUYMmQIAMDOzg7Xrl3D8uXLMXbsWOVyoaGh8PHxAQBMnToVfn5+SExMxNtvvw0ACAgIqHR9WF1dXaxevRoGBgZ48803ERUVhenTpyM6OhplZWWIiYnB4cOH4enpCQCwt7fH8ePHsXz5cnh5eSmfJyoqCu+//77yfsuWLeHs7Ky8Hx0djR07dmDXrl0ICgpCy5Ytoa2tDSMjI1hZWdX4e9K8eXOsXLlSuQl2w4YNUCgUWLlypXL2b82aNTA1NUVSUhI++OCDKp/n0qVLL30dY2Pjlz4eFRWFd999FwYGBjh48CA+/fRTFBYWYsqUKTV+T5rGYkdERCSyoqIi3LhxAwEBAQgMDFSOl5eXw8TERGXZLl26KL+2tLQEADg5OamM3blzR2UdZ2dnGBgYKO97enqisLAQ2dnZKCwsRHFxsUphA57u0+bq6qoy5uHhoXK/sLAQs2fPxp49e5CTk4Py8nI8fvxYOWNXV05OTir71V2+fBnp6ekwMjJSWa6kpAQ3btx44fN06NChTjnCw8OVX7u6uqKoqAhff/01ix0
RERFVVlhYCABYsWIFunfvrvKYtra2yn0dHR3l189mrZ4fUygUNX7tPXv2oHXr1iqPyeVylfvNmzdXuR8aGopDhw7hm2++QYcOHaCvr4+hQ4e+dLMoAGhpaVXalFlWVlZpuedfr7CwEO7u7ti4cWOlZc3NzV/4eq86KGX06NGIj49/6TL/1L17d0RHR+PJkyeVvkdiY7EjIiISmaWlJaytrZGRkYFRo0ap/fkvX76Mx48fQ19fHwBw+vRpGBoawsbGBi1btoRcLkdWVpbKZtfqOHHiBMaNG4cPP/wQwNPi9fyBDLq6uqioqFAZMzc3R25uLgRBUJbTV20uBQA3Nzds3boVFhYWr9x8+k913RRb1fO1aNGiwZU6gMWOiIioQZgzZw6mTJkCExMT+Pr64smTJzh37hwePnyIkJCQOj13aWkpAgICEBYWhszMTERGRiIoKAhaWlowMjJCaGgogoODoVAo8M477yA/Px8nTpyAsbGxyv59z+vYsSMSEhIwYMAAyGQyhIeHV5ottLW1RXJyMj766CPI5XKYmZmhd+/euHv3LubPn4+hQ4di//792Ldv3ysL1qhRo/D1119j0KBBiIqKQps2bXDr1i0kJCRgxowZaNOmTZXr1WVT7C+//IK8vDy89dZb0NPTw6FDhxATE4PQ0NBaP6cm8ahYIiKiBmDChAlYuXIl1qxZAycnJ3h5eeGHH36AnZ1dnZ/7vffeQ8eOHdGrVy+MGDECAwcOVDkZcnR0NMLDwxEbGwsHBwf4+vpiz549r3zthQsXokWLFujRowcGDBgAHx8fuLm5qSwTFRWFzMxMtG/fXrm51MHBAcuWLUNcXBycnZ1x5syZahUlAwMDJCcno23bthgyZAgcHBwQEBCAkpKSGs+6VZeOjg7i4uLg6ekJFxcXLF++HAsXLkRkZKRGXq+uZEJDPV5XQwoKCmBiYoL8/HyN/RIQNVmzTaqxTL7mc1CTUFJSgps3b8LOzq7BnlOsIRg3bhwePXqEnTt3ih2FXuJlv8816S6csSMiIiKSCBY7IiIiIongwRNEREQS9vzJiknaOGNHREREJBEsdkREREQSwWJHRESNWhM7uQNJlLp+j1nsiIioUXp2qa1XXb6KqDEoLi4GoHp5uNrgwRNERNQoNWvWDAYGBrh79y50dHSgpcW5Cmp8BEFAcXEx7ty5A1NT00rXBq4pFjsiImqUZDIZWrVqhZs3b+LWrVtixyGqE1NTU1hZWdX5eVjsiIio0dLV1UXHjh25OZYaNR0dnTrP1D3DYkdERI2alpYWLylG9P/jDglEREREEsFiR0RERCQRLHZEREREEsFiR0RERCQRLHZEREREEsFiR0RERCQRLHZEREREEsFiR0RERCQRohe7uLg42NraQk9PD927d8eZM2deuvyiRYvwxhtvQF9fHzY2NggODkZJSUk9pSUiIiJquEQtdlu3bkVISAgiIyNx4cIFODs7w8fHB3fu3Kly+U2bNmHmzJmIjIxESkoKVq1aha1bt+KLL76o5+REREREDY+oxW7hwoUIDAyEv78/HB0dER8fDwMDA6xevbrK5U+ePIm3334bI0eOhK2tLT744AP4+fm9cpaPiIiIqCkQrdiVlpbi/Pnz8Pb2/r8wWlrw9vbGqVOnqlynR48eOH/+vLLIZWRkYO/evejXr98LX+fJkycoKChQuRERERFJUTOxXvjevXuoqKiApaWlyrilpSWuX79e5TojR47EvXv38M4770AQBJSXl2PixIkv3RQbGxuLOXPmqDU7ERERUUMk+sETNZGUlISYmBgsW7YMFy5cQEJCAvbs2YPo6OgXrjNr1izk5+crb9nZ2fWYmIiIiKj+iDZjZ2ZmBm1tbeTl5amM5+XlwcrKqsp1wsPD8fHHH2PChAkAACcnJxQVFeF//ud/8OWXX0JLq3JPlcvlkMvl6n8DRERERA2MaDN2urq6cHd3R2JionJMoVAgMTERnp6eVa5TXFxcqbxpa2s
DAARB0FxYIiIiokZAtBk7AAgJCcHYsWPh4eGBbt26YdGiRSgqKoK/vz8AYMyYMWjdujViY2MBAAMGDMDChQvh6uqK7t27Iz09HeHh4RgwYICy4BERERE1VaIWuxEjRuDu3buIiIhAbm4uXFxcsH//fuUBFVlZWSozdGFhYZDJZAgLC8Pt27dhbm6OAQMGYO7cuWK9BSIiIqIGQyY0sW2YBQUFMDExQX5+PoyNjcWOQyQts02qsUy+5nMQEUlITbpLozoqloiIiIhejMWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkolbF7siRI2oLEBcXB1tbW+jp6aF79+44c+bMS5d/9OgRJk2ahFatWkEul+P111/H3r171ZaHiIiIqLGqVbHz9fVF+/bt8dVXXyE7O7vWL75161aEhIQgMjISFy5cgLOzM3x8fHDnzp0qly8tLcX777+PzMxMbN++HampqVixYgVat25d6wxEREREUlGrYnf79m0EBQVh+/btsLe3h4+PD7Zt24bS0tIaPc/ChQsRGBgIf39/ODo6Ij4+HgYGBli9enWVy69evRoPHjzAzp078fbbb8PW1hZeXl5wdnauzdsgIiIikpRaFTszMzMEBwfj0qVL+O233/D666/j008/hbW1NaZMmYLLly+/8jlKS0tx/vx5eHt7/18YLS14e3vj1KlTVa6za9cueHp6YtKkSbC0tETnzp0RExODioqK2rwNIiIiIkmp88ETbm5umDVrFoKCglBYWIjVq1fD3d0dPXv2xB9//PHC9e7du4eKigpYWlqqjFtaWiI3N7fKdTIyMrB9+3ZUVFRg7969CA8Px4IFC/DVV1+98HWePHmCgoIClRsRERGRFNW62JWVlWH79u3o168f2rVrhwMHDmDp0qXIy8tDeno62rVrh2HDhqkzKxQKBSwsLPD999/D3d0dI0aMwJdffon4+PgXrhMbGwsTExPlzcbGRq2ZiIiIiBqKZrVZafLkydi8eTMEQcDHH3+M+fPno3PnzsrHmzdvjm+++QbW1tYvfA4zMzNoa2sjLy9PZTwvLw9WVlZVrtOqVSvo6OhAW1tbOebg4IDc3FyUlpZCV1e30jqzZs1CSEiI8n5BQQHLHREREUlSrWbsrl27hv/3//4f/vrrLyxatEil1D1jZmb20tOi6Orqwt3dHYmJicoxhUKBxMREeHp6VrnO22+/jfT0dCgUCuVYWloaWrVqVWWpAwC5XA5jY2OVGxEREZEU1arYRUZGYtiwYZDL5Srj5eXlSE5OBgA0a9YMXl5eL32ekJAQrFixAmvXrkVKSgo++eQTFBUVwd/fHwAwZswYzJo1S7n8J598ggcPHmDq1KlIS0vDnj17EBMTg0mTJtXmbRARERFJSq02xfbp0wc5OTmwsLBQGc/Pz0e
fPn2qfZTqiBEjcPfuXURERCA3NxcuLi7Yv3+/8oCKrKwsaGn9X/e0sbHBgQMHEBwcjC5duqB169aYOnUqPv/889q8DSIiIiJJkQmCINR0JS0tLeTl5cHc3FxlPC0tDR4eHg36yNOCggKYmJggPz+fm2WJ1G22STWWydd8DiIiCalJd6nRjN2QIUMAADKZDOPGjVPZFFtRUYErV66gR48etYhMRERERHVVo2JnYvL0r3FBEGBkZAR9fX3lY7q6unjrrbcQGBio3oREREREVC01KnZr1qwBANja2iI0NBTNmzfXSCgiIiIiqrlaHTwRGRmp7hxEREREVEfVLnZubm5ITExEixYt4OrqCplM9sJlL1y4oJZwRNRw2M7c88plMvXqIQgREb1QtYvdoEGDlAdLDB48WFN5iIiIiKiWql3s/rn5lZtiiYiIiBqeWl15goiIiIganmrP2LVo0eKl+9X904MHD2odiIiIiIhqp9rFbtGiRRqMQURERER1Ve1iN3bsWE3mICIiIqI6qnaxKygoUF6f7FXXguU1WImIiIjqX432scvJyYGFhQVMTU2r3N9OEATIZDJUVFSoNSQRERERvVq1i92vv/6Kli1bAgCOHDmisUBEREREVDvVLnZeXl5Vfk1EREREDUOtrhULAA8fPsSqVauQkpICAHB0dIS/v79yVo+IiIiI6letTlCcnJwMW1tbLFmyBA8fPsTDhw+xZMkS2NnZITk5Wd0ZiYiIiKgaajVjN2nSJIwYMQLfffcdtLW1AQAVFRX49NNPMWnSJFy9elWtIYmIiIjo1Wo1Y5eeno7PPvtMWeoAQFtbGyEhIUhPT1dbOCIiIiKqvloVOzc3N+W+df+UkpICZ2fnOociIiIiopqr9qbYK1euKL+eMmUKpk6divT0dLz11lsAgNOnTyMuLg7z5s1Tf0oiIiIieiWZIAhCdRbU0tKCTCbDqxZv6CcoLigogImJCfLz83mFDKIasJ2555XLZOqNfPUTzc5XQxoioqajJt2l2jN2N2/erHMwIiIiItKcahe7du3aaTIHEREREdVRrU9QDADXrl1DVlYWSktLVcYHDhxYp1BEREREVHO1KnYZGRn48MMPcfXqVZX97mQyGQA06H3siIiIiKSqVqc7mTp1Kuzs7HDnzh0YGBjgjz/+QHJyMjw8PJCUlKTmiERERERUHbWasTt16hR+/fVXmJmZQUtLC1paWnjnnXcQGxuLKVOm4OLFi+rOSURERESvUKsZu4qKChgZGQEAzMzM8NdffwF4eoBFamqq+tIRERERUbXVasauc+fOuHz5Muzs7NC9e3fMnz8furq6+P7772Fvb6/ujERERERUDbUqdmFhYSgqKgIAREVF4V//+hd69uyJ1157DVu3blVrQCIiIiKqnloVOx8fH+XXHTp0wPXr1/HgwQO0aNFCeWQsEREREdWvOp3HDgCys7MBADY2NnUOQ0RERES1V6uDJ8rLyxEeHg4TExPY2trC1tYWJiYmCAsLQ1lZmbozEhEREVE11GrGbvLkyUhISMD8+fPh6ekJ4OkpUGbPno379+/ju+++U2tIIiIiInq1WhW7TZs2YcuWLejbt69yrEuXLrCxsYGfnx+LHREREZEIarUpVi6Xw9bWttK4nZ0ddHV165qJiIiIiGqhVsUuKCgI0dHRePLkiXLsyZMnmDt3LoKCgtQWjoiIiIiqr9qbYocMGaJy//Dhw2jTpg2cnZ0BAJcvX0ZpaSnee+899SYkIiIiomqpdrEzMTFRuf/vf/9b5T5Pd0JEREQkrmoXuzVr1mgyBxERERHVUZ1OUHz37l2kpqYCAN544w2Ym5urJRQRERER1VytDp4oKirC+PHj0apVK/Tq1Qu9evWCtbU1AgICUFxcrO6MRERERFQNtSp2ISEhOHr0KH755Rc8evQIjx49ws8//4yjR4/is88+U3dGIiIiIqqGWm2K/emnn7B9+3b07t1bOdavXz/o6+tj+PDhPEE
xERERkQhqNWNXXFwMS0vLSuMWFhbcFEtEREQkkloVO09PT0RGRqKkpEQ59vjxY8yZM0d57VgiIiIiql+12hS7aNEi+Pr6VjpBsZ6eHg4cOKDWgERERERUPbUqdk5OTvjzzz+xceNGXL9+HQDg5+eHUaNGQV9fX60BiYiIiKh6alzsysrK0KlTJ+zevRuBgYGayEREREREtVDjfex0dHRU9q0jIiIiooahVgdPTJo0Cf/5z39QXl6u7jxEREREVEu12sfu7NmzSExMxMGDB+Hk5ITmzZurPJ6QkKCWcERERERUfbUqdqampvj3v/+t7ixEREREVAc1KnYKhQJff/010tLSUFpainfffRezZ8/mkbBEREREDUCN9rGbO3cuvvjiCxgaGqJ169ZYsmQJJk2apKlsRERERFQDNSp269atw7Jly3DgwAHs3LkTv/zyCzZu3AiFQqGpfERERERUTTUqdllZWejXr5/yvre3N2QyGf766y+1ByMiIiKimqlRsSsvL4eenp7KmI6ODsrKytQaioiIiIhqrkYHTwiCgHHjxkEulyvHSkpKMHHiRJVTnvB0J0RERET1r0bFbuzYsZXGRo8erbYwRERERFR7NSp2a9as0VQOIiIiIqqjWl1SjIiIiIgaHhY7IiIiIoloEMUuLi4Otra20NPTQ/fu3XHmzJlqrbdlyxbIZDIMHjxYswGJiIiIGgHRi93WrVsREhKCyMhIXLhwAc7OzvDx8cGdO3deul5mZiZCQ0PRs2fPekpKRERE1LCJXuwWLlyIwMBA+Pv7w9HREfHx8TAwMMDq1atfuE5FRQVGjRqFOXPmwN7evh7TEhERETVcoha70tJSnD9/Ht7e3soxLS0teHt749SpUy9cLyoqChYWFggICHjlazx58gQFBQUqNyIiIiIpErXY3bt3DxUVFbC0tFQZt7S0RG5ubpXrHD9+HKtWrcKKFSuq9RqxsbEwMTFR3mxsbOqcm4iIiKghEn1TbE38/fff+Pjjj7FixQqYmZlVa51Zs2YhPz9fecvOztZwSiIiIiJx1OgExepmZmYGbW1t5OXlqYzn5eXBysqq0vI3btxAZmYmBgwYoBxTKBQAgGbNmiE1NRXt27dXWUcul6tcAo2IiIhIqkSdsdPV1YW7uzsSExOVYwqFAomJifD09Ky0fKdOnXD16lVcunRJeRs4cCD69OmDS5cucTMrERERNWmiztgBQEhICMaOHQsPDw9069YNixYtQlFREfz9/QEAY8aMQevWrREbGws9PT107txZZX1TU1MAqDRORERE1NSIXuxGjBiBu3fvIiIiArm5uXBxccH+/fuVB1RkZWVBS6tR7QpIREREJAqZIAiC2CHqU0FBAUxMTJCfnw9jY2Ox4xA1GrYz97xymUy9ka9+otn5akhDRNR01KS7cCqMiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJEv6QYERERaVa1rhwzr389JCFN44wdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJBIsdERERkUSw2BERERFJRDOxAxBR0+K01umVy1wde7UekhARSQ9n7IiIiIgkgsW
OiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkguexIyIiomrheSgbPs7YEREREUkEix0RERGRRDSIYhcXFwdbW1vo6emhe/fuOHPmzAuXXbFiBXr27IkWLVqgRYsW8Pb2funyRERERE2F6MVu69atCAkJQWRkJC5cuABnZ2f4+Pjgzp07VS6flJQEPz8/HDlyBKdOnYKNjQ0++OAD3L59u56TExERETUsohe7hQsXIjAwEP7+/nB0dER8fDwMDAywevXqKpffuHEjPv30U7i4uKBTp05YuXIlFAoFEhMT6zk5ERERUcMiarErLS3F+fPn4e3trRzT0tKCt7c3Tp06Va3nKC4uRllZGVq2bFnl40+ePEFBQYHKjYiIiEiKRC129+7dQ0VFBSwtLVXGLS0tkZubW63n+Pzzz2Ftba1SDv8pNjYWJiYmypuNjU2dcxMRERE1RKJviq2LefPmYcuWLdixYwf09PSqXGbWrFnIz89X3rKzs+s5JREREVH9EPUExWZmZtDW1kZeXp7KeF5eHqysrF667jfffIN58+bh8OHD6NKlywuXk8vlkMvlaslLRERE1JCJOmOnq6sLd3d3lQMfnh0I4enp+cL15s+fj+joaOzfvx8eHh71EZWIiIiowRP9kmIhISEYO3YsPDw80K1bNyxatAhFRUXw9/cHAIwZMwatW7dGbGwsAOA///kPIiIisGnTJtja2ir3xTM0NIShoaFo74OIiIhIbKIXuxEjRuDu3buIiIhAbm4uXFxcsH//fuUBFVlZWdDS+r+Jxe+++w6lpaUYOnSoyvNERkZi9uzZ9RmdiIiIqEERvdgBQFBQEIKCgqp8LCkpSeV+Zmam5gMRERERNUKN+qhYIiIiIvo/LHZEREREEsFiR0RERCQRLHZEREREEsFiR0RERCQRLHZEREREEsFiR0RERCQRLHZEREREEsFiR0RERCQRLHZEREREEtEgLilGL+e01umVy1wde7UekhAREVFDxhk7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIoloJnYAIiIioobKaa3TK5e5OvZqPSSpHhY7IiINamz/KBBR48ZNsUREREQSwWJHREREJBEsdkREREQSwWJHREREJBE8eIIaHO5sTkREVDucsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIongwRMaZDtzzyuXyZzXvx6SEBERUVPAGTsiIiIiiWCxIyIiIpIIFjsiIiIiiWCxIyIiIpIIFjsiIiIiieBRsUREpDa8JCCRuFjsiETGfwiJGi9+fqmh4aZYIiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSCBY7IiIiIolgsSMiIiKSiAZR7OLi4mBraws9PT10794dZ86ceenyP/74Izp16gQ9PT04OTlh79699ZSUiIiIqOES/VqxW7duRUhICOLj49G9e3csWrQIPj4+SE1NhYWFRaXlT548CT8/P8TGxuJf//oXNm3ahMGDB+PChQvo3LmzCO+AiIhIAmabvHoZu7aaz0F1IvqM3cKFCxEYGAh/f384OjoiPj4eBgYGWL16dZXLL168GL6+vpg+fTocHBwQHR0NNzc3LF26tJ6TExERETUsos7YlZaW4vz585g1a5ZyTEtLC97e3jh16lSV65w6dQohISEqYz4+Pti5c6cmoxJRE2M7c88rl8mc178ekhBRbTTVz7Coxe7evXuoqKiApaWlyrilpSWuX79e5Tq5ublVLp+
bm1vl8k+ePMGTJ0+U9/Pz8wEABQUFdYleLYonxa9cpjo5Kh5XqOV5NK1z5IFXLvP7HJ9XLtNY3q+6NJb3W63fZ5nwymUk9X4l9PlVF77fyhrC+21qn19AWp/hZ88vCK/+GUEQ0e3btwUAwsmTJ1XGp0+fLnTr1q3KdXR0dIRNmzapjMXFxQkWFhZVLh8ZGSkA4I033njjjTfeeGvUt+zs7Fd2K1Fn7MzMzKCtrY28vDyV8by8PFhZWVW5jpWVVY2WnzVrlsqmW4VCgQcPHuC1116DTCar4zuovoKCAtjY2CA7OxvGxsb19rpi4fuVvqb2nvl+pY3vV9oa+/sVBAF///03rK2tX7msqMVOV1cX7u7uSExMxODBgwE8LV6JiYkICgqqch1PT08kJiZi2rRpyrFDhw7B09OzyuXlcjnkcrnKmKmpqTri14qxsXGj/KWqLb5f6Wtq75nvV9r4fqWtMb9fExOTai0n+ulOQkJCMHbsWHh4eKBbt25YtGgRioqK4O/vDwAYM2YMWrdujdjYWADA1KlT4eXlhQULFqB///7YsmULzp07h++//17Mt0FEREQkOtGL3YgRI3D37l1EREQgNzcXLi4u2L9/v/IAiaysLGhp/d9ZWXr06IFNmzYhLCwMX3zxBTp27IidO3fyHHZERETU5Ile7AAgKCjohZtek5KSKo0NGzYMw4YN03Aq9ZLL5YiMjKy0WViq+H6lr6m9Z75faeP7lbam9H5lglCdY2eJiIiIqKET/coTRERERKQeLHZEREREEsFiR0RERCQRLHZEREREEsFipyHl5eVYt25dpatkEBEREWkKj4rVIAMDA6SkpKBdu3ZiR6kXY8eORUBAAHr16iV2lHphb2+Ps2fP4rXXXlMZf/ToEdzc3JCRkSFSMvXZtWtXtZcdOHCgBpOQGCoqKnD16lW0a9cOLVq0EDsO1VBNLkzfWK/G8CLJyckvfVzK/041iPPYSVW3bt1w6dKlJlPs8vPz4e3tjXbt2sHf3x9jx45F69atxY6lMZmZmaioqKg0/uTJE9y+fVuEROr37FJ/z8hkMvzzb8F/Xm+5qu9FY7d27VqYmZmhf//+AIAZM2bg+++/h6OjIzZv3iy5z/a0adPg5OSEgIAAVFRUwMvLCydPnoSBgQF2796N3r17ix1R7bZv345t27YhKysLpaWlKo9duHBBpFTqYWpqWu1rokvt81vV76rU/3/1DDfFatCnn36KkJAQLF26FKdOncKVK1dUblKzc+dO3L59G5988gm2bt0KW1tb9O3bF9u3b0dZWZnY8dRm165dypmsAwcOKO/v2rULO3bsQHR0NGxtbcUNqSYKhUJ5O3jwIFxcXLBv3z48evQIjx49wt69e+Hm5ob9+/eLHVUjYmJioK+vDwA4deoU4uLiMH/+fJiZmSE4OFjkdOq3fft2ODs7AwB++eUX3Lx5E9evX0dwcDC+/PJLkdOp35IlS+Dv7w9LS0tcvHgR3bp1w2uvvYaMjAz07dtX7Hh1duTIEfz666/49ddfsXr1alhYWGDGjBnYsWMHduzYgRkzZsDS0hKrV68WO6raPXz4UOV2584d7N+/H127dsXBgwfFjqdZAmmMTCardNPS0lL+V+rOnz8vBAUFCXp6eoKZmZkwbdo0IS0tTexYdVbVz/XZTVdXV3j99deFX375ReyYavfmm28Kx44dqzSenJwsdOrUSYREmqevry/cunVLEARBmDFjhvDxxx8LgiAIv//+u2BmZiZmNI2Qy+VCdna2IAiCEBgYKEydOlUQBEHIyMgQjIyMREymGW+88YawadMmQRAEwdDQULhx44YgCIIQHh4uTJo0Scxoavfuu+8q3+s/bdy4UfDy8qr/QCJJSkoS3NzcxI6hUZyx06CbN29WumVkZCj/K2U5OTk4dOgQDh06BG1tbfTr1w9Xr16Fo6Mjvv32W7Hj1cmzGax27drh7t27KrNaT548QWpqKv71r3+JHVPtbty4AVN
T00rjJiYmyMzMrPc89cHQ0BD3798HABw8eBDvv/8+AEBPTw+PHz8WM5pGWFpa4tq1a6ioqMD+/fuV77e4uBja2toip1O/rKws9OjRAwCgr6+Pv//+GwDw8ccfY/PmzWJGU7tTp07Bw8Oj0riHhwfOnDkjQiJxWFpaIjU1VewYGsV97DRIavvfvEpZWRl27dqFNWvW4ODBg+jSpQumTZuGkSNHKnfM3bFjB8aPH9/oN2OVlZXB3t4eDx48qHTwhFR17doVISEhWL9+PSwtLQEAeXl5mD59Orp16yZyOs14//33MWHCBLi6uiItLQ39+vUDAPzxxx+S2dz+T/7+/hg+fDhatWoFmUwGb29vAMBvv/2GTp06iZxO/aysrPDgwQO0a9cObdu2xenTp+Hs7IybN2+q7EsqBTY2NlixYgXmz5+vMr5y5UrY2NiIlEpznt/dSRAE5OTkYN68eXBxcREnVD1hsdOw9evXIz4+Hjdv3sSpU6fQrl07LFq0CHZ2dhg0aJDY8dSqVatWUCgU8PPzw5kzZ6r88PTp06fKWZ/GRkdHR5L7Sb7MqlWrMGTIELRt21b5D0F2djY6duyInTt3ihtOQ+Li4hAWFobs7Gz89NNPyhJ//vx5+Pn5iZxO/WbPno3OnTsjOzsbw4YNU14wXVtbGzNnzhQ5nfq9++672LVrF1xdXeHv74/g4GBs374d586dw5AhQ8SOp1bffvst/v3vf2Pfvn3o3r07AODMmTP4888/8dNPP4mcTv1cXFwqHewFAG+99ZYk9yn8J57uRIO+++47REREYNq0aZg7dy5+//132Nvb44cffsDatWtx5MgRsSOq1fr16zFs2DDo6emJHaVeBAcHQy6XY968eWJHqTeCIODQoUO4fv06AMDBwQHe3t7VPvKOGo+SkhLJf5af7ULRrNnTOY4tW7bg5MmT6NixI/73f/8Xurq6IidUr//+97/47rvvkJKSAuDp53fixImSnLG7deuWyn0tLS2Ym5tL/ncaYLHTKEdHR8TExGDw4MEwMjLC5cuXYW9vj99//x29e/fGvXv3xI6oNmVlZdDX18elS5fQuXNnsePUi8mTJ2PdunXo2LEj3N3d0bx5c5XHFy5cKFIy9WuKP99njh07huXLlyMjIwM//vgjWrdujfXr18POzg7vvPOO2PHUqqKiAjExMYiPj0deXh7S0tJgb2+P8PBw2NraIiAgQOyIVAtlZWXw9fVFfHw8OnbsKHYc0jAePKFBN2/ehKura6VxuVyOoqIiERJpjo6ODtq2bSvpcwM97/fff4ebmxuMjIyQlpaGixcvKm+XLl0SO55aNcWfLwD89NNP8PHxgb6+Pi5cuIAnT54AeHrOxpiYGJHTqd/cuXPxww8/YP78+SqzVZ07d8bKlStFTKYZ9vb28Pf3V/5cn7l37x7s7e1FSqV+TXHXEQA4evQoBgwYgA4dOqBDhw4YOHAgjh07JnYszRPvgFzpc3BwEHbu3CkIguqh9EuWLBFcXV3FjKYRK1euFPr16yfcv39f7CikAU3x5+vi4iKsXbtWEATVz/CFCxcES0tLMaNpRPv27YXDhw8LgqD6flNSUgRTU1Mxo2mETCYTOnbsKHTt2lXIyclRjufm5krulFTTpk0TPv/8c7Fj1Jv169cLzZo1E4YPHy4sXrxYWLx4sTB8+HBBR0dH2Lhxo9jxNIoHT2hQSEgIJk2ahJKSEgiCgDNnzmDz5s2IjY2V5F+/S5cuRXp6OqytrdGuXbtKmyYb+1ncX+a///0vAKBNmzYiJ9GcpvjzTU1NrfLSQyYmJnj06FH9B9Kw27dvo0OHDpXGFQqFpE4y/oxMJsP+/fsRGhoKd3d37Ny5E127dhU7lkaUl5dj9erVOHz4sOR3HQGezj7Pnz9f5QwMU6ZMwcKFCxEdHY2RI0eKmE6zWOw0aMKECdDX10dYWBiKi4sxcuRIWFtbY/Hixfjoo4/Ejqd2z19+SuoUCgW++uorLFiwAIW
FhQAAIyMjfPbZZ/jyyy+hpSWtPR2a2s8XeHo6jPT09EqnNjl+/LikNtU94+joiGPHjlU6VdP27dur3K2ksRMEAYaGhkhISMCsWbPg5eWF77//Xnn+Pil5tusIAKSlpak8JsWDnzIyMjBgwIBK4wMHDsQXX3whQqJ6JPaUYVNRVFQk5OXliR2D1GjmzJmCubm5sGzZMuHy5cvC5cuXhbi4OMHc3Fz44osvxI5HahATEyM4OjoKp0+fFoyMjIRjx44JGzZsEMzNzYUlS5aIHU/tdu7cKZiYmAjz5s0TDAwMhK+//lqYMGGCoKurKxw8eFDseGqnpaWl8v/l9evXC3p6eoK/v7/kNsU2Ne3btxfi4+MrjX/33XdChw4dREhUf1jsNKi4uFgoKipS3s/MzBS+/fZb4cCBAyKm0qyHDx8KK1asEGbOnKncF+v8+fPCf//7X5GTqV+rVq2En3/+udL4zp07BWtraxESkbopFArhq6++Epo3b668bJyenp4QFhYmdjSNSU5OFry9vQVzc3NBX19fePvttyX7/yyZTFbpD+6TJ08KlpaWLHaN3LJlywRdXV1h4sSJwrp164R169YJ//u//yvI5fIqC5+U8HQnGvTBBx9gyJAhmDhxIh49eoQ33ngDurq6uHfvHhYuXIhPPvlE7IhqdeXKFXh7eysvMZWamgp7e3uEhYUhKysL69atEzuiWunp6eHKlSt4/fXXVcZTU1Ph4uIiuUtOVVRU4Ntvv8W2bduQlZWF0tJSlccfPHggUjLNKy0tRXp6OgoLC+Ho6AhDQ0OxI5EG5eXl4fr16/Dy8hI7ilqdO3fuhZ/fhIQEkVJpzo4dO7BgwQKV8/ZNnz5dchcHeJ60dgJqYC5cuICePXsCeLqPipWVFW7duoV169ZhyZIlIqdTv5CQEIwbNw5//vmnykkg+/Xrh+TkZBGTaYazszOWLl1aaXzp0qVwdnYWIZFmzZkzBwsXLsSIESOQn5+PkJAQDBkyBFpaWpg9e7bY8TRKV1cXjo6O6Natm6RL3YQJE5CUlCR2jHoTFRWFX3/9tdK4oaEhjh49KkIizdmyZQt69OiBlJQU7NixA2VlZfjjjz/w66+/wsTEROx4ajd27Fi89tprOH78OO7fv4/79+/j+PHjki91ALiPnSbp6+sLt27dEgRBEIYNGybMnj1bEARByMrKEvT19cWMphHGxsZCenq6IAiqp0rIzMwU5HK5mNE0IikpSWjevLng4OAgjB8/Xhg/frzg4OAgGBoaCsnJyWLHUzt7e3th9+7dgiA8/fk++1kvXrxY8PPzEzOaxhQWFgphYWGCp6en0L59e8HOzk7lJjUDBw4U5HK50KZNGyE0NFS4ePGi2JE0SiaTCbq6usKCBQtUxqV4uhMnJydh6dKlgiD83/+fFQqFEBgYKERERIicTv0GDRok6OjoCB06dBDmzp0r3L59W+xI9YYzdhrUoUMH7Ny5E9nZ2Thw4AA++OADAMCdO3dgbGwscjr1k8vlKCgoqDSelpYGc3NzERJplpeXF9LS0vDhhx/i0aNHePToEYYMGYLU1FTlTK2U5ObmwsnJCcDTGY38/HwAwL/+9S/s2bNHzGgaM2HCBKxatQo9e/ZEUFAQpk6dqnKTmp9//hk5OTkIDw/H2bNn4e7ujjfffBMxMTHIzMwUO55GrFu3DjExMfD396+0eVJKbty4gf79+wN4OgNdVFQEmUyG4OBgfP/99yKnU7+dO3fi9u3b+OSTT7B161a0a9cOffv2xY8//ijJU/eoELtZStmPP/4o6OjoCFpaWoK3t7dyPCYmRvD19RUxmWYEBAQIgwcPFkpLSwVDQ0MhIyNDuHXrluDq6ipMnTpV7Hhq8eGHHwr5+fmCIAjC2rVrhZKSEpET1Z/XX39dOH36tCAIgvD2228LsbGxgiAIwpYtWwRzc3Mxo2mMiYmJcPz4cbFjiCY7O1uYP3++0KlTJ0FbW1vsOGr37OCJ9PR0wcHBQfD
09BTy8vIkOWPXunVr4cqVK4IgPJ2927RpkyAITw8WMTY2FjNavTh//rwQFBQk6OnpCWZmZsK0adOEtLQ0sWNpBGfsNGjo0KHIysrCuXPncODAAeX4e++9h2+//VbEZJrx7HxuFhYWePz4Mby8vNChQwcYGRlh7ty5YsdTi927dysvB+fv76+ctWoKPvzwQyQmJgJ4ep3c8PBwdOzYEWPGjMH48eNFTqcZLVq0QMuWLcWOIYqysjKcO3cOv/32GzIzM2FpaSl2JLV7dv629u3b4/Tp0zA2Noa7uzvOnTsncjL169WrFw4dOgQAGDZsGKZOnYrAwED4+fnhvffeEzmdZuXk5ODQoUM4dOgQtLW10a9fP1y9ehWOjo6S/LeYR8XWk6ZwZYJnjh8/jitXrqCwsBBubm7w9vYWO5LadOnSBW5ubujTpw/8/f2xZMmSF25WHzNmTD2nq1+nT5/GyZMn0bFjxypPBCoFGzZswM8//4y1a9fCwMBA7Dj14siRI9i0aRN++uknKBQKDBkyBKNGjcK7774ruRPZamlpITc3FxYWFgCennR82rRp+O6776BQKCR1beQHDx6gpKQE1tbWUCgUmD9/vvLzGxYWhhYtWogdUa3Kysqwa9curFmzBgcPHkSXLl0wYcIEjBw5Uvn/7B07dmD8+PF4+PChyGnVi8VOg5ralQmys7NhY2MjdgyNOnHiBD777DPcuHEDDx48gJGRUZX/2MlkMkmf/kPKXF1dVX6m6enpEAQBtra20NHRUVlWapdRa926NR48eABfX1+MGjUKAwYMgFwuFzuWxqxduxYfffRRpfe4Zs0aJCcnY82aNSIlo7oyMzODQqGAn58fAgMD4eLiUmmZR48ewdXVFTdv3qz/gBrEYqdBs2bNwqpVqzBnzhy8/fbbAJ7OZs2ePRuBgYGS2Tz5jLa2Nt555x2MHj0aQ4cOldxfgM97/q99qWvbti169+4NLy8v9O7dG+3btxc7kkbMmTOn2stGRkZqMEn9W7FiBYYNGwZTU1Oxo5CajRkzBn369EGvXr0k+9n9p/Xr12PYsGEqp95qKljsNMja2hrx8fEYOHCgyvjPP/+MTz/9FLdv3xYpmWZcvHgRmzZtwpYtW3D37l34+vpi9OjRkvqrf8iQIfjhhx9gbGyMtWvXYvjw4dDX1xc7Vr3YsGEDkpOTkZSUhPT0dLRu3RpeXl7KotexY0exI5IaSXX3kSVLluB//ud/oKen99LzicpkMkyePLkek2nWhAkTkJycrPLZffaHGj+70sJip0FN7coEzwiCgKSkpEr76axevVrsaHWmq6uLW7duoVWrVtDW1kZOTk6TmbH7p5ycHBw9ehS7d+/G1q1bJbc/0jNnz56FQqFA9+7dVcZ/++03aGtrw8PDQ6RkmtEUdh+xs7PDuXPn8Nprr8HOzu6Fy8lkMmRkZNRjsvpx+/ZtJCcn4+jRozh69CjS0tLQqlUrZZGnxq+Z2AGk7NmVCZ7/q1CqVyZ4RiaToU+fPujTpw8++eQTBAQEYO3atZIodp06dcKsWbPQp08fCIKAbdu2NamDJ4qLi3H8+HEkJSXhyJEjuHjxIjp37ozevXuLHU0jJk2ahBkzZlQqdrdv38Z//vMf/PbbbyIl04wvv/wSq1atwrx58yrtPlJSUiKJ3Uf+uT/VP79+NschtQNEnteiRQu89tpraNGiBUxNTdGsWTNJnme0KeOMnQYdPXoU/fv3R9u2beHp6QkAOHXqFLKzs7F3715JnsQWeLoJZ9OmTdi0aRN+//13eHp6YtSoUZg4caLY0ers5MmTCAkJaZIHT/To0QMXL16Eg4ODchNOr169JL0vpaGhIa5cuQJ7e3uV8Zs3b6JLly74+++/RUqmGU1t9xEAWLVqFb799lv8+eefAICOHTti2rRpmDBhgsjJ1OuLL75AUlKS8jP8bFOs1D/DTRFn7DTo2ZUJ4uLicP36dQBP99H69NNPYW1tLXI69Vu+fDk2bdq
E48ePw8HBAaNGjcLPP/+Mdu3aiR1NbXr06IHTp08DeHrwRFpaWpPZFHv9+nU0b94cnTp1QqdOneDg4CD5fxDkcjny8vIqFbucnBw0aya9/30+ePAAnTp1qjTeqVMnyf2hAgARERFYuHAhJk+erPLHd3BwMLKyshAVFSVyQvWZN28ezM3NERkZiSFDhlTaRYikgzN2pDY2Njbw8/PDqFGjJL2p+Zlbt24hKysLy5cvR0ZGBn788Ue0bt0a69evh52dHd555x2xI6qVIAi4evUqkpKScPToUSQnJ0NXVxdeXl7o06cPAgMDxY6odn5+fsjJycHPP/+svFD6o0ePMHjwYFhYWGDbtm0iJ1Sv7t27o3v37pV2H5k8eTLOnj2r/KNGKszNzbFkyRL4+fmpjG/evBmTJ0/GvXv3REqmfpcvX8bRo0eRlJSEY8eOKT+7vXv3Ru/evVn0JITFTs2uXLlS7WW7dOmiwST1TxAEHD9+vMkUnZ9++gkff/wxRo0ahfXr1+PatWuwt7fH0qVLsXfvXuzdu1fsiBojCALOnz+PpUuXYuPGjZI9eOL27dvo1asX7t+/D1dXVwDApUuXYGlpiUOHDknuvI0v2n0kKysL+/btk9zuI6ampjh79mylo0LT0tLQrVs3PHr0SJxg9eDy5cv49ttvJf35bapY7NRMS0sLMpkMr/q2ymQyyX2QmlrRcXV1RXBwMMaMGQMjIyNcvnwZ9vb2uHjxIvr27Yvc3FyxI6rVhQsXkJSUhKSkJBw/fhx///03nJyclPvbDRo0SOyIGlFUVISNGzfi8uXL0NfXR5cuXeDn51fpZMVScfv2bXz33XdISUkBADg4OEh295HJkydDR0cHCxcuVBkPDQ3F48ePERcXJ1Iy9RMEARcvXlT5DBcUFKBLly7w8vKS5KW1mioWOzW7detWtZeV0r5nQNMrOgYGBrh27RpsbW1V3m9GRgYcHR1RUlIidkS1atasGVxdXZXnruvVq5dy8yRJR0lJCa5cuYI7d+5AoVCoPPb8QRWN3eTJk7Fu3TrY2NjgrbfeAvD0VDZZWVkYM2aMSnl/vvw1Ni1atEBhYSGcnZ2Vm2B79uzJk1FLkPT2/hXZP8tabGwsLC0tK10gffXq1bh79y4+//zz+o6nUampqejVq1elcRMTE0lu0rCyskJ6ejpsbW1Vxo8fP15pZ/vGrqKiAgkJCejZs6fkD5h43p9//okjR45UWXQiIiJESqUZ+/fvx5gxY3D//v1KWx2kuJXh999/h5ubGwDgxo0bAJ5eisrMzAy///67cjkpnAJlw4YN6Nmz5wtPz0TSwWKnQc+OEn3em2++iY8++khyxa4pFR0ACAwMxNSpU7F69WrIZDL89ddfOHXqFEJDQxEeHi52PLXS1tbG8OHDkZKS0qSK3YoVK/DJJ5/AzMwMVlZWKv/Ay2QyyRW7yZMnY9iwYYiIiIClpaXYcTTuyJEjYkeoN/3791d+LdWritD/TyCNkcvlQkZGRqXxGzduCHK5XIREmhUTEyM4OjoKp0+fFoyMjIRjx44JGzZsEMzNzYUlS5aIHU/tFAqF8NVXXwnNmzcXZDKZIJPJBD09PSEsLEzsaBrh7u4uHD58WOwY9apt27bCvHnzxI5Rb4yMjIT09HSxY5AGVFRUCHPmzBGMjY0FLS0tQUtLSzAxMRGioqKEiooKseORGnHGToNsbGxw4sSJSpetOXHihCR3RJ45cyYUCgXee+89FBcXo1evXpDL5QgNDZXUNRefkclk+PLLLzF9+nSkp6ejsLAQjo6OMDQ0FDuaRnz11VcIDQ1FdHQ03N3d0bx5c5XHpbiJ5+HDhxg2bJjYMerN0KFDkZSU1CQuEt/UNIWritBTPHhCg+bPn4/58+fj66+/xrvvvgsASExMxIwZM/DZZ59h1qxZIifUjNLS0iZRdJqaf14n9J+bJAVBkOT+VwAQEBCArl27SuKqKdVRXFyMYcOGwdzcHE5OTpWO/J0yZYp
IyaiumuJVRZoqzthp0PTp03H//n18+umnKC0tBQDo6enh888/l2ypAwBdXV04OjqKHYPUrCntj/RMhw4dEB4ejtOnTzeJorN582YcPHgQenp6SEpKqrRPodTeb1PS1K4q0pRxxq4eFBYWIiUlBfr6+ujYsSPkcrnYkYioGp7fjeKfZDIZMjIy6jGN5llZWWHKlCmYOXOmygwtNX5N7aoiTRmLHRFV26NHj7Bq1SrlyWvffPNNjB8/nuezk4iWLVvi7Nmz3MdOgl50VZHs7Gzs3btXclcVacpY7IioWs6dOwcfHx/o6+ujW7duAICzZ8/i8ePHOHjwoPJ8YI1dSEgIoqOj0bx5c4SEhLxwOZlMhgULFtRjMs0LDg6Gubk5vvjiC7GjkJplZWWhWbNmiIuLw/Xr1wH831VFysvL0bZtW5ETkrqw2BFRtfTs2RMdOnTAihUr0KzZ091zy8vLMWHCBGRkZCA5OVnkhOrRp08f7NixA6ampujTp88Ll5PJZPj111/rMZnmTZkyBevWrYOzszO6dOlSaZ/Cxn71haZMW1sbOTk5sLCwUBm/f/8+LCwsJHnwU1PFYkdE1aKvr4+LFy9W2gH72rVr8PDwQHFxsUjJSF2aWpFtSrS0tJCbm1up2N26dQuOjo4oKioSKRmpG4+KJaJqMTY2RlZWVqVil52dDSMjI5FSkTo1xSOfpe7Z7gTPrpRiYGCgfKyiogK//fYbXFxcREpHmsBiR0TVMmLECAQEBOCbb75Bjx49ADw92fb06dPh5+cncjoiqsrFixcBPD3f5NWrV6Grq6t8TFdXF87OzggNDRUrHmkAN8US0QtduXIFnTt3hpaWFkpLSzF9+nTEx8ejvLwcAKCjo4NPPvkE8+bN42l8iBowf39/LF68WJJXiCFVLHZE9EL/3OHa3t4eZ8+ehb6+Pm7cuAEAaN++vcqmHSIiEhc3xRLRC5mamuLmzZuwsLBAZmYmFAoFDAwM4OTkJHY0IiKqAosdEb3Qv//9b3h5eaFVq1aQyWTw8PCAtrZ2lctK7SoMRESNEYsdEb3Q999/jyFDhiA9PR1TpkxBYGAgj4AlImrAuI8dEVWLv78/lixZwmJHRNSAsdgRERERSYSW2AGIiIiISD1Y7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgkgsWOiIiISCJY7IiIiIgk4v8Dlwpa4ndpYDkAAAAASUVORK5CYII=",
+      "text/plain": [
+       "<Figure size 640x480 with 1 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "# Plotting\n",
+    "x = torch.arange(len(vocab))\n",
+    "bar_width = 0.15\n",
+    "\n",
+    "fig, ax = plt.subplots()\n",
+    "for i, T in enumerate(temperatures):\n",
+    "    rects = ax.bar(x + i * bar_width, scaled_probas[i], bar_width, label=f'Temperature = {T}')\n",
+    "\n",
+    "ax.set_ylabel('Probability')\n",
+    "ax.set_xticks(x)\n",
+    "ax.set_xticklabels(vocab.keys(), rotation=90)\n",
+    "ax.legend()\n",
+    "\n",
+    "plt.tight_layout()\n",
+    "# plt.savefig(\"temperature-plot.pdf\")\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d750e989-842a-4cfa-a44b-cf44d6e49163",
+   "metadata": {},
+   "source": [
+    "- We can see that rescaling with a temperature of 0.1 yields a sharper distribution that approaches `torch.argmax`, so the most likely word is almost always selected:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "id": "e4600713-c51e-4f53-bf58-040a6eb362b8",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "0 x closer\n",
+      "0 x every\n",
+      "0 x effort\n",
+      "985 x forward\n",
+      "0 x inches\n",
+      "0 x moves\n",
+      "0 x pizza\n",
+      "15 x toward\n"
+     ]
+    }
+   ],
+   "source": [
+    "print_sampled_tokens(scaled_probas[1])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "526e93cb-8e2a-42a1-b1ba-4fd5fe64c26b",
+   "metadata": {},
+   "source": [
+    "- The probabilities rescaled with a temperature of 5 are more uniformly distributed:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 49,
+   "id": "9dfb48f0-bc3f-46a5-9844-33b6c9b0f4df",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "165 x closer\n",
+      "75 x every\n",
+      "42 x effort\n",
+      "239 x forward\n",
+      "71 x inches\n",
+      "46 x moves\n",
+      "32 x pizza\n",
+      "227 x toward\n",
+      "103 x you\n"
+     ]
+    }
+   ],
+   "source": [
+    "print_sampled_tokens(scaled_probas[2])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0c83f0c4-3774-4375-ad7f-96440ba5fef7",
+   "metadata": {},
+   "source": [
+    "- Assuming the LLM input is \"every effort moves you\", this approach can sometimes produce nonsensical text such as \"every effort moves you pizza\", which happens 3.2% of the time here (32 out of 1000 samples)."
+   ]
+  },
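+  {
+   "cell_type": "markdown",
+   "id": "added-sketch-exact-vs-sampled",
+   "metadata": {},
+   "source": [
+    "- The 3.2% above is an empirical frequency from 1,000 random draws. As a minimal sketch, the next cell computes the exact probability that the same temperature-5 distribution assigns to \"pizza\" and compares it with a fresh sample; the names `toy_vocab` and `toy_logits` are hypothetical, and their values are assumed to mirror the toy example defined earlier in this chapter, so adjust them if your values differ."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "added-sketch-exact-vs-sampled-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "# Assumed toy example (hypothetical names; values meant to mirror the chapter's example)\n",
+    "toy_vocab = ['closer', 'every', 'effort', 'forward', 'inches', 'moves', 'pizza', 'toward', 'you']\n",
+    "toy_logits = torch.tensor([4.51, 0.89, -1.90, 6.75, 1.63, -1.62, -1.89, 6.28, 1.79])\n",
+    "\n",
+    "# Temperature scaling: divide the logits by T before applying softmax\n",
+    "temperature = 5.0\n",
+    "probas = torch.softmax(toy_logits / temperature, dim=0)\n",
+    "\n",
+    "pizza_idx = toy_vocab.index('pizza')\n",
+    "print(f'Exact P(pizza) at T={temperature}: {probas[pizza_idx].item():.4f}')\n",
+    "\n",
+    "# Empirical estimate from 1,000 draws; it fluctuates around the exact value\n",
+    "torch.manual_seed(123)\n",
+    "samples = torch.multinomial(probas, num_samples=1_000, replacement=True)\n",
+    "print('Sampled frequency:', (samples == pizza_idx).float().mean().item())"
+   ]
+  },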
+  {
+   "cell_type": "markdown",
+   "id": "c6e4873e-07e4-4abb-85df-bdaedcc1a6f7",
+   "metadata": {},
+   "source": [
+    "### 5.3.2 Top-k sampling"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6d4da95a-8bb2-4f69-a9b0-a643531db5df",
+   "metadata": {},
+   "source": [
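+    "- To be able to use higher temperatures to increase output diversity while reducing the probability of nonsensical sentences, we can restrict the sampled tokens to the top-k most likely tokens:"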
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7ae6fffd-2730-4abe-a2d3-781fc4836f17",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/topk.webp\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ba12da5-6ff1-4008-91b8-d2d537cbc14c",
+   "metadata": {},
+   "source": [
+    "- In code, we can implement this as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "id": "2a7f908a-e9ec-446a-b407-fb6dbf05c806",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Top logits: tensor([6.7500, 6.2800, 4.5100])\n",
+      "Top positions: tensor([3, 7, 0])\n"
+     ]
+    }
+   ],
+   "source": [
+    "top_k = 3\n",
+    "top_logits, top_pos = torch.topk(next_token_logits, top_k)\n",
+    "\n",
+    "print(\"Top logits:\", top_logits)\n",
+    "print(\"Top positions:\", top_pos)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "id": "753865ed-79c5-48b1-b9f2-ccb132ff1d2f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor([4.5100,   -inf,   -inf, 6.7500,   -inf,   -inf,   -inf, 6.2800,   -inf])\n"
+     ]
+    }
+   ],
+   "source": [
+    "new_logits = torch.where(\n",
+    "    condition=next_token_logits < top_logits[-1],\n",
+    "    input=torch.tensor(float('-inf')), \n",
+    "    other=next_token_logits\n",
+    ")\n",
+    "\n",
+    "print(new_logits)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "id": "4844f000-c329-4e7e-aa89-16a2c4ebee43",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor([0.0615, 0.0000, 0.0000, 0.5775, 0.0000, 0.0000, 0.0000, 0.3610, 0.0000])\n"
+     ]
+    }
+   ],
+   "source": [
+    "topk_probas = torch.softmax(new_logits, dim=0)\n",
+    "print(topk_probas)"
+   ]
+  },
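+  {
+   "cell_type": "markdown",
+   "id": "added-sketch-topk-temperature",
+   "metadata": {},
+   "source": [
+    "- The steps above can be combined with temperature scaling. The next cell is a minimal sketch rather than the chapter's implementation: `top_k_probas` is a hypothetical helper, and `toy_logits` is assumed to match the toy logits used above. Because the filtered logits are set to `-inf` before the softmax, tokens outside the top k keep a probability of exactly zero at any temperature."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "added-sketch-topk-temperature-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "def top_k_probas(logits, k, temperature=1.0):\n",
+    "    # Hypothetical helper: keep the k largest logits and mask the rest with -inf\n",
+    "    top_logits, _ = torch.topk(logits, k)\n",
+    "    masked = torch.where(logits < top_logits[-1], torch.tensor(float('-inf')), logits)\n",
+    "    # Temperature scaling followed by softmax; masked entries get probability 0\n",
+    "    return torch.softmax(masked / temperature, dim=0)\n",
+    "\n",
+    "# Assumed toy logits (index 6 corresponds to 'pizza' in the toy vocabulary)\n",
+    "toy_logits = torch.tensor([4.51, 0.89, -1.90, 6.75, 1.63, -1.62, -1.89, 6.28, 1.79])\n",
+    "\n",
+    "for T in (1.0, 5.0):\n",
+    "    print(f'T={T}:', top_k_probas(toy_logits, k=3, temperature=T))"
+   ]
+  },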
+  {
+   "cell_type": "markdown",
+   "id": "56056503-a15d-4315-a3ff-46647a4c7c45",
+   "metadata": {},
+   "source": [
+    "### 5.3.3 Modifying the text generation function"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "34770423-473d-46f6-a5fa-6b2979564d26",
+   "metadata": {},
+   "source": [
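+    "- The previous two subsections introduced temperature sampling and top-k sampling.\n",
+    "- Let's use these two concepts to modify the `generate_simple` function we used earlier to generate text with the LLM, creating a new `generate` function:"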
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "id": "8e318891-bcc0-4d71-b147-33ce55febfa3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def generate(model, idx, max_new_tokens, context_size, temperature, top_k=None):\n",
+    "\n",
+    "    # For-loop is the same as before: Get logits, and only focus on last time step\n",
+    "    for _ in range(max_new_tokens):\n",
+    "        idx_cond = idx[:, -context_size:]\n",
+    "        with torch.no_grad():\n",
+    "            logits = model(idx_cond)\n",
+    "        logits = logits[:, -1, :]\n",
+    "\n",
+    "        # New: Filter logits with top_k sampling\n",
+    "        if top_k is not None:\n",
+    "            # Keep only top_k values\n",
+    "            top_logits, _ = torch.topk(logits, top_k)\n",
+    "            min_val = top_logits[:, -1:]  # (batch_size, 1); keeps the dim so the comparison broadcasts per row\n",
+    "            logits = torch.where(logits < min_val, torch.tensor(float('-inf')).to(logits.device), logits)\n",
+    "\n",
+    "        # New: Apply temperature scaling\n",
+    "        if temperature > 0.0:\n",
+    "            logits = logits / temperature\n",
+    "\n",
+    "            # Apply softmax to get probabilities\n",
+    "            probs = torch.softmax(logits, dim=-1)  # (batch_size, vocab_size)\n",
+    "\n",
+    "            # Sample from the distribution\n",
+    "            idx_next = torch.multinomial(probs, num_samples=1)  # (batch_size, 1)\n",
+    "\n",
+    "        # Otherwise same as before: get idx of the vocab entry with the highest logits value\n",
+    "        else:\n",
+    "            idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (batch_size, 1)\n",
+    "\n",
+    "        # Same as before: append sampled index to the running sequence\n",
+    "        idx = torch.cat((idx, idx_next), dim=1)  # (batch_size, num_tokens+1)\n",
+    "\n",
+    "    return idx"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "id": "aa2a0d7d-0457-42d1-ab9d-bd67683e7ed8",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Output text:\n",
+      " Every effort moves you know terrace _not brush.\"\n",
+      "\n",
+      "\"Never a little wild in a and Mrs. G\n"
+     ]
+    }
+   ],
+   "source": [
+    "torch.manual_seed(123)\n",
+    "\n",
+    "token_ids = generate(\n",
+    "    model=model,\n",
+    "    idx=text_to_token_ids(\"Every effort moves you\", tokenizer),\n",
+    "    max_new_tokens=20,\n",
+    "    context_size=GPT_CONFIG_124M[\"ctx_len\"],\n",
+    "    top_k=10,\n",
+    "    temperature=1.5\n",
+    ")\n",
+    "\n",
+    "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8817c673-6d27-417c-b2c1-3cff394a340d",
+   "metadata": {},
+   "source": [
+    "- **Exercise:** Which settings for `generate` force deterministic behavior?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4e2002ca-f4c1-48af-9e0a-88bfc163ba0b",
+   "metadata": {},
+   "source": [
+    "## 5.4 Loading and saving model weights in PyTorch"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0fc52676-f026-4566-a226-2a90269f9d53",
+   "metadata": {},
+   "source": [
+    "- Training LLMs is computationally expensive, so it is crucial to be able to save and load LLM weights.\n",
+    "\n",
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/mental-model-3.webp\" width=400px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "10e4c7f9-592f-43d6-a00e-598fa01dfb82",
+   "metadata": {},
+   "source": [
+    "- The recommended way in PyTorch is to save the model weights, the so-called `state_dict`, by applying the `torch.save` function to the output of the `.state_dict()` method:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 70,
+   "id": "3d67d869-ac04-4382-bcfb-c96d1ca80d47",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "torch.save(model.state_dict(), \"model.pth\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "90e889e0-07bf-43e5-8f92-5c5c7aeaad9e",
+   "metadata": {},
+   "source": [
+    "- We can then load the model weights into a new `GPTModel` model instance as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 71,
+   "id": "9d57d914-60a3-47f1-b499-5352f4c457cb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model = GPTModel(GPT_CONFIG_124M)\n",
+    "model.load_state_dict(torch.load(\"model.pth\"))\n",
+    "model.eval();"
+   ]
+  },
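+  {
+   "cell_type": "markdown",
+   "id": "added-sketch-map-location",
+   "metadata": {},
+   "source": [
+    "- If the weights were saved on a different device than the one used for loading (for example, saved on a GPU and loaded on a CPU-only machine), `torch.load` accepts a `map_location` argument. The next cell is a minimal, self-contained sketch that uses a small `torch.nn.Linear` layer as a stand-in for `GPTModel`; the file name `tiny_model.pth` and the variable names are hypothetical."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "added-sketch-map-location-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
+    "\n",
+    "# Stand-in for GPTModel so that this cell runs on its own\n",
+    "tiny_model = torch.nn.Linear(4, 2)\n",
+    "torch.save(tiny_model.state_dict(), 'tiny_model.pth')\n",
+    "\n",
+    "# map_location remaps the saved tensors onto the target device while loading\n",
+    "state_dict = torch.load('tiny_model.pth', map_location=device)\n",
+    "\n",
+    "restored = torch.nn.Linear(4, 2)\n",
+    "restored.load_state_dict(state_dict)\n",
+    "restored.to(device)\n",
+    "restored.eval()\n",
+    "\n",
+    "# Sanity check: the restored weights equal the saved ones\n",
+    "print(torch.equal(tiny_model.weight.cpu(), restored.weight.cpu()))"
+   ]
+  },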
+  {
+   "cell_type": "markdown",
+   "id": "caa81aec-9c72-4f46-8ae2-4a4fde3edbc1",
+   "metadata": {},
+   "source": [
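+    "- It is common to train LLMs with adaptive optimizers such as Adam or AdamW instead of regular SGD.\n",
+    "- These adaptive optimizers store additional parameters for each model weight, so it makes sense to save them as well if we plan to continue pretraining later:"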
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 72,
+   "id": "bbd175bb-edf4-450e-a6de-d3e8913c6532",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "torch.save({\n",
+    "    \"model_state_dict\": model.state_dict(),\n",
+    "    \"optimizer_state_dict\": optimizer.state_dict(),\n",
+    "    }, \n",
+    "    \"model_and_optimizer.pth\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 73,
+   "id": "8a0c7295-c822-43bf-9286-c45abc542868",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "checkpoint = torch.load(\"model_and_optimizer.pth\")\n",
+    "\n",
+    "model = GPTModel(GPT_CONFIG_124M)\n",
+    "model.load_state_dict(checkpoint[\"model_state_dict\"])\n",
+    "\n",
+    "optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)\n",
+    "optimizer.load_state_dict(checkpoint[\"optimizer_state_dict\"])\n",
+    "model.train();"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4194350e-0409-4a63-8ffd-d3a896509032",
+   "metadata": {},
+   "source": [
+    "## 5.5 Loading pretrained weights from OpenAI"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "83eb6c38-7278-40e0-bd9f-8a2b1feac3ec",
+   "metadata": {},
+   "source": [
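+    "- Previously, we only trained a small GPT-2 model on a very small short-story book for educational purposes.\n",
+    "- Interested readers can also find a longer pretraining run on the complete Project Gutenberg book corpus in [../03_bonus_pretraining_on_gutenberg](../03_bonus_pretraining_on_gutenberg).\n",
+    "- Fortunately, we don't have to spend tens to hundreds of thousands of dollars to pretrain the model on a large pretraining corpus; instead, we can load the pretrained weights provided by OpenAI."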
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "127ddbdb-3878-4669-9a39-d231fbdfb834",
+   "metadata": {},
+   "source": [
+    "- For an alternative way to load the weights from the Hugging Face Hub, see [../02_alternative_weight_loading](../02_alternative_weight_loading)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "75cab892-a165-4f43-9601-f517bc212ab6",
+   "metadata": {},
+   "source": [
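+    "- First, some boilerplate code to download the files from OpenAI and load the weights into Python.\n",
+    "- Since OpenAI used [TensorFlow](https://www.tensorflow.org/), we have to install and use TensorFlow to load the weights; [tqdm](https://github.com/tqdm/tqdm) is a progress bar library.\n",
+    "- Uncomment and run the next cell to install the required libraries."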
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 74,
+   "id": "fb9fdf02-972a-444e-bf65-8ffcaaf30ce8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# pip install tensorflow tqdm"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 75,
+   "id": "a0747edc-559c-44ef-a93f-079d60227e3f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "TensorFlow version: 2.15.0\n",
+      "tqdm version: 4.66.1\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"TensorFlow version:\", version(\"tensorflow\"))\n",
+    "print(\"tqdm version:\", version(\"tqdm\"))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 84,
+   "id": "c5bc89eb-4d39-4287-9b0c-e459ebe7f5ed",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Relative import from the gpt_download.py contained in this folder\n",
+    "from gpt_download import download_and_load_gpt2"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ff76a736-6f9f-4328-872e-f89a7b70a2cc",
+   "metadata": {},
+   "source": [
+    "- We can then download the weights of the 124-million-parameter model as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 85,
+   "id": "76271dd7-108d-4f5b-9c01-6ae0aac4b395",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "checkpoint: 100%|████████████████████████████| 77.0/77.0 [00:00<00:00, 132kiB/s]\n",
+      "encoder.json: 100%|███████████████████████| 1.04M/1.04M [00:00<00:00, 3.54MiB/s]\n",
+      "hparams.json: 100%|█████████████████████████| 90.0/90.0 [00:00<00:00, 52.9kiB/s]\n",
+      "model.ckpt.data-00000-of-00001: 100%|███████| 498M/498M [01:02<00:00, 7.93MiB/s]\n",
+      "model.ckpt.index: 100%|███████████████████| 5.21k/5.21k [00:00<00:00, 1.48MiB/s]\n",
+      "model.ckpt.meta: 100%|██████████████████████| 471k/471k [00:00<00:00, 1.93MiB/s]\n",
+      "vocab.bpe: 100%|████████████████████████████| 456k/456k [00:00<00:00, 2.30MiB/s]\n"
+     ]
+    }
+   ],
+   "source": [
+    "hparams, params = download_and_load_gpt2(model_size=\"124M\", models_dir=\"gpt2\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 86,
+   "id": "b1a31951-d971-4a6e-9c43-11ee1168ec6a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Settings: {'n_vocab': 50257, 'n_ctx': 1024, 'n_embd': 768, 'n_head': 12, 'n_layer': 12}\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"Settings:\", hparams)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 87,
+   "id": "857c8331-130e-46ba-921d-fa35d7a73cfe",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Parameter dictionary keys: dict_keys(['blocks', 'b', 'g', 'wpe', 'wte'])\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"Parameter dictionary keys:\", params.keys())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "466e100c-294e-4afc-a70a-2f398ac4c104",
+   "metadata": {},
+   "source": [
+    "- Alternatively, \"355M\", \"774M\", and \"1558M\" are also supported `model_size` arguments.\n",
+    "- The differences between these model sizes are summarized in the figure below:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "20f19d32-5aae-4176-9f86-f391672c8f0d",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch05_compressed/gpt-sizes.webp?timestamp=123\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ea6e5076-f08d-41fc-bd8b-1cfe53538f41",
+   "metadata": {},
+   "source": [
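+    "- Above, we loaded the 124M GPT-2 model weights into Python, but we still need to transfer them into our `GPTModel` instance.\n",
+    "- First, we initialize a new `GPTModel` instance.\n",
+    "- Note that the original GPT model used bias vectors in the linear layers for the query, key, and value matrices of the multi-head attention module, which is neither required nor recommended; however, to load the weights correctly, we have to enable them in our implementation as well by setting `qkv_bias` to `True`.\n",
+    "- We also use the `1024`-token context length of the original GPT-2 models."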
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 88,
+   "id": "9fef90dd-0654-4667-844f-08e28339ef7d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define model configurations in a dictionary for compactness\n",
+    "model_configs = {\n",
+    "    \"gpt2-small\": {\"emb_dim\": 768, \"n_layers\": 12, \"n_heads\": 12},\n",
+    "    \"gpt2-medium\": {\"emb_dim\": 1024, \"n_layers\": 24, \"n_heads\": 16},\n",
+    "    \"gpt2-large\": {\"emb_dim\": 1280, \"n_layers\": 36, \"n_heads\": 20},\n",
+    "    \"gpt2-xl\": {\"emb_dim\": 1600, \"n_layers\": 48, \"n_heads\": 25},\n",
+    "}\n",
+    "\n",
+    "# Copy the base configuration and update with specific model settings\n",
+    "model_name = \"gpt2-small\"  # Example model name\n",
+    "NEW_CONFIG = GPT_CONFIG_124M.copy()\n",
+    "NEW_CONFIG.update(model_configs[model_name])\n",
+    "NEW_CONFIG.update({\"ctx_len\": 1024, \"qkv_bias\": True})\n",
+    "\n",
+    "gpt = GPTModel(NEW_CONFIG)\n",
+    "gpt.eval();"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "272f29ac-8342-4b3d-a57d-9b0166ced314",
+   "metadata": {},
+   "source": [
+    "- The next task is to assign the OpenAI weights to the corresponding weight tensors in our `GPTModel` instance."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 89,
+   "id": "f9a92229-c002-49a6-8cfb-248297ad8296",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def assign(left, right):\n",
+    "    if left.shape != right.shape:\n",
+    "        raise ValueError(f\"Shape mismatch. Left: {left.shape}, Right: {right.shape}\")\n",
+    "    return torch.nn.Parameter(torch.tensor(right))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 90,
+   "id": "f22d5d95-ca5a-425c-a9ec-fc432a12d4e9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def load_weights_into_gpt(gpt, params):\n",
+    "    # Positional and token embeddings\n",
+    "    gpt.pos_emb.weight = assign(gpt.pos_emb.weight, params['wpe'])\n",
+    "    gpt.tok_emb.weight = assign(gpt.tok_emb.weight, params['wte'])\n",
+    "    \n",
+    "    for b in range(len(params[\"blocks\"])):\n",
+    "        q_w, k_w, v_w = np.split((params[\"blocks\"][b][\"attn\"][\"c_attn\"])[\"w\"], 3, axis=-1)\n",
+    "        gpt.trf_blocks[b].att.W_query.weight = assign(gpt.trf_blocks[b].att.W_query.weight, q_w.T)\n",
+    "        gpt.trf_blocks[b].att.W_key.weight = assign(gpt.trf_blocks[b].att.W_key.weight, k_w.T)\n",
+    "        gpt.trf_blocks[b].att.W_value.weight = assign(gpt.trf_blocks[b].att.W_value.weight, v_w.T)\n",
+    "    \n",
+    "        q_b, k_b, v_b = np.split((params[\"blocks\"][b][\"attn\"][\"c_attn\"])[\"b\"], 3, axis=-1)\n",
+    "        gpt.trf_blocks[b].att.W_query.bias = assign(gpt.trf_blocks[b].att.W_query.bias, q_b)\n",
+    "        gpt.trf_blocks[b].att.W_key.bias = assign(gpt.trf_blocks[b].att.W_key.bias, k_b)\n",
+    "        gpt.trf_blocks[b].att.W_value.bias = assign(gpt.trf_blocks[b].att.W_value.bias, v_b)\n",
+    "    \n",
+    "        gpt.trf_blocks[b].att.out_proj.weight = assign(gpt.trf_blocks[b].att.out_proj.weight, params[\"blocks\"][b][\"attn\"][\"c_proj\"][\"w\"].T)\n",
+    "        gpt.trf_blocks[b].att.out_proj.bias = assign(gpt.trf_blocks[b].att.out_proj.bias, params[\"blocks\"][b][\"attn\"][\"c_proj\"][\"b\"])\n",
+    "    \n",
+    "        gpt.trf_blocks[b].ff.layers[0].weight = assign(gpt.trf_blocks[b].ff.layers[0].weight, params[\"blocks\"][b][\"mlp\"][\"c_fc\"][\"w\"].T)\n",
+    "        gpt.trf_blocks[b].ff.layers[0].bias = assign(gpt.trf_blocks[b].ff.layers[0].bias, params[\"blocks\"][b][\"mlp\"][\"c_fc\"][\"b\"])\n",
+    "        gpt.trf_blocks[b].ff.layers[2].weight = assign(gpt.trf_blocks[b].ff.layers[2].weight, params[\"blocks\"][b][\"mlp\"][\"c_proj\"][\"w\"].T)\n",
+    "        gpt.trf_blocks[b].ff.layers[2].bias = assign(gpt.trf_blocks[b].ff.layers[2].bias, params[\"blocks\"][b][\"mlp\"][\"c_proj\"][\"b\"])\n",
+    "    \n",
+    "        gpt.trf_blocks[b].norm1.scale = assign(gpt.trf_blocks[b].norm1.scale, params[\"blocks\"][b][\"ln_1\"][\"g\"])\n",
+    "        gpt.trf_blocks[b].norm1.shift = assign(gpt.trf_blocks[b].norm1.shift, params[\"blocks\"][b][\"ln_1\"][\"b\"])\n",
+    "        gpt.trf_blocks[b].norm2.scale = assign(gpt.trf_blocks[b].norm2.scale, params[\"blocks\"][b][\"ln_2\"][\"g\"])\n",
+    "        gpt.trf_blocks[b].norm2.shift = assign(gpt.trf_blocks[b].norm2.shift, params[\"blocks\"][b][\"ln_2\"][\"b\"])\n",
+    "    \n",
+    "    gpt.final_norm.scale = assign(gpt.final_norm.scale, params[\"g\"])\n",
+    "    gpt.final_norm.shift = assign(gpt.final_norm.shift, params[\"b\"])\n",
+    "    gpt.out_head.weight = assign(gpt.out_head.weight, params[\"wte\"])  # GPT-2 reuses the token embedding matrix (wte) as the output head\n",
+    "    \n",
+    "    \n",
+    "load_weights_into_gpt(gpt, params)\n",
+    "gpt.to(device);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f7472cb-54dc-4311-96d8-b2694f885cee",
+   "metadata": {},
+   "source": [
+    "- If the model is loaded correctly, we can use it together with our earlier `generate` function to generate new text:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 91,
+   "id": "1f690253-f845-4347-b7b6-43fabbd2affa",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Output text:\n",
+      " Every effort moves you toward finding an ideal new way to practice something!\n",
+      "\n",
+      "What makes us want to be on top of that?\n",
+      "\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "torch.manual_seed(123)\n",
+    "\n",
+    "token_ids = generate(\n",
+    "    model=gpt,\n",
+    "    idx=text_to_token_ids(\"Every effort moves you\", tokenizer).to(device),\n",
+    "    max_new_tokens=25,\n",
+    "    context_size=NEW_CONFIG[\"ctx_len\"],\n",
+    "    top_k=50,\n",
+    "    temperature=1.5\n",
+    ")\n",
+    "\n",
+    "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6d079f98-a7c4-462e-8416-5a64f670861c",
+   "metadata": {},
+   "source": [
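+    "- We know that the model weights were loaded correctly because the model can generate coherent text; if we had made even a small mistake, the model would not be able to do that."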
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "28493b9b-a1ae-4f31-87bc-c10ee4447f44",
+   "metadata": {},
+   "source": [
+    "- For an alternative way to load the weights from the Hugging Face Hub, see [../02_alternative_weight_loading](../02_alternative_weight_loading)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f2a66474-230d-4180-a8ff-843e04f1f1c4",
+   "metadata": {},
+   "source": [
+    "## Summary and takeaways"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc7ed189-a633-458c-bf12-4f70b42684b8",
+   "metadata": {},
+   "source": [
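+    "- See [gpt_train.py](gpt_train.py) for a standalone training script.\n",
+    "- The [gpt_generate.py](gpt_generate.py) script loads the pretrained weights from OpenAI and generates text based on a prompt.\n",
+    "- You can find the exercise solutions in [exercise-solutions.ipynb](exercise-solutions.ipynb)."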
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "gpuType": "A100",
+   "machine_shape": "hm",
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
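The `load_weights_into_gpt` function in the notebook above splits OpenAI's fused `c_attn` weight into query, key, and value parts and transposes each part before assigning it. The following is a minimal, dependency-free sketch of that step. It assumes the checkpoint stores the fused matrix with shape `(emb_dim, 3 * emb_dim)`, whereas a PyTorch linear layer expects `(out_features, in_features)`. The helpers `split_columns` and `transpose` are hypothetical stand-ins for `np.split(w, 3, axis=-1)` and `.T`, and the toy numbers are made up.

```python
def split_columns(matrix, parts):
    """Split a matrix (list of rows) into `parts` equal column blocks."""
    n_cols = len(matrix[0])
    if n_cols % parts != 0:
        raise ValueError(f"{n_cols} columns cannot be split into {parts} equal parts")
    width = n_cols // parts
    return [[row[i * width:(i + 1) * width] for row in matrix] for i in range(parts)]


def transpose(matrix):
    """Swap rows and columns, like `.T` on a 2-D array."""
    return [list(column) for column in zip(*matrix)]


# Toy fused weight with emb_dim = 2, so the shape is (2, 3 * 2).
fused = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
]

# The first third of the columns belongs to the query projection,
# the second third to the key projection, the last third to the value projection.
q_w, k_w, v_w = split_columns(fused, 3)

# Transpose each part so that rows index output features, as a linear layer expects.
q_for_linear = transpose(q_w)
print(q_for_linear)
```

The bias vector needs only the split, not the transpose, because it is one-dimensional.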
diff --git a/ch05/01_main-chapter-code/gpt_download.py b/ch05/01_main-chapter-code/gpt_download.py
new file mode 100644
index 0000000..89f2bc7
--- /dev/null
+++ b/ch05/01_main-chapter-code/gpt_download.py
@@ -0,0 +1,93 @@
+import os
+import requests
+import json
+import numpy as np
+import tensorflow as tf
+from tqdm import tqdm
+
+
+def download_and_load_gpt2(model_size, models_dir):
+    # Validate model size
+    allowed_sizes = ("124M", "355M", "774M", "1558M")
+    if model_size not in allowed_sizes:
+        raise ValueError(f"Model size not in {allowed_sizes}")
+
+    # Define paths
+    model_dir = os.path.join(models_dir, model_size)
+    base_url = "https://openaipublic.blob.core.windows.net/gpt-2/models"
+    filenames = [
+        "checkpoint", "encoder.json", "hparams.json",
+        "model.ckpt.data-00000-of-00001", "model.ckpt.index",
+        "model.ckpt.meta", "vocab.bpe"
+    ]
+
+    # Download files
+    os.makedirs(model_dir, exist_ok=True)
+    for filename in filenames:
+        file_url = f"{base_url}/{model_size}/{filename}"  # URLs always use "/", unlike os.path.join on Windows
+        file_path = os.path.join(model_dir, filename)
+        download_file(file_url, file_path)
+
+    # Load hparams and params
+    tf_ckpt_path = tf.train.latest_checkpoint(model_dir)
+    hparams = json.load(open(os.path.join(model_dir, "hparams.json")))
+    params = load_gpt2_params_from_tf_ckpt(tf_ckpt_path, hparams)
+
+    return hparams, params
+
+
+def download_file(url, destination):
+    # Send a GET request to download the file in streaming mode
+    response = requests.get(url, stream=True)
+
+    # Get the total file size from headers, defaulting to 0 if not present
+    file_size = int(response.headers.get("content-length", 0))
+
+    # Check if file exists and has the same size
+    if os.path.exists(destination):
+        file_size_local = os.path.getsize(destination)
+        if file_size == file_size_local:
+            print(f"File already exists and is up-to-date: {destination}")
+            return
+
+    # Define the block size for reading the file
+    block_size = 1024  # 1 Kilobyte
+
+    # Initialize the progress bar with total file size
+    progress_bar_description = url.split("/")[-1]  # Extract filename from URL
+    with tqdm(total=file_size, unit="iB", unit_scale=True, desc=progress_bar_description) as progress_bar:
+        # Open the destination file in binary write mode
+        with open(destination, "wb") as file:
+            # Iterate over the file data in chunks
+            for chunk in response.iter_content(block_size):
+                progress_bar.update(len(chunk))  # Update progress bar
+                file.write(chunk)  # Write the chunk to the file
+
+
+def load_gpt2_params_from_tf_ckpt(ckpt_path, hparams):
+    # Initialize parameters dictionary with empty blocks for each layer
+    params = {"blocks": [{} for _ in range(hparams["n_layer"])]}
+
+    # Iterate over each variable in the checkpoint
+    for name, _ in tf.train.list_variables(ckpt_path):
+        # Load the variable and remove singleton dimensions
+        variable_array = np.squeeze(tf.train.load_variable(ckpt_path, name))
+
+        # Process the variable name to extract relevant parts
+        variable_name_parts = name.split("/")[1:]  # Skip the 'model/' prefix
+
+        # Identify the target dictionary for the variable
+        target_dict = params
+        if variable_name_parts[0].startswith("h"):
+            layer_number = int(variable_name_parts[0][1:])
+            target_dict = params["blocks"][layer_number]
+
+        # Walk down the nested dictionaries, creating them as needed
+        for key in variable_name_parts[1:-1]:
+            target_dict = target_dict.setdefault(key, {})
+
+        # Assign the variable array to the last key
+        last_key = variable_name_parts[-1]
+        target_dict[last_key] = variable_array
+
+    return params
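The loop in `load_gpt2_params_from_tf_ckpt` above turns flat checkpoint variable names into the nested `params` dictionary. The sketch below replays that logic without TensorFlow, assuming variable names of the form `model/h0/attn/c_attn/w`, `model/wte`, and `model/ln_f/g`. The function `insert_variable` is a hypothetical helper, and the string values are placeholders for the real arrays.

```python
def insert_variable(params, name, value):
    """Place `value` into the nested dict according to its checkpoint name."""
    parts = name.split("/")[1:]  # Skip the 'model/' prefix
    target = params
    if parts[0].startswith("h"):
        # 'h<N>' selects transformer block N
        target = params["blocks"][int(parts[0][1:])]
    for key in parts[1:-1]:
        target = target.setdefault(key, {})
    target[parts[-1]] = value


params = {"blocks": [{} for _ in range(2)]}
toy_variables = [
    ("model/h0/attn/c_attn/w", "W0"),  # placeholder values, not real arrays
    ("model/h1/mlp/c_fc/b", "B1"),
    ("model/wte", "WTE"),
    ("model/ln_f/g", "G"),
]
for name, value in toy_variables:
    insert_variable(params, name, value)

print(params)
```

Note that under this naming assumption the final layer norm ends up under the top-level keys `g` and `b`, which is consistent with the `dict_keys(['blocks', 'b', 'g', 'wpe', 'wte'])` output shown in the notebook.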
diff --git a/ch05/01_main-chapter-code/gpt_generate.py b/ch05/01_main-chapter-code/gpt_generate.py
new file mode 100644
index 0000000..8adb728
--- /dev/null
+++ b/ch05/01_main-chapter-code/gpt_generate.py
@@ -0,0 +1,248 @@
+# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
+# Source for "Build a Large Language Model From Scratch"
+#   - https://www.manning.com/books/build-a-large-language-model-from-scratch
+# Code: https://github.com/rasbt/LLMs-from-scratch
+
+import json
+import numpy as np
+import os
+import requests
+import tensorflow as tf
+import tiktoken
+import torch
+from tqdm import tqdm
+
+# Import from local files
+from previous_chapters import GPTModel
+
+
+def text_to_token_ids(text, tokenizer):
+    encoded = tokenizer.encode(text)
+    encoded_tensor = torch.tensor(encoded).unsqueeze(0)  # add batch dimension
+    return encoded_tensor
+
+
+def token_ids_to_text(token_ids, tokenizer):
+    flat = token_ids.squeeze(0)  # remove batch dimension
+    return tokenizer.decode(flat.tolist())
+
+
+def download_and_load_gpt2(model_size, models_dir):
+    # Validate model size
+    allowed_sizes = ("124M", "355M", "774M", "1558M")
+    if model_size not in allowed_sizes:
+        raise ValueError(f"Model size not in {allowed_sizes}")
+
+    # Define paths
+    model_dir = os.path.join(models_dir, model_size)
+    base_url = "https://openaipublic.blob.core.windows.net/gpt-2/models"
+    filenames = [
+        "checkpoint", "encoder.json", "hparams.json",
+        "model.ckpt.data-00000-of-00001", "model.ckpt.index",
+        "model.ckpt.meta", "vocab.bpe"
+    ]
+
+    # Download files
+    os.makedirs(model_dir, exist_ok=True)
+    for filename in filenames:
+        file_url = f"{base_url}/{model_size}/{filename}"  # URLs always use "/", unlike os.path.join on Windows
+        file_path = os.path.join(model_dir, filename)
+        download_file(file_url, file_path)
+
+    # Load hparams and params
+    tf_ckpt_path = tf.train.latest_checkpoint(model_dir)
+    hparams = json.load(open(os.path.join(model_dir, "hparams.json")))
+    params = load_gpt2_params_from_tf_ckpt(tf_ckpt_path, hparams)
+
+    return hparams, params
+
+
+def download_file(url, destination):
+    # Send a GET request to download the file in streaming mode
+    response = requests.get(url, stream=True)
+
+    # Get the total file size from headers, defaulting to 0 if not present
+    file_size = int(response.headers.get("content-length", 0))
+
+    # Check if file exists and has the same size
+    if os.path.exists(destination):
+        file_size_local = os.path.getsize(destination)
+        if file_size == file_size_local:
+            print(f"File already exists and is up-to-date: {destination}")
+            return
+
+    # Define the block size for reading the file
+    block_size = 1024  # 1 Kilobyte
+
+    # Initialize the progress bar with total file size
+    progress_bar_description = url.split("/")[-1]  # Extract filename from URL
+    with tqdm(total=file_size, unit="iB", unit_scale=True, desc=progress_bar_description) as progress_bar:
+        # Open the destination file in binary write mode
+        with open(destination, "wb") as file:
+            # Iterate over the file data in chunks
+            for chunk in response.iter_content(block_size):
+                progress_bar.update(len(chunk))  # Update progress bar
+                file.write(chunk)  # Write the chunk to the file
+
+
+def load_gpt2_params_from_tf_ckpt(ckpt_path, hparams):
+    # Initialize parameters dictionary with empty blocks for each layer
+    params = {"blocks": [{} for _ in range(hparams["n_layer"])]}
+
+    # Iterate over each variable in the checkpoint
+    for name, _ in tf.train.list_variables(ckpt_path):
+        # Load the variable and remove singleton dimensions
+        variable_array = np.squeeze(tf.train.load_variable(ckpt_path, name))
+
+        # Process the variable name to extract relevant parts
+        variable_name_parts = name.split("/")[1:]  # Skip the 'model/' prefix
+
+        # Identify the target dictionary for the variable
+        target_dict = params
+        if variable_name_parts[0].startswith("h"):
+            layer_number = int(variable_name_parts[0][1:])
+            target_dict = params["blocks"][layer_number]
+
+        # Walk down the nested dictionaries, creating them as needed
+        for key in variable_name_parts[1:-1]:
+            target_dict = target_dict.setdefault(key, {})
+
+        # Assign the variable array to the last key
+        last_key = variable_name_parts[-1]
+        target_dict[last_key] = variable_array
+
+    return params
+
+
+def assign(left, right):
+    if left.shape != right.shape:
+        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
+    return torch.nn.Parameter(torch.tensor(right))
+
+
+def load_weights_into_gpt(gpt, params):
+    # Positional and token embeddings
+    gpt.pos_emb.weight = assign(gpt.pos_emb.weight, params['wpe'])
+    gpt.tok_emb.weight = assign(gpt.tok_emb.weight, params['wte'])
+
+    for b in range(len(params["blocks"])):
+        q_w, k_w, v_w = np.split((params["blocks"][b]["attn"]["c_attn"])["w"], 3, axis=-1)
+        gpt.trf_blocks[b].att.W_query.weight = assign(gpt.trf_blocks[b].att.W_query.weight, q_w.T)
+        gpt.trf_blocks[b].att.W_key.weight = assign(gpt.trf_blocks[b].att.W_key.weight, k_w.T)
+        gpt.trf_blocks[b].att.W_value.weight = assign(gpt.trf_blocks[b].att.W_value.weight, v_w.T)
+
+        q_b, k_b, v_b = np.split((params["blocks"][b]["attn"]["c_attn"])["b"], 3, axis=-1)
+        gpt.trf_blocks[b].att.W_query.bias = assign(gpt.trf_blocks[b].att.W_query.bias, q_b)
+        gpt.trf_blocks[b].att.W_key.bias = assign(gpt.trf_blocks[b].att.W_key.bias, k_b)
+        gpt.trf_blocks[b].att.W_value.bias = assign(gpt.trf_blocks[b].att.W_value.bias, v_b)
+
+        gpt.trf_blocks[b].att.out_proj.weight = assign(gpt.trf_blocks[b].att.out_proj.weight, params["blocks"][b]["attn"]["c_proj"]["w"].T)
+        gpt.trf_blocks[b].att.out_proj.bias = assign(gpt.trf_blocks[b].att.out_proj.bias, params["blocks"][b]["attn"]["c_proj"]["b"])
+
+        gpt.trf_blocks[b].ff.layers[0].weight = assign(gpt.trf_blocks[b].ff.layers[0].weight, params["blocks"][b]["mlp"]["c_fc"]["w"].T)
+        gpt.trf_blocks[b].ff.layers[0].bias = assign(gpt.trf_blocks[b].ff.layers[0].bias, params["blocks"][b]["mlp"]["c_fc"]["b"])
+        gpt.trf_blocks[b].ff.layers[2].weight = assign(gpt.trf_blocks[b].ff.layers[2].weight, params["blocks"][b]["mlp"]["c_proj"]["w"].T)
+        gpt.trf_blocks[b].ff.layers[2].bias = assign(gpt.trf_blocks[b].ff.layers[2].bias, params["blocks"][b]["mlp"]["c_proj"]["b"])
+
+        gpt.trf_blocks[b].norm1.scale = assign(gpt.trf_blocks[b].norm1.scale, params["blocks"][b]["ln_1"]["g"])
+        gpt.trf_blocks[b].norm1.shift = assign(gpt.trf_blocks[b].norm1.shift, params["blocks"][b]["ln_1"]["b"])
+        gpt.trf_blocks[b].norm2.scale = assign(gpt.trf_blocks[b].norm2.scale, params["blocks"][b]["ln_2"]["g"])
+        gpt.trf_blocks[b].norm2.shift = assign(gpt.trf_blocks[b].norm2.shift, params["blocks"][b]["ln_2"]["b"])
+
+    gpt.final_norm.scale = assign(gpt.final_norm.scale, params["g"])
+    gpt.final_norm.shift = assign(gpt.final_norm.shift, params["b"])
+    gpt.out_head.weight = assign(gpt.out_head.weight, params["wte"])  # GPT-2 reuses the token embedding matrix (wte) as the output head
+
+
+def generate(model, idx, max_new_tokens, context_size, temperature, top_k=None):
+
+    # For-loop is the same as before: Get logits, and only focus on last time step
+    for _ in range(max_new_tokens):
+        idx_cond = idx[:, -context_size:]
+        with torch.no_grad():
+            logits = model(idx_cond)
+        logits = logits[:, -1, :]
+
+        # New: Filter logits with top_k sampling
+        if top_k is not None:
+            # Keep only top_k values
+            top_logits, _ = torch.topk(logits, top_k)
+            min_val = top_logits[:, -1:]  # Keep the last dim so the comparison broadcasts per batch row
+            logits = torch.where(logits < min_val, torch.tensor(float('-inf')).to(logits.device), logits)
+
+        # New: Apply temperature scaling
+        if temperature > 0.0:
+            logits = logits / temperature
+
+            # Apply softmax to get probabilities
+            probs = torch.softmax(logits, dim=-1)  # (batch_size, vocab_size)
+
+            # Sample from the distribution
+            idx_next = torch.multinomial(probs, num_samples=1)  # (batch_size, 1)
+
+        # Otherwise same as before: get idx of the vocab entry with the highest logits value
+        else:
+            idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (batch_size, 1)
+
+        # Same as before: append sampled index to the running sequence
+        idx = torch.cat((idx, idx_next), dim=1)  # (batch_size, num_tokens+1)
+
+    return idx
+
+
+def main(gpt_config, input_prompt, model_size):
+
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+    hparams, params = download_and_load_gpt2(model_size=model_size, models_dir="gpt2")
+
+    gpt = GPTModel(gpt_config)
+    load_weights_into_gpt(gpt, params)
+    gpt.to(device)
+    gpt.eval()
+
+    tokenizer = tiktoken.get_encoding("gpt2")
+
+    token_ids = generate(
+        model=gpt,
+        idx=text_to_token_ids(input_prompt, tokenizer).to(device),
+        max_new_tokens=30,
+        context_size=gpt_config["ctx_len"],
+        top_k=1,
+        temperature=1.0
+    )
+
+    print("Output text:\n", token_ids_to_text(token_ids, tokenizer))
+
+
+if __name__ == "__main__":
+
+    torch.manual_seed(123)
+
+    CHOOSE_MODEL = "gpt2-small"
+    INPUT_PROMPT = "Every effort moves"
+
+    BASE_CONFIG = {
+        "vocab_size": 50257,  # Vocabulary size
+        "ctx_len": 1024,      # Context length
+        "drop_rate": 0.0,     # Dropout rate
+        "qkv_bias": True      # Query-key-value bias
+    }
+
+    model_configs = {
+        "gpt2-small": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
+        "gpt2-medium": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
+        "gpt2-large": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
+        "gpt2-xl": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
+    }
+
+    model_sizes = {
+        "gpt2-small": "124M",
+        "gpt2-medium": "355M",
+        "gpt2-large": "774M",
+        "gpt2-xl": "1558M"
+    }
+
+    BASE_CONFIG.update(model_configs[CHOOSE_MODEL])
+
+    main(BASE_CONFIG, INPUT_PROMPT, model_sizes[CHOOSE_MODEL])
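The `generate` function in `gpt_generate.py` above filters the logits with top-k, scales them by the temperature, and then samples, falling back to greedy decoding when the temperature is not positive. As a rough illustration for a single position, here is a dependency-free sketch of the same order of operations. The function names are hypothetical, and `random.choices` stands in for `torch.multinomial`, so the sampled sequence will not match PyTorch's random stream.

```python
import math
import random


def top_k_temperature_probs(logits, top_k=None, temperature=1.0):
    """Return sampling probabilities for one position's logits."""
    if top_k is not None:
        # Mask everything below the k-th largest logit with -inf
        threshold = sorted(logits, reverse=True)[top_k - 1]
        logits = [x if x >= threshold else float("-inf") for x in logits]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # Subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, top_k=None, temperature=1.0, rng=random):
    """Pick the next token index: greedy if temperature <= 0, else sample."""
    if temperature <= 0.0:
        return max(range(len(logits)), key=logits.__getitem__)
    probs = top_k_temperature_probs(logits, top_k, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [1.0, 3.0, 2.0, 0.5]
probs = top_k_temperature_probs(toy_logits, top_k=2, temperature=1.0)
print(probs)
print(sample_next_token(toy_logits, top_k=2, temperature=1.5))
```

With `top_k=1`, only one token keeps a nonzero probability, so sampling reduces to greedy decoding regardless of the temperature; this is the setting `main` uses above.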
diff --git a/ch05/01_main-chapter-code/gpt_train.py b/ch05/01_main-chapter-code/gpt_train.py
new file mode 100644
index 0000000..0992ede
--- /dev/null
+++ b/ch05/01_main-chapter-code/gpt_train.py
@@ -0,0 +1,234 @@
+# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
+# Source for "Build a Large Language Model From Scratch"
+#   - https://www.manning.com/books/build-a-large-language-model-from-scratch
+# Code: https://github.com/rasbt/LLMs-from-scratch
+
+import matplotlib.pyplot as plt
+import os
+import torch
+import urllib.request
+
+# Import from local files
+from previous_chapters import GPTModel, create_dataloader_v1, generate_text_simple
+
+
+def text_to_token_ids(text, tokenizer):
+    encoded = tokenizer.encode(text)
+    encoded_tensor = torch.tensor(encoded).unsqueeze(0)  # add batch dimension
+    return encoded_tensor
+
+
+def token_ids_to_text(token_ids, tokenizer):
+    flat = token_ids.squeeze(0)  # remove batch dimension
+    return tokenizer.decode(flat.tolist())
+
+
+def calc_loss_batch(input_batch, target_batch, model, device):
+    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
+    logits = model(input_batch)
+    loss = torch.nn.functional.cross_entropy(logits.flatten(0, 1), target_batch.flatten())
+    return loss
+
+
+def calc_loss_loader(data_loader, model, device, num_batches=None):
+    total_loss = 0.
+    if num_batches is None:
+        num_batches = len(data_loader)
+    else:
+        num_batches = min(num_batches, len(data_loader))
+    for i, (input_batch, target_batch) in enumerate(data_loader):
+        if i < num_batches:
+            loss = calc_loss_batch(input_batch, target_batch, model, device)
+            total_loss += loss.item()
+        else:
+            break
+    return total_loss / num_batches
+
+
+def evaluate_model(model, train_loader, val_loader, device, eval_iter):
+    model.eval()
+    with torch.no_grad():
+        train_loss = calc_loss_loader(train_loader, model, device, num_batches=eval_iter)
+        val_loss = calc_loss_loader(val_loader, model, device, num_batches=eval_iter)
+    model.train()
+    return train_loss, val_loss
+
+
+def generate_and_print_sample(model, tokenizer, device, start_context):
+    model.eval()
+    context_size = model.pos_emb.weight.shape[0]
+    encoded = text_to_token_ids(start_context, tokenizer).to(device)
+    with torch.no_grad():
+        token_ids = generate_text_simple(
+            model=model, idx=encoded,
+            max_new_tokens=50, context_size=context_size
+        )
+        decoded_text = token_ids_to_text(token_ids, tokenizer)
+        print(decoded_text.replace("\n", " "))  # Compact print format
+    model.train()
+
+
+def train_model_simple(model, train_loader, val_loader, optimizer, device, num_epochs,
+                       eval_freq, eval_iter, start_context):
+    # Initialize lists to track losses and tokens seen
+    train_losses, val_losses, track_tokens_seen = [], [], []
+    tokens_seen = 0
+    global_step = -1
+
+    # Main training loop
+    for epoch in range(num_epochs):
+        model.train()  # Set model to training mode
+
+        for input_batch, target_batch in train_loader:
+            optimizer.zero_grad()  # Reset loss gradients from previous batch iteration
+            loss = calc_loss_batch(input_batch, target_batch, model, device)
+            loss.backward()  # Calculate loss gradients
+            optimizer.step()  # Update model weights using loss gradients
+            tokens_seen += input_batch.numel()
+            global_step += 1
+
+            # Optional evaluation step
+            if global_step % eval_freq == 0:
+                train_loss, val_loss = evaluate_model(
+                    model, train_loader, val_loader, device, eval_iter)
+                train_losses.append(train_loss)
+                val_losses.append(val_loss)
+                track_tokens_seen.append(tokens_seen)
+                print(f"Ep {epoch+1} (Step {global_step:06d}): "
+                      f"Train loss {train_loss:.3f}, Val loss {val_loss:.3f}")
+
+        # Print a sample text after each epoch
+        generate_and_print_sample(
+            model, train_loader.dataset.tokenizer, device, start_context
+        )
+
+    return train_losses, val_losses, track_tokens_seen
+
+
+def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
+    fig, ax1 = plt.subplots()
+
+    # Plot training and validation loss against epochs
+    ax1.plot(epochs_seen, train_losses, label="Training loss")
+    ax1.plot(epochs_seen, val_losses, linestyle="-.", label="Validation loss")
+    ax1.set_xlabel("Epochs")
+    ax1.set_ylabel("Loss")
+    ax1.legend(loc="upper right")
+
+    # Create a second x-axis for tokens seen
+    ax2 = ax1.twiny()  # Create a second x-axis that shares the same y-axis
+    ax2.plot(tokens_seen, train_losses, alpha=0)  # Invisible plot for aligning ticks
+    ax2.set_xlabel("Tokens seen")
+
+    fig.tight_layout()  # Adjust layout to make room
+    # plt.show()
+
+
+def main(gpt_config, hparams):
+
+    torch.manual_seed(123)
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+    ##############################
+    # Download data if necessary
+    ##############################
+
+    file_path = "the-verdict.txt"
+    url = "https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/ch02/01_main-chapter-code/the-verdict.txt"
+
+    if not os.path.exists(file_path):
+        with urllib.request.urlopen(url) as response:
+            text_data = response.read().decode('utf-8')
+        with open(file_path, "w", encoding="utf-8") as file:
+            file.write(text_data)
+    else:
+        with open(file_path, "r", encoding="utf-8") as file:
+            text_data = file.read()
+
+    ##############################
+    # Initialize model
+    ##############################
+
+    model = GPTModel(gpt_config)
+    model.to(device)  # no assignment model = model.to(device) necessary for nn.Module classes
+    optimizer = torch.optim.AdamW(
+        model.parameters(), lr=hparams["learning_rate"], weight_decay=hparams["weight_decay"]
+    )
+
+    ##############################
+    # Set up dataloaders
+    ##############################
+
+    # Train/validation ratio
+    train_ratio = 0.90
+    split_idx = int(train_ratio * len(text_data))
+
+    train_loader = create_dataloader_v1(
+        text_data[:split_idx],
+        batch_size=hparams["batch_size"],
+        max_length=gpt_config["ctx_len"],
+        stride=gpt_config["ctx_len"],
+        drop_last=True,
+        shuffle=True
+    )
+
+    val_loader = create_dataloader_v1(
+        text_data[split_idx:],
+        batch_size=hparams["batch_size"],
+        max_length=gpt_config["ctx_len"],
+        stride=gpt_config["ctx_len"],
+        drop_last=False,
+        shuffle=False
+    )
+
+    ##############################
+    # Train model
+    ##############################
+
+    train_losses, val_losses, tokens_seen = train_model_simple(
+        model, train_loader, val_loader, optimizer, device,
+        num_epochs=hparams["num_epochs"], eval_freq=5, eval_iter=1,
+        start_context="Every effort moves you",
+    )
+
+    return train_losses, val_losses, tokens_seen, model
+
+
+if __name__ == "__main__":
+
+    GPT_CONFIG_124M = {
+        "vocab_size": 50257,  # Vocabulary size
+        "ctx_len": 256,       # Shortened context length (orig: 1024)
+        "emb_dim": 768,       # Embedding dimension
+        "n_heads": 12,        # Number of attention heads
+        "n_layers": 12,       # Number of layers
+        "drop_rate": 0.1,     # Dropout rate
+        "qkv_bias": False     # Query-key-value bias
+    }
+
+    OTHER_HPARAMS = {
+        "learning_rate": 5e-4,
+        "num_epochs": 10,
+        "batch_size": 2,
+        "weight_decay": 0.1
+    }
+
+    ###########################
+    # Initiate training
+    ###########################
+
+    train_losses, val_losses, tokens_seen, model = main(GPT_CONFIG_124M, OTHER_HPARAMS)
+
+    ###########################
+    # After training
+    ###########################
+
+    # Plot results
+    epochs_tensor = torch.linspace(0, OTHER_HPARAMS["num_epochs"], len(train_losses))
+    plot_losses(epochs_tensor, tokens_seen, train_losses, val_losses)
+    plt.savefig("loss.pdf")
+
+    # Save and load model
+    torch.save(model.state_dict(), "model.pth")
+    model = GPTModel(GPT_CONFIG_124M)
+    model.load_state_dict(torch.load("model.pth"))
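The split and batching arithmetic used in the training script above can be checked with a small dependency-free sketch. The helper names `split_text`, `num_windows`, and `num_batches` are hypothetical and not part of the repository, and the token count is an arbitrary illustrative value. It assumes windows are generated with `range(0, num_tokens - max_length, stride)`, as in `GPTDatasetV1`.

```python
import math


def split_text(text, train_ratio=0.90):
    """Split raw text at a character index into train and validation parts."""
    split_idx = int(train_ratio * len(text))
    return text[:split_idx], text[split_idx:]


def num_windows(num_tokens, max_length, stride):
    """Number of (input, target) pairs a sliding window produces.

    Mirrors range(0, num_tokens - max_length, stride): each window needs
    one extra token after it for the shifted target sequence.
    """
    return len(range(0, num_tokens - max_length, stride))


def num_batches(num_samples, batch_size, drop_last):
    """Batches per epoch, with or without the final partial batch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


train_text, val_text = split_text("a" * 1000)
print(len(train_text), len(val_text))

# Hypothetical token count, chosen only for illustration
samples = num_windows(num_tokens=1000, max_length=256, stride=256)
print(samples, num_batches(samples, 2, True), num_batches(samples, 2, False))
```

One consequence of this arithmetic: a split containing no more than `max_length` tokens yields zero windows, so its dataloader would be empty.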
diff --git a/ch05/01_main-chapter-code/images/img-1.webp b/ch05/01_main-chapter-code/images/img-1.webp
new file mode 100644
index 0000000..ffd2483
Binary files /dev/null and b/ch05/01_main-chapter-code/images/img-1.webp differ
diff --git a/ch05/01_main-chapter-code/images/img-2.webp b/ch05/01_main-chapter-code/images/img-2.webp
new file mode 100644
index 0000000..ff8ac00
Binary files /dev/null and b/ch05/01_main-chapter-code/images/img-2.webp differ
diff --git a/ch05/01_main-chapter-code/images/img-3.webp b/ch05/01_main-chapter-code/images/img-3.webp
new file mode 100644
index 0000000..bfe8b10
Binary files /dev/null and b/ch05/01_main-chapter-code/images/img-3.webp differ
diff --git a/ch05/01_main-chapter-code/previous_chapters.py b/ch05/01_main-chapter-code/previous_chapters.py
new file mode 100644
index 0000000..996d0bb
--- /dev/null
+++ b/ch05/01_main-chapter-code/previous_chapters.py
@@ -0,0 +1,276 @@
+# This file collects all the relevant code that we covered thus far
+# throughout Chapters 2-4.
+# This file can be run as a standalone script.
+
+import tiktoken
+import torch
+import torch.nn as nn
+from torch.utils.data import Dataset, DataLoader
+
+#####################################
+# Chapter 2
+#####################################
+
+
+class GPTDatasetV1(Dataset):
+    def __init__(self, txt, tokenizer, max_length, stride):
+        self.tokenizer = tokenizer
+        self.input_ids = []
+        self.target_ids = []
+
+        # Tokenize the entire text
+        token_ids = tokenizer.encode(txt)
+
+        # Use a sliding window to chunk the book into overlapping sequences of max_length
+        for i in range(0, len(token_ids) - max_length, stride):
+            input_chunk = token_ids[i:i + max_length]
+            target_chunk = token_ids[i + 1: i + max_length + 1]
+            self.input_ids.append(torch.tensor(input_chunk))
+            self.target_ids.append(torch.tensor(target_chunk))
+
+    def __len__(self):
+        return len(self.input_ids)
+
+    def __getitem__(self, idx):
+        return self.input_ids[idx], self.target_ids[idx]
+
+
+def create_dataloader_v1(txt, batch_size=4, max_length=256,
+                         stride=128, shuffle=True, drop_last=True):
+    # Initialize the tokenizer
+    tokenizer = tiktoken.get_encoding("gpt2")
+
+    # Create dataset
+    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)
+
+    # Create dataloader
+    dataloader = DataLoader(
+        dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
+
+    return dataloader
+
+
+#####################################
+# Chapter 3
+#####################################
+class MultiHeadAttention(nn.Module):
+    def __init__(self, d_in, d_out, block_size, dropout, num_heads, qkv_bias=False):
+        super().__init__()
+        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
+
+        self.d_out = d_out
+        self.num_heads = num_heads
+        self.head_dim = d_out // num_heads  # Reduce the projection dim to match desired output dim
+
+        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.out_proj = nn.Linear(d_out, d_out)  # Linear layer to combine head outputs
+        self.dropout = nn.Dropout(dropout)
+        self.register_buffer('mask', torch.triu(torch.ones(block_size, block_size), diagonal=1))
+
+    def forward(self, x):
+        b, num_tokens, d_in = x.shape
+
+        keys = self.W_key(x)  # Shape: (b, num_tokens, d_out)
+        queries = self.W_query(x)
+        values = self.W_value(x)
+
+        # We implicitly split the matrix by adding a `num_heads` dimension
+        # Unroll last dim: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim)
+        keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)
+        values = values.view(b, num_tokens, self.num_heads, self.head_dim)
+        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)
+
+        # Transpose: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim)
+        keys = keys.transpose(1, 2)
+        queries = queries.transpose(1, 2)
+        values = values.transpose(1, 2)
+
+        # Compute scaled dot-product attention (aka self-attention) with a causal mask
+        attn_scores = queries @ keys.transpose(2, 3)  # Dot product for each head
+
+        # Original mask truncated to the number of tokens and converted to boolean
+        mask_bool = self.mask.bool()[:num_tokens, :num_tokens]
+
+        # Use the mask to fill attention scores
+        attn_scores.masked_fill_(mask_bool, -torch.inf)
+
+        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
+        attn_weights = self.dropout(attn_weights)
+
+        # Shape: (b, num_tokens, num_heads, head_dim)
+        context_vec = (attn_weights @ values).transpose(1, 2)
+
+        # Combine heads, where self.d_out = self.num_heads * self.head_dim
+        context_vec = context_vec.reshape(b, num_tokens, self.d_out)
+        context_vec = self.out_proj(context_vec)  # optional projection
+
+        return context_vec
+
+
+#####################################
+# Chapter 4
+#####################################
+class LayerNorm(nn.Module):
+    def __init__(self, emb_dim):
+        super().__init__()
+        self.eps = 1e-5
+        self.scale = nn.Parameter(torch.ones(emb_dim))
+        self.shift = nn.Parameter(torch.zeros(emb_dim))
+
+    def forward(self, x):
+        mean = x.mean(dim=-1, keepdim=True)
+        var = x.var(dim=-1, keepdim=True, unbiased=False)
+        norm_x = (x - mean) / torch.sqrt(var + self.eps)
+        return self.scale * norm_x + self.shift
+
+
+class GELU(nn.Module):
+    def __init__(self):
+        super().__init__()
+
+    def forward(self, x):
+        return 0.5 * x * (1 + torch.tanh(
+            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
+            (x + 0.044715 * torch.pow(x, 3))
+        ))
+
+
+class FeedForward(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.layers = nn.Sequential(
+            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
+            GELU(),
+            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
+            nn.Dropout(cfg["drop_rate"])
+        )
+
+    def forward(self, x):
+        return self.layers(x)
+
+
+class TransformerBlock(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.att = MultiHeadAttention(
+            d_in=cfg["emb_dim"],
+            d_out=cfg["emb_dim"],
+            block_size=cfg["ctx_len"],
+            num_heads=cfg["n_heads"],
+            dropout=cfg["drop_rate"],
+            qkv_bias=cfg["qkv_bias"])
+        self.ff = FeedForward(cfg)
+        self.norm1 = LayerNorm(cfg["emb_dim"])
+        self.norm2 = LayerNorm(cfg["emb_dim"])
+        self.drop_resid = nn.Dropout(cfg["drop_rate"])
+
+    def forward(self, x):
+        # Shortcut connection for attention block
+        shortcut = x
+        x = self.norm1(x)
+        x = self.att(x)   # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        # Shortcut connection for feed-forward block
+        shortcut = x
+        x = self.norm2(x)
+        x = self.ff(x)
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        return x
+
+
+class GPTModel(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
+        self.pos_emb = nn.Embedding(cfg["ctx_len"], cfg["emb_dim"])
+        self.drop_emb = nn.Dropout(cfg["drop_rate"])
+
+        self.trf_blocks = nn.Sequential(
+            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
+
+        self.final_norm = LayerNorm(cfg["emb_dim"])
+        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)
+
+    def forward(self, in_idx):
+        batch_size, seq_len = in_idx.shape
+        tok_embeds = self.tok_emb(in_idx)
+        pos_embeds = self.pos_emb(torch.arange(seq_len, device=in_idx.device))
+        x = tok_embeds + pos_embeds  # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_emb(x)
+        x = self.trf_blocks(x)
+        x = self.final_norm(x)
+        logits = self.out_head(x)
+        return logits
+
+
+def generate_text_simple(model, idx, max_new_tokens, context_size):
+    # idx is (B, T) array of indices in the current context
+    for _ in range(max_new_tokens):
+
+        # Crop current context if it exceeds the supported context size
+        # E.g., if LLM supports only 5 tokens, and the context size is 10
+        # then only the last 5 tokens are used as context
+        idx_cond = idx[:, -context_size:]
+
+        # Get the predictions
+        with torch.no_grad():
+            logits = model(idx_cond)
+
+        # Focus only on the last time step
+        # (batch, n_token, vocab_size) becomes (batch, vocab_size)
+        logits = logits[:, -1, :]
+
+        # Get the idx of the vocab entry with the highest logits value
+        idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (batch, 1)
+
+        # Append the predicted (greedy argmax) index to the running sequence
+        idx = torch.cat((idx, idx_next), dim=1)  # (batch, n_tokens+1)
+
+    return idx
+
+
+if __name__ == "__main__":
+
+    GPT_CONFIG_124M = {
+        "vocab_size": 50257,  # Vocabulary size
+        "ctx_len": 1024,      # Context length
+        "emb_dim": 768,       # Embedding dimension
+        "n_heads": 12,        # Number of attention heads
+        "n_layers": 12,       # Number of layers
+        "drop_rate": 0.1,     # Dropout rate
+        "qkv_bias": False     # Query-Key-Value bias
+    }
+
+    torch.manual_seed(123)
+    model = GPTModel(GPT_CONFIG_124M)
+    model.eval()  # disable dropout
+
+    start_context = "Hello, I am"
+
+    tokenizer = tiktoken.get_encoding("gpt2")
+    encoded = tokenizer.encode(start_context)
+    encoded_tensor = torch.tensor(encoded).unsqueeze(0)
+
+    print(f"\n{50*'='}\n{22*' '}IN\n{50*'='}")
+    print("\nInput text:", start_context)
+    print("Encoded input text:", encoded)
+    print("encoded_tensor.shape:", encoded_tensor.shape)
+
+    out = generate_text_simple(
+        model=model,
+        idx=encoded_tensor,
+        max_new_tokens=10,
+        context_size=GPT_CONFIG_124M["ctx_len"]
+    )
+    decoded_text = tokenizer.decode(out.squeeze(0).tolist())
+
+    print(f"\n\n{50*'='}\n{22*' '}OUT\n{50*'='}")
+    print("\nOutput:", out)
+    print("Output length:", len(out[0]))
+    print("Output text:", decoded_text)
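The masking and scaling steps in `MultiHeadAttention.forward` above can be illustrated for a single head without PyTorch. This is a minimal sketch using plain Python lists; the function name `causal_attention_weights` is hypothetical, and dropout and the learned projections are deliberately left out.

```python
import math


def causal_attention_weights(queries, keys):
    """Return causal attention weights for one head.

    queries and keys are lists of equal-length vectors (lists of floats).
    Row i of the result is the softmax over keys 0..i only.
    """
    head_dim = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        # Raw dot products between query i and every key
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        # Causal mask: future positions (j > i) are set to -inf
        masked = [s if j <= i else -math.inf for j, s in enumerate(scores)]
        # Scale by sqrt(head_dim), then apply a numerically stable softmax
        scaled = [s / math.sqrt(head_dim) for s in masked]
        peak = max(scaled)
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = causal_attention_weights(x, x)
for row in w:
    print([round(v, 3) for v in row])
```

Because `exp(-inf)` evaluates to 0, masked positions receive exactly zero weight and each row still sums to 1.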
diff --git a/ch05/01_main-chapter-code/tests.py b/ch05/01_main-chapter-code/tests.py
new file mode 100644
index 0000000..e410169
--- /dev/null
+++ b/ch05/01_main-chapter-code/tests.py
@@ -0,0 +1,40 @@
+# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
+# Source for "Build a Large Language Model From Scratch"
+#   - https://www.manning.com/books/build-a-large-language-model-from-scratch
+# Code: https://github.com/rasbt/LLMs-from-scratch
+
+# File for internal use (unit tests)
+
+import pytest
+from gpt_train import main
+
+
+@pytest.fixture
+def gpt_config():
+    return {
+        "vocab_size": 50257,
+        "ctx_len": 12,      # small for testing efficiency
+        "emb_dim": 32,      # small for testing efficiency
+        "n_heads": 4,       # small for testing efficiency
+        "n_layers": 2,      # small for testing efficiency
+        "drop_rate": 0.1,
+        "qkv_bias": False
+    }
+
+
+@pytest.fixture
+def other_hparams():
+    return {
+        "learning_rate": 5e-4,
+        "num_epochs": 1,    # small for testing efficiency
+        "batch_size": 2,
+        "weight_decay": 0.1
+    }
+
+
+def test_main(gpt_config, other_hparams):
+    train_losses, val_losses, tokens_seen, model = main(gpt_config, other_hparams)
+
+    assert len(train_losses) == 39, "Unexpected number of training losses"
+    assert len(val_losses) == 39, "Unexpected number of validation losses"
+    assert len(tokens_seen) == 39, "Unexpected number of tokens seen"
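The hard-coded `39` in `test_main` depends on how many optimizer steps one epoch takes. As a rough sketch, assuming `train_model_simple` (not shown in this part of the diff) records one loss entry at every step that is a multiple of `eval_freq`, starting at step 0, and given that the training script above passes `eval_freq=5`, the following enumerates which step counts are consistent with 39 entries. The helper `count_eval_points` is hypothetical.

```python
def count_eval_points(num_steps, eval_freq):
    """Count the steps 0, eval_freq, 2 * eval_freq, ... below num_steps."""
    return len(range(0, num_steps, eval_freq))


# Step counts per epoch that would produce exactly 39 recorded losses
matching = [n for n in range(1, 400) if count_eval_points(n, 5) == 39]
print(matching[0], matching[-1])
```

Under that assumption, the assertion holds only for a narrow range of training batch counts, so a change to the text file, tokenizer, `ctx_len`, or `batch_size` would likely require updating the expected value.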
diff --git a/ch05/02_alternative_weight_loading/README.md b/ch05/02_alternative_weight_loading/README.md
new file mode 100644
index 0000000..e00d020
--- /dev/null
+++ b/ch05/02_alternative_weight_loading/README.md
@@ -0,0 +1,5 @@
+# Alternative Weight Loading
+
+This folder contains alternative weight loading strategies in case the weights become unavailable from OpenAI.
+
+- [weight-loading-hf-transformers.ipynb](weight-loading-hf-transformers.ipynb): contains code to load the weights from the Hugging Face Model Hub via the `transformers` library
\ No newline at end of file
diff --git a/ch05/02_alternative_weight_loading/previous_chapters.py b/ch05/02_alternative_weight_loading/previous_chapters.py
new file mode 100644
index 0000000..d772e86
--- /dev/null
+++ b/ch05/02_alternative_weight_loading/previous_chapters.py
@@ -0,0 +1,287 @@
+# This file collects all the relevant code that we covered thus far
+# throughout Chapters 2-4.
+# This file can be run as a standalone script.
+
+import tiktoken
+import torch
+import torch.nn as nn
+from torch.utils.data import Dataset, DataLoader
+
+#####################################
+# Chapter 2
+#####################################
+
+
+class GPTDatasetV1(Dataset):
+    def __init__(self, txt, tokenizer, max_length, stride):
+        self.tokenizer = tokenizer
+        self.input_ids = []
+        self.target_ids = []
+
+        # Tokenize the entire text
+        token_ids = tokenizer.encode(txt)
+
+        # Use a sliding window to chunk the book into overlapping sequences of max_length
+        for i in range(0, len(token_ids) - max_length, stride):
+            input_chunk = token_ids[i:i + max_length]
+            target_chunk = token_ids[i + 1: i + max_length + 1]
+            self.input_ids.append(torch.tensor(input_chunk))
+            self.target_ids.append(torch.tensor(target_chunk))
+
+    def __len__(self):
+        return len(self.input_ids)
+
+    def __getitem__(self, idx):
+        return self.input_ids[idx], self.target_ids[idx]
+
+
+def create_dataloader_v1(txt, batch_size=4, max_length=256,
+                         stride=128, shuffle=True, drop_last=True):
+    # Initialize the tokenizer
+    tokenizer = tiktoken.get_encoding("gpt2")
+
+    # Create dataset
+    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)
+
+    # Create dataloader
+    dataloader = DataLoader(
+        dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
+
+    return dataloader
+
+
+#####################################
+# Chapter 3
+#####################################
+class MultiHeadAttention(nn.Module):
+    def __init__(self, d_in, d_out, block_size, dropout, num_heads, qkv_bias=False):
+        super().__init__()
+        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
+
+        self.d_out = d_out
+        self.num_heads = num_heads
+        self.head_dim = d_out // num_heads  # Reduce the projection dim to match desired output dim
+
+        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.out_proj = nn.Linear(d_out, d_out)  # Linear layer to combine head outputs
+        self.dropout = nn.Dropout(dropout)
+        self.register_buffer('mask', torch.triu(torch.ones(block_size, block_size), diagonal=1))
+
+    def forward(self, x):
+        b, num_tokens, d_in = x.shape
+
+        keys = self.W_key(x)  # Shape: (b, num_tokens, d_out)
+        queries = self.W_query(x)
+        values = self.W_value(x)
+
+        # We implicitly split the matrix by adding a `num_heads` dimension
+        # Unroll last dim: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim)
+        keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)
+        values = values.view(b, num_tokens, self.num_heads, self.head_dim)
+        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)
+
+        # Transpose: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim)
+        keys = keys.transpose(1, 2)
+        queries = queries.transpose(1, 2)
+        values = values.transpose(1, 2)
+
+        # Compute scaled dot-product attention (aka self-attention) with a causal mask
+        attn_scores = queries @ keys.transpose(2, 3)  # Dot product for each head
+
+        # Original mask truncated to the number of tokens and converted to boolean
+        mask_bool = self.mask.bool()[:num_tokens, :num_tokens]
+
+        # Use the mask to fill attention scores
+        attn_scores.masked_fill_(mask_bool, -torch.inf)
+
+        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
+        attn_weights = self.dropout(attn_weights)
+
+        # Shape: (b, num_tokens, num_heads, head_dim)
+        context_vec = (attn_weights @ values).transpose(1, 2)
+
+        # Combine heads, where self.d_out = self.num_heads * self.head_dim
+        context_vec = context_vec.reshape(b, num_tokens, self.d_out)
+        context_vec = self.out_proj(context_vec)  # optional projection
+
+        return context_vec
+
+
+#####################################
+# Chapter 4
+#####################################
+class LayerNorm(nn.Module):
+    def __init__(self, emb_dim):
+        super().__init__()
+        self.eps = 1e-5
+        self.scale = nn.Parameter(torch.ones(emb_dim))
+        self.shift = nn.Parameter(torch.zeros(emb_dim))
+
+    def forward(self, x):
+        mean = x.mean(dim=-1, keepdim=True)
+        var = x.var(dim=-1, keepdim=True, unbiased=False)
+        norm_x = (x - mean) / torch.sqrt(var + self.eps)
+        return self.scale * norm_x + self.shift
+
+
+class GELU(nn.Module):
+    def __init__(self):
+        super().__init__()
+
+    def forward(self, x):
+        return 0.5 * x * (1 + torch.tanh(
+            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
+            (x + 0.044715 * torch.pow(x, 3))
+        ))
+
+
+class FeedForward(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.layers = nn.Sequential(
+            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
+            GELU(),
+            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
+            nn.Dropout(cfg["drop_rate"])
+        )
+
+    def forward(self, x):
+        return self.layers(x)
+
+
+class TransformerBlock(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.att = MultiHeadAttention(
+            d_in=cfg["emb_dim"],
+            d_out=cfg["emb_dim"],
+            block_size=cfg["ctx_len"],
+            num_heads=cfg["n_heads"],
+            dropout=cfg["drop_rate"],
+            qkv_bias=cfg["qkv_bias"])
+        self.ff = FeedForward(cfg)
+        self.norm1 = LayerNorm(cfg["emb_dim"])
+        self.norm2 = LayerNorm(cfg["emb_dim"])
+        self.drop_resid = nn.Dropout(cfg["drop_rate"])
+
+    def forward(self, x):
+        # Shortcut connection for attention block
+        shortcut = x
+        x = self.norm1(x)
+        x = self.att(x)   # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        # Shortcut connection for feed-forward block
+        shortcut = x
+        x = self.norm2(x)
+        x = self.ff(x)
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        return x
+
+
+class GPTModel(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
+        self.pos_emb = nn.Embedding(cfg["ctx_len"], cfg["emb_dim"])
+        self.drop_emb = nn.Dropout(cfg["drop_rate"])
+
+        self.trf_blocks = nn.Sequential(
+            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
+
+        self.final_norm = LayerNorm(cfg["emb_dim"])
+        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)
+
+    def forward(self, in_idx):
+        batch_size, seq_len = in_idx.shape
+        tok_embeds = self.tok_emb(in_idx)
+        pos_embeds = self.pos_emb(torch.arange(seq_len, device=in_idx.device))
+        x = tok_embeds + pos_embeds  # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_emb(x)
+        x = self.trf_blocks(x)
+        x = self.final_norm(x)
+        logits = self.out_head(x)
+        return logits
+
+
+def generate_text_simple(model, idx, max_new_tokens, context_size):
+    # idx is (B, T) array of indices in the current context
+    for _ in range(max_new_tokens):
+
+        # Crop current context if it exceeds the supported context size
+        # E.g., if LLM supports only 5 tokens, and the context size is 10
+        # then only the last 5 tokens are used as context
+        idx_cond = idx[:, -context_size:]
+
+        # Get the predictions
+        with torch.no_grad():
+            logits = model(idx_cond)
+
+        # Focus only on the last time step
+        # (batch, n_token, vocab_size) becomes (batch, vocab_size)
+        logits = logits[:, -1, :]
+
+        # Get the idx of the vocab entry with the highest logits value
+        idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (batch, 1)
+
+        # Append the predicted (greedy argmax) index to the running sequence
+        idx = torch.cat((idx, idx_next), dim=1)  # (batch, n_tokens+1)
+
+    return idx
+
+
+#####################################
+# Chapter 5
+#####################################
+
+
+def text_to_token_ids(text, tokenizer):
+    encoded = tokenizer.encode(text)
+    encoded_tensor = torch.tensor(encoded).unsqueeze(0)  # add batch dimension
+    return encoded_tensor
+
+
+def token_ids_to_text(token_ids, tokenizer):
+    flat = token_ids.squeeze(0)  # remove batch dimension
+    return tokenizer.decode(flat.tolist())
+
+
+def generate(model, idx, max_new_tokens, context_size, temperature, top_k=None):
+
+    # For-loop is the same as before: Get logits, and only focus on last time step
+    for _ in range(max_new_tokens):
+        idx_cond = idx[:, -context_size:]
+        with torch.no_grad():
+            logits = model(idx_cond)
+        logits = logits[:, -1, :]
+
+        # New: Filter logits with top_k sampling
+        if top_k is not None:
+            # Keep only top_k values
+            top_logits, _ = torch.topk(logits, top_k)
+            min_val = top_logits[:, -1:]  # shape (batch_size, 1) so the threshold broadcasts per batch row
+            logits = torch.where(logits < min_val, torch.tensor(float('-inf')).to(logits.device), logits)
+
+        # New: Apply temperature scaling
+        if temperature > 0.0:
+            logits = logits / temperature
+
+            # Apply softmax to get probabilities
+            probs = torch.softmax(logits, dim=-1)  # (batch_size, vocab_size)
+
+            # Sample from the distribution
+            idx_next = torch.multinomial(probs, num_samples=1)  # (batch_size, 1)
+
+        # Otherwise same as before: get idx of the vocab entry with the highest logits value
+        else:
+            idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (batch_size, 1)
+
+        # Same as before: append sampled index to the running sequence
+        idx = torch.cat((idx, idx_next), dim=1)  # (batch_size, num_tokens+1)
+
+    return idx
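The top-k filtering and temperature scaling in `generate` above can be sketched for a single sequence without PyTorch. This is a minimal illustration on a plain list of logits; the helper names `top_k_filter`, `softmax`, and `sample_next` are hypothetical, and `random.Random.choices` stands in for `torch.multinomial`.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k largest logits and set all others to -inf."""
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; -inf entries get probability 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, rng, temperature=1.0, top_k=None):
    """Pick the next token index from one row of logits."""
    if top_k is not None:
        logits = top_k_filter(logits, top_k)
    if temperature > 0.0:
        probs = softmax(logits, temperature)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # temperature == 0 falls back to greedy decoding
    return max(range(len(logits)), key=lambda i: logits[i])


rng = random.Random(123)
example_logits = [1.0, 3.0, 2.0, 0.5]
print(top_k_filter(example_logits, 2))
print(sample_next(example_logits, rng, temperature=1.5, top_k=2))
print(sample_next(example_logits, rng, temperature=0.0))
```

Lower temperatures sharpen the distribution toward the largest logit, while higher temperatures flatten it; the top-k filter guarantees that only the k highest-scoring indices can ever be sampled.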
diff --git a/ch05/02_alternative_weight_loading/weight-loading-hf-transformers.ipynb b/ch05/02_alternative_weight_loading/weight-loading-hf-transformers.ipynb
new file mode 100644
index 0000000..736c5bd
--- /dev/null
+++ b/ch05/02_alternative_weight_loading/weight-loading-hf-transformers.ipynb
@@ -0,0 +1,312 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "6d6bc54f-2b16-4b0f-be69-957eed5d112f",
+   "metadata": {},
+   "source": [
+    "<font size=\"1\">\n",
+    "Supplementary code for \"Build a Large Language Model From Scratch\": <a href=\"https://www.manning.com/books/build-a-large-language-model-from-scratch\">https://www.manning.com/books/build-a-large-language-model-from-scratch</a> by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
+    "Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
+    "</font>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "72953590-5363-4398-85ce-54bde07f3d8a",
+   "metadata": {},
+   "source": [
+    "# Bonus Code for Chapter 5"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1a4ab5ee-e7b9-45d3-a82b-a12bcfc0945a",
+   "metadata": {},
+   "source": [
+    "## Alternative Weight Loading from Hugging Face Model Hub using Transformers"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b2feea87-49f0-48b9-b925-b8f0dda4096f",
+   "metadata": {},
+   "source": [
+    "- In the main chapter, we loaded the GPT model weights directly from OpenAI\n",
+    "- This notebook provides alternative weight loading code to load the model weights from the [Hugging Face Model Hub](https://huggingface.co/docs/hub/en/models-the-hub) using the `transformers` Python library"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "99b77109-5215-4d07-a618-4d10eff1a488",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# pip install transformers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "b0467eff-b43c-4a38-93e8-5ed87a5fc2b1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "numpy version: 1.25.2\n",
+      "torch version: 2.2.1\n",
+      "transformers version: 4.33.2\n"
+     ]
+    }
+   ],
+   "source": [
+    "from importlib.metadata import version\n",
+    "\n",
+    "pkgs = [\"numpy\", \"torch\", \"transformers\"]\n",
+    "for p in pkgs:\n",
+    "    print(f\"{p} version: {version(p)}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "ffc17d7d-bcd8-42ee-82a9-04fd55acf15d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/sebastian/miniforge3/envs/book/lib/python3.11/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.\n",
+      "  torch.utils._pytree._register_pytree_node(\n",
+      "/Users/sebastian/miniforge3/envs/book/lib/python3.11/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.\n",
+      "  torch.utils._pytree._register_pytree_node(\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "GPT2Model(\n",
+       "  (wte): Embedding(50257, 768)\n",
+       "  (wpe): Embedding(1024, 768)\n",
+       "  (drop): Dropout(p=0.1, inplace=False)\n",
+       "  (h): ModuleList(\n",
+       "    (0-11): 12 x GPT2Block(\n",
+       "      (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n",
+       "      (attn): GPT2Attention(\n",
+       "        (c_attn): Conv1D()\n",
+       "        (c_proj): Conv1D()\n",
+       "        (attn_dropout): Dropout(p=0.1, inplace=False)\n",
+       "        (resid_dropout): Dropout(p=0.1, inplace=False)\n",
+       "      )\n",
+       "      (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n",
+       "      (mlp): GPT2MLP(\n",
+       "        (c_fc): Conv1D()\n",
+       "        (c_proj): Conv1D()\n",
+       "        (act): NewGELUActivation()\n",
+       "        (dropout): Dropout(p=0.1, inplace=False)\n",
+       "      )\n",
+       "    )\n",
+       "  )\n",
+       "  (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n",
+       ")"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from transformers import GPT2Model\n",
+    "\n",
+    "\n",
+    "# allowed model names\n",
+    "model_names = {\n",
+    "    \"gpt2-small\": \"openai-community/gpt2\",         # 124M\n",
+    "    \"gpt2-medium\": \"openai-community/gpt2-medium\", # 355M\n",
+    "    \"gpt2-large\": \"openai-community/gpt2-large\",   # 774M\n",
+    "    \"gpt2-xl\": \"openai-community/gpt2-xl\"          # 1558M\n",
+    "}\n",
+    "\n",
+    "CHOOSE_MODEL = \"gpt2-small\"\n",
+    "\n",
+    "gpt_hf = GPT2Model.from_pretrained(model_names[CHOOSE_MODEL], cache_dir=\"checkpoints\")\n",
+    "gpt_hf.eval()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "9ea9b1bc-7881-46ad-9555-27a9cf23faa7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "BASE_CONFIG = {\n",
+    "    \"vocab_size\": 50257,  # Vocabulary size\n",
+    "    \"ctx_len\": 1024,      # Context length\n",
+    "    \"drop_rate\": 0.0,     # Dropout rate\n",
+    "    \"qkv_bias\": True      # Query-key-value bias\n",
+    "}\n",
+    "\n",
+    "model_configs = {\n",
+    "    \"gpt2-small\": {\"emb_dim\": 768, \"n_layers\": 12, \"n_heads\": 12},\n",
+    "    \"gpt2-medium\": {\"emb_dim\": 1024, \"n_layers\": 24, \"n_heads\": 16},\n",
+    "    \"gpt2-large\": {\"emb_dim\": 1280, \"n_layers\": 36, \"n_heads\": 20},\n",
+    "    \"gpt2-xl\": {\"emb_dim\": 1600, \"n_layers\": 48, \"n_heads\": 25},\n",
+    "}\n",
+    "\n",
+    "\n",
+    "BASE_CONFIG.update(model_configs[CHOOSE_MODEL])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "4e2a4cf4-a54e-4307-9141-fb9f288e4dfa",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def assign_check(left, right):\n",
+    "    if left.shape != right.shape:\n",
+    "        raise ValueError(f\"Shape mismatch. Left: {left.shape}, Right: {right.shape}\")\n",
+    "    return torch.nn.Parameter(torch.tensor(right))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "75be3077-f141-44bb-af88-62580ffd224c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "\n",
+    "def load_weights(gpt, gpt_hf):\n",
+    "\n",
+    "    d = gpt_hf.state_dict()\n",
+    "\n",
+    "    gpt.pos_emb.weight = assign_check(gpt.pos_emb.weight, d[\"wpe.weight\"])\n",
+    "    gpt.tok_emb.weight = assign_check(gpt.tok_emb.weight, d[\"wte.weight\"])\n",
+    "    \n",
+    "    for b in range(BASE_CONFIG[\"n_layers\"]):\n",
+    "        q_w, k_w, v_w = np.split(d[f\"h.{b}.attn.c_attn.weight\"], 3, axis=-1)\n",
+    "        gpt.trf_blocks[b].att.W_query.weight = assign_check(gpt.trf_blocks[b].att.W_query.weight, q_w.T)\n",
+    "        gpt.trf_blocks[b].att.W_key.weight = assign_check(gpt.trf_blocks[b].att.W_key.weight, k_w.T)\n",
+    "        gpt.trf_blocks[b].att.W_value.weight = assign_check(gpt.trf_blocks[b].att.W_value.weight, v_w.T)\n",
+    "    \n",
+    "        q_b, k_b, v_b = np.split(d[f\"h.{b}.attn.c_attn.bias\"], 3, axis=-1)\n",
+    "        gpt.trf_blocks[b].att.W_query.bias = assign_check(gpt.trf_blocks[b].att.W_query.bias, q_b)\n",
+    "        gpt.trf_blocks[b].att.W_key.bias = assign_check(gpt.trf_blocks[b].att.W_key.bias, k_b)\n",
+    "        gpt.trf_blocks[b].att.W_value.bias = assign_check(gpt.trf_blocks[b].att.W_value.bias, v_b)\n",
+    "    \n",
+    "    \n",
+    "        gpt.trf_blocks[b].att.out_proj.weight = assign_check(gpt.trf_blocks[b].att.out_proj.weight, d[f\"h.{b}.attn.c_proj.weight\"].T)\n",
+    "        gpt.trf_blocks[b].att.out_proj.bias = assign_check(gpt.trf_blocks[b].att.out_proj.bias, d[f\"h.{b}.attn.c_proj.bias\"])\n",
+    "    \n",
+    "        gpt.trf_blocks[b].ff.layers[0].weight = assign_check(gpt.trf_blocks[b].ff.layers[0].weight, d[f\"h.{b}.mlp.c_fc.weight\"].T)\n",
+    "        gpt.trf_blocks[b].ff.layers[0].bias = assign_check(gpt.trf_blocks[b].ff.layers[0].bias, d[f\"h.{b}.mlp.c_fc.bias\"])\n",
+    "        gpt.trf_blocks[b].ff.layers[2].weight = assign_check(gpt.trf_blocks[b].ff.layers[2].weight, d[f\"h.{b}.mlp.c_proj.weight\"].T)\n",
+    "        gpt.trf_blocks[b].ff.layers[2].bias = assign_check(gpt.trf_blocks[b].ff.layers[2].bias, d[f\"h.{b}.mlp.c_proj.bias\"])\n",
+    "    \n",
+    "        gpt.trf_blocks[b].norm1.scale = assign_check(gpt.trf_blocks[b].norm1.scale, d[f\"h.{b}.ln_1.weight\"])\n",
+    "        gpt.trf_blocks[b].norm1.shift = assign_check(gpt.trf_blocks[b].norm1.shift, d[f\"h.{b}.ln_1.bias\"])\n",
+    "        gpt.trf_blocks[b].norm2.scale = assign_check(gpt.trf_blocks[b].norm2.scale, d[f\"h.{b}.ln_2.weight\"])\n",
+    "        gpt.trf_blocks[b].norm2.shift = assign_check(gpt.trf_blocks[b].norm2.shift, d[f\"h.{b}.ln_2.bias\"])\n",
+    "    \n",
+    "    gpt.final_norm.scale = assign_check(gpt.final_norm.scale, d[\"ln_f.weight\"])\n",
+    "    gpt.final_norm.shift = assign_check(gpt.final_norm.shift, d[\"ln_f.bias\"])\n",
+    "    gpt.out_head.weight = assign_check(gpt.out_head.weight, d[\"wte.weight\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "cda44d37-92c0-4c19-a70a-15711513afce",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/var/folders/jg/tpqyh1fd5js5wsr1d138k3n40000gn/T/ipykernel_32618/3877979348.py:4: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
+      "  return torch.nn.Parameter(torch.tensor(right))\n"
+     ]
+    }
+   ],
+   "source": [
+    "import torch\n",
+    "from previous_chapters import GPTModel\n",
+    "\n",
+    "\n",
+    "gpt = GPTModel(BASE_CONFIG)\n",
+    "\n",
+    "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+    "load_weights(gpt, gpt_hf)\n",
+    "gpt.to(device);"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "4ddd0d51-3ade-4890-9bab-d63f141d095f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Output text:\n",
+      " Every effort moves forward, but it's not enough.\n",
+      "\n",
+      "\"I'm not going to sit here and say, 'I'm not going to do this,'\n"
+     ]
+    }
+   ],
+   "source": [
+    "import tiktoken\n",
+    "from previous_chapters import generate, text_to_token_ids, token_ids_to_text\n",
+    "\n",
+    "torch.manual_seed(123)\n",
+    "\n",
+    "tokenizer = tiktoken.get_encoding(\"gpt2\")\n",
+    "\n",
+    "token_ids = generate(\n",
+    "    model=gpt,\n",
+    "    idx=text_to_token_ids(\"Every effort moves\", tokenizer),\n",
+    "    max_new_tokens=30,\n",
+    "    context_size=BASE_CONFIG[\"ctx_len\"],\n",
+    "    top_k=1,\n",
+    "    temperature=1.0\n",
+    ")\n",
+    "\n",
+    "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
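Review note on `load_weights` above: Hugging Face's GPT-2 stores the attention projection as one fused `c_attn` matrix in `(in_features, 3 * out_features)` layout, whereas `nn.Linear` keeps its weight as `(out_features, in_features)`. That is why the notebook splits along the last axis and then applies `.T`. The following is a minimal pure-Python sketch of that split-and-transpose step on nested lists. The helper names `split_columns` and `transpose` are hypothetical and do not appear in the notebook.

```python
def split_columns(matrix, n_parts):
    """Split a 2D list into n_parts equal blocks along the last axis (hypothetical helper)."""
    n_cols = len(matrix[0])
    if n_cols % n_parts != 0:
        raise ValueError("Number of columns must be divisible by n_parts")
    width = n_cols // n_parts
    return [[row[p * width:(p + 1) * width] for row in matrix] for p in range(n_parts)]


def transpose(matrix):
    """Swap rows and columns of a 2D list (hypothetical helper)."""
    return [list(col) for col in zip(*matrix)]


# Toy fused weight with in_features=2 and 3 * out_features=6 (so out_features=2)
fused = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
]

q_w, k_w, v_w = split_columns(fused, 3)
# (in, out) -> (out, in), the layout nn.Linear expects
q_linear = transpose(q_w)
print(q_w)       # [[1, 2], [7, 8]]
print(q_linear)  # [[1, 7], [2, 8]]
```

The bias vectors need only the split, not the transpose, because they are one-dimensional.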
diff --git a/ch05/03_bonus_pretraining_on_gutenberg/README.md b/ch05/03_bonus_pretraining_on_gutenberg/README.md
new file mode 100644
index 0000000..b5cd141
--- /dev/null
+++ b/ch05/03_bonus_pretraining_on_gutenberg/README.md
@@ -0,0 +1,122 @@
+# Pretraining GPT on the Project Gutenberg Dataset
+
+This directory contains code for training a small GPT model on the free books provided by Project Gutenberg.
+
+As the Project Gutenberg website states, "the vast majority of Project Gutenberg eBooks are in the public domain in the US." 
+
+Please read the [Project Gutenberg Permissions, Licensing and other Common Requests](https://www.gutenberg.org/policy/permission.html) page for more information about using the resources provided by Project Gutenberg. 
+
+&nbsp;
+## How to use this code
+
+&nbsp;
+
+### 1) Download the dataset
+
+As of this writing, this will require approximately 50 GB of disk space, but it may be more depending on how much Project Gutenberg has grown since then.
+
+Follow these steps to download the dataset:
+
+
+1. `git clone https://github.com/pgcorpus/gutenberg.git`
+
+2. `cd gutenberg`
+
+3. `pip install -r requirements.txt`
+
+4. `python get_data.py`
+
+5. `cd ..`
+
+&nbsp;
+### 2) Prepare the dataset
+
+Next, run the `prepare_dataset.py` script, which concatenates the (as of this writing, 60,173) text files into fewer larger files so that they can be more efficiently transferred and accessed:
+
+```bash
+python prepare_dataset.py \
+  --data_dir "gutenberg/data" \
+  --max_size_mb 500 \
+  --output_dir "gutenberg_preprocessed"
+```
+
+> [!TIP] 
+> Note that, for simplicity, the produced files are stored in plaintext format and are not pre-tokenized. However, you may want to update the code to store the dataset in a pre-tokenized form to save computation time if you are planning to use the dataset more often or train for multiple epochs. See the *Design Decisions and Improvements* section at the bottom of this page for more information.
+
+> [!TIP]
+> You can choose smaller file sizes, for example, 50 MB. This will result in more files but might be useful for quicker pretraining runs on a small number of files for testing purposes.
+
+
+&nbsp;
+### 3) Run the pretraining script
+
+You can run the pretraining script as follows. Note that, apart from `--data_dir`, the additional command line arguments are shown with their default values for illustration purposes:
+
+```bash
+python pretraining_simple.py \
+  --data_dir "gutenberg_preprocessed" \
+  --n_epochs 1 \
+  --batch_size 4 \
+  --output_dir model_checkpoints
+```
+
+The output will be formatted in the following way:
+
+```
+Total files: 3
+Tokenizing file 1 of 3: data_small/combined_1.txt
+Training ...
+Ep 1 (Step 0): Train loss 9.694, Val loss 9.724
+Ep 1 (Step 100): Train loss 6.672, Val loss 6.683
+Ep 1 (Step 200): Train loss 6.543, Val loss 6.434
+Ep 1 (Step 300): Train loss 5.772, Val loss 6.313
+Ep 1 (Step 400): Train loss 5.547, Val loss 6.249
+Ep 1 (Step 500): Train loss 6.182, Val loss 6.155
+Ep 1 (Step 600): Train loss 5.742, Val loss 6.122
+Ep 1 (Step 700): Train loss 6.309, Val loss 5.984
+Ep 1 (Step 800): Train loss 5.435, Val loss 5.975
+Ep 1 (Step 900): Train loss 5.582, Val loss 5.935
+...
+Ep 1 (Step 31900): Train loss 3.664, Val loss 3.946
+Ep 1 (Step 32000): Train loss 3.493, Val loss 3.939
+Ep 1 (Step 32100): Train loss 3.940, Val loss 3.961
+Saved model_checkpoints/model_pg_32188.pth
+Book processed 3h 46m 55s 
+Total time elapsed 3h 46m 55s 
+ETA for remaining books: 7h 33m 50s
+Tokenizing file 2 of 3: data_small/combined_2.txt
+Training ...
+Ep 1 (Step 32200): Train loss 2.982, Val loss 4.094
+Ep 1 (Step 32300): Train loss 3.920, Val loss 4.097
+...
+```
+
+
+&nbsp;
+> [!TIP] 
+> In practice, if you are using macOS or Linux, I recommend using the `tee` command to save the log outputs to a `log.txt` file in addition to printing them on the terminal:
+
+```bash
+python -u pretraining_simple.py | tee log.txt
+```
+
+&nbsp;
+> [!WARNING]  
+> Note that training on 1 of the ~500 MB text files in the `gutenberg_preprocessed` folder will take approximately 4 hours on a V100 GPU.
+> The folder contains 47 files, so training on all of them will take approximately 200 hours (more than 1 week). You may want to run it on a smaller number of files.
+
+
+&nbsp;
+## Design Decisions and Improvements
+
+Note that this code focuses on keeping things simple and minimal for educational purposes. It could be extended in the following ways to improve modeling performance and training efficiency:
+
+1. Modify the `prepare_dataset.py` script to strip the Gutenberg boilerplate text from each book file.
+2. Update the data preparation and loading utilities to pre-tokenize the dataset and save it in a tokenized form so that it doesn't have to be re-tokenized each time when calling the pretraining script.
+3. Update the `train_model_simple` function by adding the features introduced in [Appendix D: Adding Bells and Whistles to the Training Loop](../../appendix-D/01_main-chapter-code/appendix-D.ipynb), namely, cosine decay, linear warmup, and gradient clipping.
+4. Update the pretraining script to save the optimizer state (see section *5.4 Loading and saving weights in PyTorch* in chapter 5; [ch05.ipynb](../../ch05/01_main-chapter-code/ch05.ipynb)) and add the option to load an existing model and optimizer checkpoint and continue training if the training run was interrupted.
+5. Add a more advanced logger (for example, Weights and Biases) to view the loss and validation curves live.
+6. Add distributed data parallelism (DDP) and train the model on multiple GPUs (see section *A.9.3 Training with multiple GPUs* in appendix A; [DDP-script.py](../../appendix-A/03_main-chapter-code/DDP-script.py)).
+7. Swap the from-scratch `MultiHeadAttention` class in the `previous_chapters.py` script with the efficient `MHAPyTorchScaledDotProduct` class implemented in the [Efficient Multi-Head Attention Implementations](../../ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb) bonus section, which can use Flash Attention via PyTorch's `nn.functional.scaled_dot_product_attention` function.
+8. Speed up the training by optimizing the model via [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) (`model = torch.compile(model)`) or [thunder](https://github.com/Lightning-AI/lightning-thunder) (`model = thunder.jit(model)`).
+9. Implement Gradient Low-Rank Projection (GaLore) to further speed up the pretraining process. This can be achieved by replacing the `AdamW` optimizer with the `GaLoreAdamW` optimizer provided in the [GaLore Python library](https://github.com/jiaweizzhao/GaLore).
\ No newline at end of file
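Review note on improvement 1 in the README above: stripping the Project Gutenberg boilerplate could look roughly like the sketch below. It assumes the books carry `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF THE PROJECT GUTENBERG EBOOK ... ***` marker lines. That marker format is an assumption and may not hold for every file, so text without both markers is returned unchanged. `strip_gutenberg_boilerplate` is a hypothetical helper, not part of `prepare_dataset.py`.

```python
import re

# Assumed marker format; real files may differ, so unmatched text is returned unchanged
START_RE = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*",
    re.IGNORECASE | re.DOTALL,
)
END_RE = re.compile(
    r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK",
    re.IGNORECASE,
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END markers if both are found (hypothetical helper)."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    if start is None or end is None or end.start() < start.end():
        return text
    return text[start.end():end.start()].strip()


sample = (
    "Header and license text\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Chapter 1\nIt was a dark night.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "More license text\n"
)
print(strip_gutenberg_boilerplate(sample))
```

In `combine_files`, such a helper would be applied to `content` right after each file is read and before its size is estimated.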
diff --git a/ch05/03_bonus_pretraining_on_gutenberg/prepare_dataset.py b/ch05/03_bonus_pretraining_on_gutenberg/prepare_dataset.py
new file mode 100644
index 0000000..7ab5df4
--- /dev/null
+++ b/ch05/03_bonus_pretraining_on_gutenberg/prepare_dataset.py
@@ -0,0 +1,70 @@
+# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
+# Source for "Build a Large Language Model From Scratch"
+#   - https://www.manning.com/books/build-a-large-language-model-from-scratch
+# Code: https://github.com/rasbt/LLMs-from-scratch
+
+"""
+Script that processes the Project Gutenberg files into fewer larger files.
+"""
+
+import argparse
+import os
+
+
+def combine_files(file_paths, target_dir, max_size_mb=500, separator="<|endoftext|>", fallback_encoding="latin1"):
+    if not os.path.exists(target_dir):
+        os.makedirs(target_dir)
+
+    current_content = []
+    current_size = 0
+    file_counter = 1
+
+    for file_path in file_paths:
+        try:
+            with open(file_path, "r", encoding="utf-8") as file:
+                content = file.read()
+        except UnicodeDecodeError:
+            # Attempt to read the file with a fallback encoding
+            print(f"Warning: UnicodeDecodeError encountered. Trying fallback encoding for {file_path}")
+            with open(file_path, "r", encoding=fallback_encoding) as file:
+                content = file.read()
+
+        estimated_size = len(content.encode("utf-8"))
+
+        if current_content and current_size + estimated_size > max_size_mb * 1024 * 1024:
+            target_file_path = os.path.join(target_dir, f"combined_{file_counter}.txt")
+            with open(target_file_path, "w", encoding="utf-8") as target_file:
+                target_file.write(separator.join(current_content))
+            file_counter += 1
+            current_content = [content]
+            current_size = estimated_size
+        else:
+            current_content.append(content)
+            current_size += estimated_size
+
+    if current_content:
+        target_file_path = os.path.join(target_dir, f"combined_{file_counter}.txt")
+        with open(target_file_path, "w", encoding="utf-8") as target_file:
+            target_file.write(separator.join(current_content))
+
+
+if __name__ == "__main__":
+
+    parser = argparse.ArgumentParser(description="Combine Project Gutenberg text files into fewer larger files")
+
+    parser.add_argument("--data_dir", type=str, default="gutenberg/data",
+                        help="Directory containing the downloaded raw training data")
+    parser.add_argument("--max_size_mb", type=int, default=500,
+                        help="The maximum file size for each concatenated file in megabytes")
+    parser.add_argument("--output_dir", type=str, default="gutenberg_preprocessed",
+                        help="Directory where the preprocessed data will be saved")
+
+    args = parser.parse_args()
+
+    all_files = [os.path.join(path, name) for path, subdirs, files in os.walk(args.data_dir)
+                 for name in files if name.endswith((".txt", ".txt.utf8")) and "raw" not in path]
+
+    print(f"{len(all_files)} files to process.")
+
+    # Honor the --max_size_mb argument instead of silently using the default
+    combine_files(all_files, args.output_dir, max_size_mb=args.max_size_mb)
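Review note on `combine_files` above: the function packs files greedily, flushing the current batch to disk as soon as the next file would push it over the size limit. The sketch below isolates that grouping logic on a plain list of sizes so the flush behavior is easy to follow. `group_by_size` is a hypothetical helper that works on sizes only, not file contents. It also skips the flush while the current group is empty, so an oversized first item never produces an empty group.

```python
def group_by_size(sizes, max_size):
    """Greedily pack sizes into consecutive groups whose totals stay within max_size.

    Hypothetical helper for illustration. A single item larger than max_size
    still gets a group of its own.
    """
    groups = []
    current, current_size = [], 0
    for size in sizes:
        # Flush the current group before it would exceed the limit
        if current and current_size + size > max_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    # Emit whatever is left after the last item
    if current:
        groups.append(current)
    return groups


print(group_by_size([200, 250, 100, 400, 600, 50], max_size=500))
# [[200, 250], [100, 400], [600], [50]]
```

Because the packing is greedy and order-preserving, the resulting files are not guaranteed to be as full as possible, but books stay in their original order.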
diff --git a/ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py b/ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py
new file mode 100644
index 0000000..c25bfe2
--- /dev/null
+++ b/ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py
@@ -0,0 +1,218 @@
+# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
+# Source for "Build a Large Language Model From Scratch"
+#   - https://www.manning.com/books/build-a-large-language-model-from-scratch
+# Code: https://github.com/rasbt/LLMs-from-scratch
+
+"""
+Script for pretraining a small GPT-2 124M parameter model
+on books from Project Gutenberg.
+
+Before running this script, make sure you downloaded and
+processed the dataset as described in the README.md.
+"""
+
+import argparse
+import os
+from pathlib import Path
+import time
+import torch
+from previous_chapters import (
+    create_dataloader_v1,
+    GPTModel,
+    generate_and_print_sample,
+    calc_loss_batch,
+    evaluate_model,
+    plot_losses
+)
+
+
+def read_text_file(file_path):
+    with open(file_path, "r", encoding="utf-8") as file:
+        text_data = file.read()
+    return text_data
+
+
+def create_dataloaders(text_data, train_ratio, batch_size, max_length, stride):
+    split_idx = int(train_ratio * len(text_data))
+    train_loader = create_dataloader_v1(
+        text_data[:split_idx],
+        batch_size=batch_size,
+        max_length=max_length,
+        stride=stride,
+        drop_last=True,
+        shuffle=True
+    )
+    val_loader = create_dataloader_v1(
+        text_data[split_idx:],
+        batch_size=batch_size,
+        max_length=max_length,
+        stride=stride,
+        drop_last=False,
+        shuffle=False
+    )
+    return train_loader, val_loader
+
+
+def convert_time(seconds):
+    hours, rem = divmod(seconds, 3600)
+    minutes, seconds = divmod(rem, 60)
+    return int(hours), int(minutes), int(seconds)
+
+
+def print_eta(start_time, book_start_time, index, total_files):
+    book_end_time = time.time()  # End time of processing this book
+    elapsed_time = book_end_time - book_start_time
+    total_elapsed_time = book_end_time - start_time
+    books_remaining = total_files - index
+    average_time_per_book = total_elapsed_time / index
+    eta = average_time_per_book * books_remaining
+
+    book_h, book_m, book_s = convert_time(elapsed_time)
+    total_h, total_m, total_s = convert_time(total_elapsed_time)
+    eta_h, eta_m, eta_s = convert_time(eta)
+
+    print(f"Book processed {book_h}h {book_m}m {book_s}s"
+          f"\nTotal time elapsed {total_h}h {total_m}m {total_s}s"
+          f"\nETA for remaining books: {eta_h}h {eta_m}m {eta_s}s")
+
+
+def train_model_simple(model, optimizer, device, n_epochs,
+                       eval_freq, eval_iter, print_sample_iter, start_context,
+                       output_dir, save_ckpt_freq,
+                       batch_size=1024, train_ratio=0.90):
+
+    train_losses, val_losses, track_tokens_seen = [], [], []
+    tokens_seen = 0
+    global_step = -1
+    start_time = time.time()
+
+    try:
+        for epoch in range(n_epochs):
+
+            # Iterate over the books in the training corpus
+            for index, file_path in enumerate(all_files, 1):
+                book_start_time = time.time()
+                text_data = read_text_file(file_path) + " <|endoftext|> "
+                print(f"Tokenizing file {index} of {total_files}: {file_path}")
+
+                # Initialize new data loaders for each book
+                train_loader, val_loader = create_dataloaders(
+                    text_data,
+                    train_ratio=train_ratio,
+                    batch_size=batch_size,
+                    max_length=GPT_CONFIG_124M["ctx_len"],
+                    stride=GPT_CONFIG_124M["ctx_len"]
+                )
+                print("Training ...")
+                model.train()
+                for input_batch, target_batch in train_loader:
+                    optimizer.zero_grad()
+                    loss = calc_loss_batch(input_batch, target_batch, model, device)
+                    loss.backward()
+                    optimizer.step()
+                    tokens_seen += input_batch.numel()
+                    global_step += 1
+
+                    # Optional evaluation step
+                    if global_step % eval_freq == 0:
+                        train_loss, val_loss = evaluate_model(
+                            model, train_loader, val_loader, device, eval_iter)
+                        train_losses.append(train_loss)
+                        val_losses.append(val_loss)
+                        track_tokens_seen.append(tokens_seen)
+                        print(f"Ep {epoch+1} (Step {global_step}): "
+                              f"Train loss {train_loss:.3f}, Val loss {val_loss:.3f}")
+
+                    # Generate text passage
+                    if global_step % print_sample_iter == 0:
+                        generate_and_print_sample(
+                            model, train_loader.dataset.tokenizer, device, start_context
+                        )
+
+                if global_step % save_ckpt_freq:  # NOTE: true whenever global_step is not a multiple of save_ckpt_freq
+                    file_name = output_dir / f"model_pg_{global_step}.pth"
+                    torch.save(model.state_dict(), file_name)
+                    print(f"Saved {file_name}")
+
+                print_eta(start_time, book_start_time, index, total_files)
+
+    except KeyboardInterrupt:
+        file_name = output_dir / f"model_pg_{global_step}_interrupted.pth"
+        torch.save(model.state_dict(), file_name)
+        print(f"Saved {file_name}")
+
+    return train_losses, val_losses, track_tokens_seen
+
+
+if __name__ == "__main__":
+
+    parser = argparse.ArgumentParser(description='GPT Model Training Configuration')
+
+    parser.add_argument('--data_dir', type=str, default='gutenberg/data',
+                        help='Directory containing the training data')
+    parser.add_argument('--output_dir', type=str, default='model_checkpoints',
+                        help='Directory where the model checkpoints will be saved')
+    parser.add_argument('--n_epochs', type=int, default=1,
+                        help='Number of epochs to train the model')
+    parser.add_argument('--print_sample_iter', type=int, default=1000,
+                        help='Iterations between printing sample outputs')
+    parser.add_argument('--eval_freq', type=int, default=100,
+                        help='Frequency of evaluations during training')
+    parser.add_argument('--save_ckpt_freq', type=int, default=100_000,
+                        help='Frequency of saving model checkpoints during training')
+    parser.add_argument('--lr', type=float, default=5e-4,
+                        help='Learning rate for the optimizer')
+    parser.add_argument('--batch_size', type=int, default=4,
+                        help='Batch size for training')
+
+    args = parser.parse_args()
+
+    GPT_CONFIG_124M = {
+        "vocab_size": 50257,  # Vocabulary size
+        "ctx_len": 1024,      # Context length
+        "emb_dim": 768,       # Embedding dimension
+        "n_heads": 12,        # Number of attention heads
+        "n_layers": 12,       # Number of layers
+        "drop_rate": 0.1,     # Dropout rate
+        "qkv_bias": False     # Query-key-value bias
+    }
+
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    torch.manual_seed(123)
+    model = GPTModel(GPT_CONFIG_124M)
+    model.to(device)
+    optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=0.1)
+
+    data_dir = args.data_dir
+    all_files = [os.path.join(path, name) for path, subdirs, files
+                 in os.walk(data_dir) for name in files if name.endswith((".txt"))]
+    total_files = len(all_files)
+
+    if total_files == 0:
+        print("No training text files found. Make sure you "
+              "selected the correct input directory")
+        quit()
+    print("Total files:", total_files)
+
+    output_dir = Path(args.output_dir)
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    train_losses, val_losses, tokens_seen = train_model_simple(
+        model, optimizer, device,
+        batch_size=args.batch_size,
+        n_epochs=args.n_epochs,
+        eval_freq=args.eval_freq,
+        eval_iter=1,
+        print_sample_iter=args.print_sample_iter,
+        output_dir=output_dir,
+        save_ckpt_freq=args.save_ckpt_freq,
+        start_context="Every effort moves you",
+    )
+
+    epochs_tensor = torch.linspace(0, args.n_epochs, len(train_losses))
+
+    plot_losses(epochs_tensor, tokens_seen, train_losses, val_losses, output_dir)
+
+    torch.save(model.state_dict(), output_dir / "model_pg_final.pth")
+    if torch.cuda.is_available():
+        print(f"Maximum GPU memory allocated: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
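Review note on `print_eta` above: the estimate is the average time per processed file multiplied by the number of files still remaining. The sketch below reproduces that arithmetic without timers. `estimate_eta` is a hypothetical helper, and the input (3h 46m 55s for the first of three files) is taken from the sample log in the README.

```python
def convert_time(seconds):
    """Split a duration in seconds into whole hours, minutes, and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return int(hours), int(minutes), int(seconds)


def estimate_eta(total_elapsed, files_done, total_files):
    """Average time per finished file times the number of files left (hypothetical helper)."""
    if files_done <= 0:
        raise ValueError("files_done must be positive")
    return total_elapsed / files_done * (total_files - files_done)


elapsed = 3 * 3600 + 46 * 60 + 55   # 3h 46m 55s for the first file
eta = estimate_eta(elapsed, files_done=1, total_files=3)
print(convert_time(elapsed))  # (3, 46, 55)
print(convert_time(eta))      # (7, 33, 50)
```

The estimate assumes the remaining files take about as long as the ones already processed, which is reasonable here because `prepare_dataset.py` produces files of similar size.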
diff --git a/ch05/03_bonus_pretraining_on_gutenberg/previous_chapters.py b/ch05/03_bonus_pretraining_on_gutenberg/previous_chapters.py
new file mode 100644
index 0000000..46fd9d2
--- /dev/null
+++ b/ch05/03_bonus_pretraining_on_gutenberg/previous_chapters.py
@@ -0,0 +1,316 @@
+# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
+# Source for "Build a Large Language Model From Scratch"
+#   - https://www.manning.com/books/build-a-large-language-model-from-scratch
+# Code: https://github.com/rasbt/LLMs-from-scratch
+
+# This file collects all the relevant code that we covered thus far
+# throughout Chapters 2-4.
+# This file can be run as a standalone script.
+
+import tiktoken
+import torch
+import torch.nn as nn
+from torch.utils.data import Dataset, DataLoader
+import matplotlib.pyplot as plt
+
+
+#####################################
+# Chapter 2
+#####################################
+
+
+class GPTDatasetV1(Dataset):
+    def __init__(self, txt, tokenizer, max_length, stride):
+        self.tokenizer = tokenizer
+        self.input_ids = []
+        self.target_ids = []
+
+        token_ids = tokenizer.encode(txt, allowed_special={'<|endoftext|>'})
+
+        for i in range(0, len(token_ids) - max_length, stride):
+            input_chunk = token_ids[i:i + max_length]
+            target_chunk = token_ids[i + 1: i + max_length + 1]
+            self.input_ids.append(torch.tensor(input_chunk))
+            self.target_ids.append(torch.tensor(target_chunk))
+
+    def __len__(self):
+        return len(self.input_ids)
+
+    def __getitem__(self, idx):
+        return self.input_ids[idx], self.target_ids[idx]
+
+
+def create_dataloader_v1(txt, batch_size=4, max_length=256,
+                         stride=128, shuffle=True, drop_last=True):
+    tokenizer = tiktoken.get_encoding("gpt2")
+    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)
+    dataloader = DataLoader(
+        dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
+
+    return dataloader
+
+
+#####################################
+# Chapter 3
+#####################################
+
+class MultiHeadAttention(nn.Module):
+    def __init__(self, d_in, d_out, block_size, dropout, num_heads, qkv_bias=False):
+        super().__init__()
+        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
+
+        self.d_out = d_out
+        self.num_heads = num_heads
+        self.head_dim = d_out // num_heads  # Reduce the projection dim to match desired output dim
+
+        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.out_proj = nn.Linear(d_out, d_out)  # Linear layer to combine head outputs
+        self.dropout = nn.Dropout(dropout)
+        self.register_buffer('mask', torch.triu(torch.ones(block_size, block_size), diagonal=1))
+
+    def forward(self, x):
+        b, num_tokens, d_in = x.shape
+
+        keys = self.W_key(x)  # Shape: (b, num_tokens, d_out)
+        queries = self.W_query(x)
+        values = self.W_value(x)
+
+        # We implicitly split the matrix by adding a `num_heads` dimension
+        # Unroll last dim: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim)
+        keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)
+        values = values.view(b, num_tokens, self.num_heads, self.head_dim)
+        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)
+
+        # Transpose: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim)
+        keys = keys.transpose(1, 2)
+        queries = queries.transpose(1, 2)
+        values = values.transpose(1, 2)
+
+        # Compute scaled dot-product attention (aka self-attention) with a causal mask
+        attn_scores = queries @ keys.transpose(2, 3)  # Dot product for each head
+
+        # Original mask truncated to the number of tokens and converted to boolean
+        mask_bool = self.mask.bool()[:num_tokens, :num_tokens]
+
+        # Use the mask to fill attention scores
+        attn_scores.masked_fill_(mask_bool, -torch.inf)
+
+        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
+        attn_weights = self.dropout(attn_weights)
+
+        # Shape: (b, num_tokens, num_heads, head_dim)
+        context_vec = (attn_weights @ values).transpose(1, 2)
+
+        # Combine heads, where self.d_out = self.num_heads * self.head_dim
+        context_vec = context_vec.reshape(b, num_tokens, self.d_out)
+        context_vec = self.out_proj(context_vec)  # optional projection
+
+        return context_vec
+
+
+#####################################
+# Chapter 4
+#####################################
+
+class LayerNorm(nn.Module):
+    def __init__(self, emb_dim):
+        super().__init__()
+        self.eps = 1e-5
+        self.scale = nn.Parameter(torch.ones(emb_dim))
+        self.shift = nn.Parameter(torch.zeros(emb_dim))
+
+    def forward(self, x):
+        mean = x.mean(dim=-1, keepdim=True)
+        var = x.var(dim=-1, keepdim=True, unbiased=False)
+        norm_x = (x - mean) / torch.sqrt(var + self.eps)
+        return self.scale * norm_x + self.shift
+
+
+class GELU(nn.Module):
+    def __init__(self):
+        super().__init__()
+
+    def forward(self, x):
+        return 0.5 * x * (1 + torch.tanh(
+            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
+            (x + 0.044715 * torch.pow(x, 3))
+        ))
+
+
+class FeedForward(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.layers = nn.Sequential(
+            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
+            GELU(),
+            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
+            nn.Dropout(cfg["drop_rate"])
+        )
+
+    def forward(self, x):
+        return self.layers(x)
+
+
+class TransformerBlock(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.att = MultiHeadAttention(
+            d_in=cfg["emb_dim"],
+            d_out=cfg["emb_dim"],
+            block_size=cfg["ctx_len"],
+            num_heads=cfg["n_heads"],
+            dropout=cfg["drop_rate"],
+            qkv_bias=cfg["qkv_bias"])
+        self.ff = FeedForward(cfg)
+        self.norm1 = LayerNorm(cfg["emb_dim"])
+        self.norm2 = LayerNorm(cfg["emb_dim"])
+        self.drop_resid = nn.Dropout(cfg["drop_rate"])
+
+    def forward(self, x):
+        # Shortcut connection for attention block
+        shortcut = x
+        x = self.norm1(x)
+        x = self.att(x)   # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        # Shortcut connection for feed-forward block
+        shortcut = x
+        x = self.norm2(x)
+        x = self.ff(x)
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        return x
+
+
+class GPTModel(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
+        self.pos_emb = nn.Embedding(cfg["ctx_len"], cfg["emb_dim"])
+        self.drop_emb = nn.Dropout(cfg["drop_rate"])
+
+        self.trf_blocks = nn.Sequential(
+            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
+
+        self.final_norm = LayerNorm(cfg["emb_dim"])
+        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)
+
+    def forward(self, in_idx):
+        batch_size, seq_len = in_idx.shape
+        tok_embeds = self.tok_emb(in_idx)
+        pos_embeds = self.pos_emb(torch.arange(seq_len, device=in_idx.device))
+        x = tok_embeds + pos_embeds  # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_emb(x)
+        x = self.trf_blocks(x)
+        x = self.final_norm(x)
+        logits = self.out_head(x)
+        return logits
+
+
+def generate_text_simple(model, idx, max_new_tokens, context_size):
+    # idx is (B, T) array of indices in the current context
+    for _ in range(max_new_tokens):
+
+        # Crop current context if it exceeds the supported context size
+        # E.g., if the LLM supports only 5 tokens and the current sequence has 10 tokens,
+        # then only the last 5 tokens are used as context
+        idx_cond = idx[:, -context_size:]
+
+        # Get the predictions
+        with torch.no_grad():
+            logits = model(idx_cond)
+
+        # Focus only on the last time step
+        # (batch, n_token, vocab_size) becomes (batch, vocab_size)
+        logits = logits[:, -1, :]
+
+        # Get the idx of the vocab entry with the highest logits value
+        idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (batch, 1)
+
+        # Append the predicted (greedy argmax) index to the running sequence
+        idx = torch.cat((idx, idx_next), dim=1)  # (batch, n_tokens+1)
+
+    return idx
+
+
+#####################################
+# Chapter 5
+#####################################
+
+
+def calc_loss_batch(input_batch, target_batch, model, device):
+    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
+    logits = model(input_batch)
+    loss = torch.nn.functional.cross_entropy(logits.flatten(0, 1), target_batch.flatten())
+    return loss
+
+
+def calc_loss_loader(data_loader, model, device, num_batches=None):
+    total_loss = 0.
+    if num_batches is None:
+        num_batches = len(data_loader)
+    else:
+        num_batches = min(num_batches, len(data_loader))
+    for i, (input_batch, target_batch) in enumerate(data_loader):
+        if i < num_batches:
+            loss = calc_loss_batch(input_batch, target_batch, model, device)
+            total_loss += loss.item()
+        else:
+            break
+    return total_loss / num_batches
+
+
+def evaluate_model(model, train_loader, val_loader, device, eval_iter):
+    model.eval()
+    with torch.no_grad():
+        train_loss = calc_loss_loader(train_loader, model, device, num_batches=eval_iter)
+        val_loss = calc_loss_loader(val_loader, model, device, num_batches=eval_iter)
+    model.train()
+    return train_loss, val_loss
+
+
+def generate_and_print_sample(model, tokenizer, device, start_context):
+    model.eval()
+    context_size = model.pos_emb.weight.shape[0]
+    encoded = text_to_token_ids(start_context, tokenizer).to(device)
+    with torch.no_grad():
+        token_ids = generate_text_simple(
+            model=model, idx=encoded,
+            max_new_tokens=50, context_size=context_size)
+        decoded_text = token_ids_to_text(token_ids, tokenizer)
+        print(decoded_text.replace("\n", " "))  # Compact print format
+    model.train()
+
+
+def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses, output_dir):
+    fig, ax1 = plt.subplots()
+
+    # Plot training and validation loss against epochs
+    ax1.plot(epochs_seen, train_losses, label="Training loss")
+    ax1.plot(epochs_seen, val_losses, linestyle="-.", label="Validation loss")
+    ax1.set_xlabel("Epochs")
+    ax1.set_ylabel("Loss")
+    ax1.legend(loc="upper right")
+
+    # Create a second x-axis for tokens seen
+    ax2 = ax1.twiny()  # Create a second x-axis that shares the same y-axis
+    ax2.plot(tokens_seen, train_losses, alpha=0)  # Invisible plot for aligning ticks
+    ax2.set_xlabel("Tokens seen")
+
+    fig.tight_layout()  # Adjust layout to make room
+    plt.savefig(output_dir / "losses.pdf")
+
+
+def text_to_token_ids(text, tokenizer):
+    encoded = tokenizer.encode(text, allowed_special={'<|endoftext|>'})
+    encoded_tensor = torch.tensor(encoded).unsqueeze(0)  # Add batch dimension
+    return encoded_tensor
+
+
+def token_ids_to_text(token_ids, tokenizer):
+    flat = token_ids.squeeze(0)  # Remove batch dimension
+    return tokenizer.decode(flat.tolist())
diff --git a/ch05/04_learning_rate_schedulers/README.md b/ch05/04_learning_rate_schedulers/README.md
new file mode 100644
index 0000000..af310da
--- /dev/null
+++ b/ch05/04_learning_rate_schedulers/README.md
@@ -0,0 +1,5 @@
+# Adding Bells and Whistles to the Training Loop
+
+The main chapter used a relatively simple training function to keep the code readable and fit Chapter 5 within the page limits. Optionally, we can add a linear warm-up, a cosine decay schedule, and gradient clipping to improve the training stability and convergence.
+
+You can find the code for this more sophisticated training function in [Appendix D: Adding Bells and Whistles to the Training Loop](../../appendix-D/01_main-chapter-code/appendix-D.ipynb).
\ No newline at end of file
diff --git a/ch05/05_bonus_hparam_tuning/README.md b/ch05/05_bonus_hparam_tuning/README.md
new file mode 100644
index 0000000..b8ceb83
--- /dev/null
+++ b/ch05/05_bonus_hparam_tuning/README.md
@@ -0,0 +1,6 @@
+# Optimizing Hyperparameters for Pretraining
+
+The [hparam_search.py](hparam_search.py) script, based on the extended training function in [Appendix D: Adding Bells and Whistles to the Training Loop](../../appendix-D/01_main-chapter-code/appendix-D.ipynb), is designed to find optimal hyperparameters via grid search.
+
+> [!NOTE]
+> This script will take a long time to run. You may want to reduce the number of hyperparameter configurations explored in the `HPARAM_GRID` dictionary at the top.
\ No newline at end of file
diff --git a/ch05/05_bonus_hparam_tuning/hparam_search.py b/ch05/05_bonus_hparam_tuning/hparam_search.py
new file mode 100644
index 0000000..abe31e0
--- /dev/null
+++ b/ch05/05_bonus_hparam_tuning/hparam_search.py
@@ -0,0 +1,208 @@
+# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
+# Source for "Build a Large Language Model From Scratch"
+#   - https://www.manning.com/books/build-a-large-language-model-from-scratch
+# Code: https://github.com/rasbt/LLMs-from-scratch
+
+import itertools
+import math
+import os
+import torch
+from previous_chapters import GPTModel, create_dataloader_v1
+
+
+# Define a grid of hyperparameters to search over
+HPARAM_GRID = {
+    "batch_size": [2, 4, 8, 16],
+    "drop_rate": [0.0, 0.1, 0.2],
+    "warmup_iters": [10, 20, 30],
+    "weight_decay": [0.1, 0.01, 0.0],
+    "peak_lr": [0.0001, 0.0005, 0.001, 0.005],
+    "initial_lr": [0.00005, 0.0001],
+    "min_lr": [0.00005, 0.00001, 0.0001],
+    "n_epochs": [5, 10, 15, 20, 25],
+}
+
+
+def calc_loss_loader(data_loader, model, device, num_batches=None):
+    total_loss = 0.
+    if num_batches is None:
+        num_batches = len(data_loader)
+    else:
+        num_batches = min(num_batches, len(data_loader))
+    for i, (input_batch, target_batch) in enumerate(data_loader):
+        if i < num_batches:
+            loss = calc_loss_batch(input_batch, target_batch, model, device)
+            total_loss += loss.item()
+        else:
+            break
+    return total_loss / num_batches
+
+
+def calc_loss_batch(input_batch, target_batch, model, device):
+    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
+
+    logits = model(input_batch)
+    logits = logits.view(-1, logits.size(-1))
+    loss = torch.nn.functional.cross_entropy(logits, target_batch.view(-1))
+    return loss
+
+
+def evaluate_model(model, train_loader, val_loader, device, eval_iter):
+    model.eval()
+    with torch.no_grad():
+        train_loss = calc_loss_loader(train_loader, model, device, num_batches=eval_iter)
+        val_loss = calc_loss_loader(val_loader, model, device, num_batches=eval_iter)
+    model.train()
+    return train_loss, val_loss
+
+
+def train_model(model, train_loader, val_loader, optimizer, device,
+                n_epochs, eval_freq, eval_iter,
+                encoded_start_context, warmup_iters=10,
+                initial_lr=3e-05, min_lr=1e-6):
+    global_step = 0
+
+    max_lr = optimizer.param_groups[0]["lr"]
+
+    # Calculate total number of iterations
+    total_training_iters = len(train_loader) * n_epochs
+
+    # Calculate the learning rate increment at each step during warmup
+    lr_increment = (max_lr - initial_lr) / warmup_iters
+
+    for epoch in range(n_epochs):
+        model.train()
+        for input_batch, target_batch in train_loader:
+            optimizer.zero_grad()
+
+            # Increment the global step at the beginning of the iteration
+            global_step += 1
+
+            # Warmup: adjust learning rate linearly
+            if global_step < warmup_iters:
+                lr = initial_lr + global_step * lr_increment
+            # Cosine annealing phase
+            else:
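+                # max(1, ...) avoids a division by zero when warmup_iters equals the total number of iterations
+                progress = (global_step - warmup_iters) / max(1, total_training_iters - warmup_iters)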
+                lr = min_lr + (max_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))
+
+            # Apply the calculated learning rate
+            for param_group in optimizer.param_groups:
+                param_group["lr"] = lr
+
+            loss = calc_loss_batch(input_batch, target_batch, model, device)
+            loss.backward()
+
+            # Apply gradient clipping once the warmup phase is over
+            if global_step >= warmup_iters:
+                torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
+
+            optimizer.step()
+
+    train_loss, val_loss = evaluate_model(model, train_loader, val_loader, device, eval_iter)
+
+    return train_loss, val_loss
+
+
+if __name__ == "__main__":
+
+    # Generate all combinations of hyperparameters
+    hyperparameter_combinations = list(itertools.product(*HPARAM_GRID.values()))
+    total_combinations = len(hyperparameter_combinations)
+    print(f"Total hyperparameter configurations: {total_combinations}")
+
+    # Placeholder for the best loss and best hyperparameters
+    best_val_loss = best_train_loss = float('inf')
+    best_hparams = {}
+
+    script_path = os.path.abspath(__file__)
+    script_dir = os.path.dirname(script_path)
+    with open(os.path.join(script_dir, "the-verdict.txt"), "r", encoding="utf-8") as file:
+        text_data = file.read()
+
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+    train_ratio = 0.95
+    split_idx = int(train_ratio * len(text_data))
+
+    torch.manual_seed(123)
+
+    interrupted = False
+    current_config = 0
+    for combination in hyperparameter_combinations:
+
+        try:
+            current_config += 1
+            print(f"Evaluating configuration {current_config} of {total_combinations}")
+
+            # Unpack the current combination of hyperparameters
+            HPARAM_CONFIG = dict(zip(HPARAM_GRID.keys(), combination))
+
+            GPT_CONFIG_124M = {
+                "vocab_size": 50257,  # Vocabulary size
+                "ctx_len": 256,       # Context length -- shortened from original 1024 tokens
+                "emb_dim": 768,       # Embedding dimension
+                "n_heads": 12,        # Number of attention heads
+                "n_layers": 12,       # Number of layers
+                "drop_rate": HPARAM_CONFIG["drop_rate"],
+                "qkv_bias": False,    # Query-Key-Value bias
+            }
+
+            torch.manual_seed(123)
+            train_loader = create_dataloader_v1(
+                text_data[:split_idx],
+                batch_size=HPARAM_CONFIG["batch_size"],
+                max_length=GPT_CONFIG_124M["ctx_len"],
+                stride=GPT_CONFIG_124M["ctx_len"],
+                drop_last=True,
+                shuffle=True
+            )
+
+            val_loader = create_dataloader_v1(
+                text_data[split_idx:],
+                batch_size=HPARAM_CONFIG["batch_size"],
+                max_length=GPT_CONFIG_124M["ctx_len"],
+                stride=GPT_CONFIG_124M["ctx_len"],
+                drop_last=False,
+                shuffle=False
+            )
+
+            model = GPTModel(GPT_CONFIG_124M)
+            model.to(device)
+
+            optimizer = torch.optim.AdamW(
+                model.parameters(),
+                lr=HPARAM_CONFIG["peak_lr"],
+                weight_decay=HPARAM_CONFIG["weight_decay"]
+            )
+
+            encoded_start_context = train_loader.dataset.tokenizer.encode("Nevertheless")
+            encoded_tensor = torch.tensor(encoded_start_context).unsqueeze(0)
+
+            train_loss, val_loss = train_model(
+                model, train_loader, val_loader, optimizer, device,
+                n_epochs=HPARAM_CONFIG["n_epochs"],
+                eval_freq=5, eval_iter=1,
+                encoded_start_context=encoded_tensor,
+                warmup_iters=HPARAM_CONFIG["warmup_iters"],
+                initial_lr=HPARAM_CONFIG["initial_lr"],
+                min_lr=HPARAM_CONFIG["min_lr"]
+            )
+
+            # Log the best hyperparameters based on validation loss
+            if val_loss < best_val_loss:
+                best_val_loss = val_loss
+                best_train_loss = train_loss
+                best_hparams = HPARAM_CONFIG
+
+        except KeyboardInterrupt:
+            print("Hyperparameter search interrupted.")
+            print(f"Best hyperparameters: {best_hparams}")
+            print(f"Best Val loss: {best_val_loss} | Training loss {best_train_loss}")
+            interrupted = True
+            break
+
+    if not interrupted:
+        print("Hyperparameter search completed.")
+        print(f"Best hyperparameters: {best_hparams}")
+        print(f"Best Val loss: {best_val_loss} | Training loss {best_train_loss}")
diff --git a/ch05/05_bonus_hparam_tuning/previous_chapters.py b/ch05/05_bonus_hparam_tuning/previous_chapters.py
new file mode 100644
index 0000000..1091fa3
--- /dev/null
+++ b/ch05/05_bonus_hparam_tuning/previous_chapters.py
@@ -0,0 +1,281 @@
+# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
+# Source for "Build a Large Language Model From Scratch"
+#   - https://www.manning.com/books/build-a-large-language-model-from-scratch
+# Code: https://github.com/rasbt/LLMs-from-scratch
+
+# This file collects all the relevant code that we covered thus far
+# throughout Chapters 2-4.
+# This file can be run as a standalone script.
+
+import tiktoken
+import torch
+import torch.nn as nn
+from torch.utils.data import Dataset, DataLoader
+
+#####################################
+# Chapter 2
+#####################################
+
+
+class GPTDatasetV1(Dataset):
+    def __init__(self, txt, tokenizer, max_length, stride):
+        self.tokenizer = tokenizer
+        self.input_ids = []
+        self.target_ids = []
+
+        # Tokenize the entire text
+        token_ids = tokenizer.encode(txt)
+
+        # Use a sliding window to chunk the book into overlapping sequences of max_length
+        for i in range(0, len(token_ids) - max_length, stride):
+            input_chunk = token_ids[i:i + max_length]
+            target_chunk = token_ids[i + 1: i + max_length + 1]
+            self.input_ids.append(torch.tensor(input_chunk))
+            self.target_ids.append(torch.tensor(target_chunk))
+
+    def __len__(self):
+        return len(self.input_ids)
+
+    def __getitem__(self, idx):
+        return self.input_ids[idx], self.target_ids[idx]
+
+
+def create_dataloader_v1(txt, batch_size=4, max_length=256,
+                         stride=128, shuffle=True, drop_last=True):
+    # Initialize the tokenizer
+    tokenizer = tiktoken.get_encoding("gpt2")
+
+    # Create dataset
+    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)
+
+    # Create dataloader
+    dataloader = DataLoader(
+        dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
+
+    return dataloader
+
+
+#####################################
+# Chapter 3
+#####################################
+class MultiHeadAttention(nn.Module):
+    def __init__(self, d_in, d_out, block_size, dropout, num_heads, qkv_bias=False):
+        super().__init__()
+        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
+
+        self.d_out = d_out
+        self.num_heads = num_heads
+        self.head_dim = d_out // num_heads  # Reduce the projection dim to match desired output dim
+
+        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.out_proj = nn.Linear(d_out, d_out)  # Linear layer to combine head outputs
+        self.dropout = nn.Dropout(dropout)
+        self.register_buffer('mask', torch.triu(torch.ones(block_size, block_size), diagonal=1))
+
+    def forward(self, x):
+        b, num_tokens, d_in = x.shape
+
+        keys = self.W_key(x)  # Shape: (b, num_tokens, d_out)
+        queries = self.W_query(x)
+        values = self.W_value(x)
+
+        # We implicitly split the matrix by adding a `num_heads` dimension
+        # Unroll last dim: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim)
+        keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)
+        values = values.view(b, num_tokens, self.num_heads, self.head_dim)
+        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)
+
+        # Transpose: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim)
+        keys = keys.transpose(1, 2)
+        queries = queries.transpose(1, 2)
+        values = values.transpose(1, 2)
+
+        # Compute scaled dot-product attention (aka self-attention) with a causal mask
+        attn_scores = queries @ keys.transpose(2, 3)  # Dot product for each head
+
+        # Original mask truncated to the number of tokens and converted to boolean
+        mask_bool = self.mask.bool()[:num_tokens, :num_tokens]
+
+        # Use the mask to fill attention scores
+        attn_scores.masked_fill_(mask_bool, -torch.inf)
+
+        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
+        attn_weights = self.dropout(attn_weights)
+
+        # Shape: (b, num_tokens, num_heads, head_dim)
+        context_vec = (attn_weights @ values).transpose(1, 2)
+
+        # Combine heads, where self.d_out = self.num_heads * self.head_dim
+        context_vec = context_vec.contiguous().view(b, num_tokens, self.d_out)
+        context_vec = self.out_proj(context_vec)  # optional projection
+
+        return context_vec
+
+
+#####################################
+# Chapter 4
+#####################################
+class LayerNorm(nn.Module):
+    def __init__(self, emb_dim):
+        super().__init__()
+        self.eps = 1e-5
+        self.scale = nn.Parameter(torch.ones(emb_dim))
+        self.shift = nn.Parameter(torch.zeros(emb_dim))
+
+    def forward(self, x):
+        mean = x.mean(dim=-1, keepdim=True)
+        var = x.var(dim=-1, keepdim=True, unbiased=False)
+        norm_x = (x - mean) / torch.sqrt(var + self.eps)
+        return self.scale * norm_x + self.shift
+
+
+class GELU(nn.Module):
+    def __init__(self):
+        super().__init__()
+
+    def forward(self, x):
+        return 0.5 * x * (1 + torch.tanh(
+            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
+            (x + 0.044715 * torch.pow(x, 3))
+        ))
+
+
+class FeedForward(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.layers = nn.Sequential(
+            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
+            GELU(),
+            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
+            nn.Dropout(cfg["drop_rate"])
+        )
+
+    def forward(self, x):
+        return self.layers(x)
+
+
+class TransformerBlock(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.att = MultiHeadAttention(
+            d_in=cfg["emb_dim"],
+            d_out=cfg["emb_dim"],
+            block_size=cfg["ctx_len"],
+            num_heads=cfg["n_heads"],
+            dropout=cfg["drop_rate"],
+            qkv_bias=cfg["qkv_bias"])
+        self.ff = FeedForward(cfg)
+        self.norm1 = LayerNorm(cfg["emb_dim"])
+        self.norm2 = LayerNorm(cfg["emb_dim"])
+        self.drop_resid = nn.Dropout(cfg["drop_rate"])
+
+    def forward(self, x):
+        # Shortcut connection for attention block
+        shortcut = x
+        x = self.norm1(x)
+        x = self.att(x)   # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        # Shortcut connection for feed-forward block
+        shortcut = x
+        x = self.norm2(x)
+        x = self.ff(x)
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        return x
+
+
+class GPTModel(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
+        self.pos_emb = nn.Embedding(cfg["ctx_len"], cfg["emb_dim"])
+        self.drop_emb = nn.Dropout(cfg["drop_rate"])
+
+        self.trf_blocks = nn.Sequential(
+            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
+
+        self.final_norm = LayerNorm(cfg["emb_dim"])
+        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)
+
+    def forward(self, in_idx):
+        batch_size, seq_len = in_idx.shape
+        tok_embeds = self.tok_emb(in_idx)
+        pos_embeds = self.pos_emb(torch.arange(seq_len, device=in_idx.device))
+        x = tok_embeds + pos_embeds  # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_emb(x)
+        x = self.trf_blocks(x)
+        x = self.final_norm(x)
+        logits = self.out_head(x)
+        return logits
+
+
+def generate_text_simple(model, idx, max_new_tokens, context_size):
+    # idx is (B, T) array of indices in the current context
+    for _ in range(max_new_tokens):
+
+        # Crop current context if it exceeds the supported context size
+        # E.g., if the LLM supports only 5 tokens and the current sequence has 10,
+        # then only the last 5 tokens are used as context
+        idx_cond = idx[:, -context_size:]
+
+        # Get the predictions
+        with torch.no_grad():
+            logits = model(idx_cond)
+
+        # Focus only on the last time step
+        # (batch, n_token, vocab_size) becomes (batch, vocab_size)
+        logits = logits[:, -1, :]
+
+        # Get the idx of the vocab entry with the highest logits value
+        idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (batch, 1)
+
+        # Append the predicted (greedy argmax) index to the running sequence
+        idx = torch.cat((idx, idx_next), dim=1)  # (batch, n_tokens+1)
+
+    return idx
+
+
+if __name__ == "__main__":
+
+    GPT_CONFIG_124M = {
+        "vocab_size": 50257,  # Vocabulary size
+        "ctx_len": 1024,      # Context length
+        "emb_dim": 768,       # Embedding dimension
+        "n_heads": 12,        # Number of attention heads
+        "n_layers": 12,       # Number of layers
+        "drop_rate": 0.1,     # Dropout rate
+        "qkv_bias": False     # Query-Key-Value bias
+    }
+
+    torch.manual_seed(123)
+    model = GPTModel(GPT_CONFIG_124M)
+    model.eval()  # disable dropout
+
+    start_context = "Hello, I am"
+
+    tokenizer = tiktoken.get_encoding("gpt2")
+    encoded = tokenizer.encode(start_context)
+    encoded_tensor = torch.tensor(encoded).unsqueeze(0)
+
+    print(f"\n{50*'='}\n{22*' '}IN\n{50*'='}")
+    print("\nInput text:", start_context)
+    print("Encoded input text:", encoded)
+    print("encoded_tensor.shape:", encoded_tensor.shape)
+
+    out = generate_text_simple(
+        model=model,
+        idx=encoded_tensor,
+        max_new_tokens=10,
+        context_size=GPT_CONFIG_124M["ctx_len"]
+    )
+    decoded_text = tokenizer.decode(out.squeeze(0).tolist())
+
+    print(f"\n\n{50*'='}\n{22*' '}OUT\n{50*'='}")
+    print("\nOutput:", out)
+    print("Output length:", len(out[0]))
+    print("Output text:", decoded_text)
diff --git a/ch05/05_bonus_hparam_tuning/the-verdict.txt b/ch05/05_bonus_hparam_tuning/the-verdict.txt
new file mode 100644
index 0000000..6b651c7
--- /dev/null
+++ b/ch05/05_bonus_hparam_tuning/the-verdict.txt
@@ -0,0 +1,165 @@
+I HAD always thought Jack Gisburn rather a cheap genius--though a good fellow enough--so it was no great surprise to me to hear that, in the height of his glory, he had dropped his painting, married a rich widow, and established himself in a villa on the Riviera. (Though I rather thought it would have been Rome or Florence.)
+
+"The height of his glory"--that was what the women called it. I can hear Mrs. Gideon Thwing--his last Chicago sitter--deploring his unaccountable abdication. "Of course it's going to send the value of my picture 'way up; but I don't think of that, Mr. Rickham--the loss to Arrt is all I think of." The word, on Mrs. Thwing's lips, multiplied its _rs_ as though they were reflected in an endless vista of mirrors. And it was not only the Mrs. Thwings who mourned. Had not the exquisite Hermia Croft, at the last Grafton Gallery show, stopped me before Gisburn's "Moon-dancers" to say, with tears in her eyes: "We shall not look upon its like again"?
+
+Well!--even through the prism of Hermia's tears I felt able to face the fact with equanimity. Poor Jack Gisburn! The women had made him--it was fitting that they should mourn him. Among his own sex fewer regrets were heard, and in his own trade hardly a murmur. Professional jealousy? Perhaps. If it were, the honour of the craft was vindicated by little Claude Nutley, who, in all good faith, brought out in the Burlington a very handsome "obituary" on Jack--one of those showy articles stocked with random technicalities that I have heard (I won't say by whom) compared to Gisburn's painting. And so--his resolve being apparently irrevocable--the discussion gradually died out, and, as Mrs. Thwing had predicted, the price of "Gisburns" went up.
+
+It was not till three years later that, in the course of a few weeks' idling on the Riviera, it suddenly occurred to me to wonder why Gisburn had given up his painting. On reflection, it really was a tempting problem. To accuse his wife would have been too easy--his fair sitters had been denied the solace of saying that Mrs. Gisburn had "dragged him down." For Mrs. Gisburn--as such--had not existed till nearly a year after Jack's resolve had been taken. It might be that he had married her--since he liked his ease--because he didn't want to go on painting; but it would have been hard to prove that he had given up his painting because he had married her.
+
+Of course, if she had not dragged him down, she had equally, as Miss Croft contended, failed to "lift him up"--she had not led him back to the easel. To put the brush into his hand again--what a vocation for a wife! But Mrs. Gisburn appeared to have disdained it--and I felt it might be interesting to find out why.
+
+The desultory life of the Riviera lends itself to such purely academic speculations; and having, on my way to Monte Carlo, caught a glimpse of Jack's balustraded terraces between the pines, I had myself borne thither the next day.
+
+I found the couple at tea beneath their palm-trees; and Mrs. Gisburn's welcome was so genial that, in the ensuing weeks, I claimed it frequently. It was not that my hostess was "interesting": on that point I could have given Miss Croft the fullest reassurance. It was just because she was _not_ interesting--if I may be pardoned the bull--that I found her so. For Jack, all his life, had been surrounded by interesting women: they had fostered his art, it had been reared in the hot-house of their adulation. And it was therefore instructive to note what effect the "deadening atmosphere of mediocrity" (I quote Miss Croft) was having on him.
+
+I have mentioned that Mrs. Gisburn was rich; and it was immediately perceptible that her husband was extracting from this circumstance a delicate but substantial satisfaction. It is, as a rule, the people who scorn money who get most out of it; and Jack's elegant disdain of his wife's big balance enabled him, with an appearance of perfect good-breeding, to transmute it into objects of art and luxury. To the latter, I must add, he remained relatively indifferent; but he was buying Renaissance bronzes and eighteenth-century pictures with a discrimination that bespoke the amplest resources.
+
+"Money's only excuse is to put beauty into circulation," was one of the axioms he laid down across the Sevres and silver of an exquisitely appointed luncheon-table, when, on a later day, I had again run over from Monte Carlo; and Mrs. Gisburn, beaming on him, added for my enlightenment: "Jack is so morbidly sensitive to every form of beauty."
+
+Poor Jack! It had always been his fate to have women say such things of him: the fact should be set down in extenuation. What struck me now was that, for the first time, he resented the tone. I had seen him, so often, basking under similar tributes--was it the conjugal note that robbed them of their savour? No--for, oddly enough, it became apparent that he was fond of Mrs. Gisburn--fond enough not to see her absurdity. It was his own absurdity he seemed to be wincing under--his own attitude as an object for garlands and incense.
+
+"My dear, since I've chucked painting people don't say that stuff about me--they say it about Victor Grindle," was his only protest, as he rose from the table and strolled out onto the sunlit terrace.
+
+I glanced after him, struck by his last word. Victor Grindle was, in fact, becoming the man of the moment--as Jack himself, one might put it, had been the man of the hour. The younger artist was said to have formed himself at my friend's feet, and I wondered if a tinge of jealousy underlay the latter's mysterious abdication. But no--for it was not till after that event that the _rose Dubarry_ drawing-rooms had begun to display their "Grindles."
+
+I turned to Mrs. Gisburn, who had lingered to give a lump of sugar to her spaniel in the dining-room.
+
+"Why _has_ he chucked painting?" I asked abruptly.
+
+She raised her eyebrows with a hint of good-humoured surprise.
+
+"Oh, he doesn't _have_ to now, you know; and I want him to enjoy himself," she said quite simply.
+
+I looked about the spacious white-panelled room, with its _famille-verte_ vases repeating the tones of the pale damask curtains, and its eighteenth-century pastels in delicate faded frames.
+
+"Has he chucked his pictures too? I haven't seen a single one in the house."
+
+A slight shade of constraint crossed Mrs. Gisburn's open countenance. "It's his ridiculous modesty, you know. He says they're not fit to have about; he's sent them all away except one--my portrait--and that I have to keep upstairs."
+
+His ridiculous modesty--Jack's modesty about his pictures? My curiosity was growing like the bean-stalk. I said persuasively to my hostess: "I must really see your portrait, you know."
+
+She glanced out almost timorously at the terrace where her husband, lounging in a hooded chair, had lit a cigar and drawn the Russian deerhound's head between his knees.
+
+"Well, come while he's not looking," she said, with a laugh that tried to hide her nervousness; and I followed her between the marble Emperors of the hall, and up the wide stairs with terra-cotta nymphs poised among flowers at each landing.
+
+In the dimmest corner of her boudoir, amid a profusion of delicate and distinguished objects, hung one of the familiar oval canvases, in the inevitable garlanded frame. The mere outline of the frame called up all Gisburn's past!
+
+Mrs. Gisburn drew back the window-curtains, moved aside a _jardiniere_ full of pink azaleas, pushed an arm-chair away, and said: "If you stand here you can just manage to see it. I had it over the mantel-piece, but he wouldn't let it stay."
+
+Yes--I could just manage to see it--the first portrait of Jack's I had ever had to strain my eyes over! Usually they had the place of honour--say the central panel in a pale yellow or _rose Dubarry_ drawing-room, or a monumental easel placed so that it took the light through curtains of old Venetian point. The more modest place became the picture better; yet, as my eyes grew accustomed to the half-light, all the characteristic qualities came out--all the hesitations disguised as audacities, the tricks of prestidigitation by which, with such consummate skill, he managed to divert attention from the real business of the picture to some pretty irrelevance of detail. Mrs. Gisburn, presenting a neutral surface to work on--forming, as it were, so inevitably the background of her own picture--had lent herself in an unusual degree to the display of this false virtuosity. The picture was one of Jack's "strongest," as his admirers would have put it--it represented, on his part, a swelling of muscles, a congesting of veins, a balancing, straddling and straining, that reminded one of the circus-clown's ironic efforts to lift a feather. It met, in short, at every point the demand of lovely woman to be painted "strongly" because she was tired of being painted "sweetly"--and yet not to lose an atom of the sweetness.
+
+"It's the last he painted, you know," Mrs. Gisburn said with pardonable pride. "The last but one," she corrected herself--"but the other doesn't count, because he destroyed it."
+
+"Destroyed it?" I was about to follow up this clue when I heard a footstep and saw Jack himself on the threshold.
+
+As he stood there, his hands in the pockets of his velveteen coat, the thin brown waves of hair pushed back from his white forehead, his lean sunburnt cheeks furrowed by a smile that lifted the tips of a self-confident moustache, I felt to what a degree he had the same quality as his pictures--the quality of looking cleverer than he was.
+
+His wife glanced at him deprecatingly, but his eyes travelled past her to the portrait.
+
+"Mr. Rickham wanted to see it," she began, as if excusing herself. He shrugged his shoulders, still smiling.
+
+"Oh, Rickham found me out long ago," he said lightly; then, passing his arm through mine: "Come and see the rest of the house."
+
+He showed it to me with a kind of naive suburban pride: the bath-rooms, the speaking-tubes, the dress-closets, the trouser-presses--all the complex simplifications of the millionaire's domestic economy. And whenever my wonder paid the expected tribute he said, throwing out his chest a little: "Yes, I really don't see how people manage to live without that."
+
+Well--it was just the end one might have foreseen for him. Only he was, through it all and in spite of it all--as he had been through, and in spite of, his pictures--so handsome, so charming, so disarming, that one longed to cry out: "Be dissatisfied with your leisure!" as once one had longed to say: "Be dissatisfied with your work!"
+
+But, with the cry on my lips, my diagnosis suffered an unexpected check.
+
+"This is my own lair," he said, leading me into a dark plain room at the end of the florid vista. It was square and brown and leathery: no "effects"; no bric-a-brac, none of the air of posing for reproduction in a picture weekly--above all, no least sign of ever having been used as a studio.
+
+The fact brought home to me the absolute finality of Jack's break with his old life.
+
+"Don't you ever dabble with paint any more?" I asked, still looking about for a trace of such activity.
+
+"Never," he said briefly.
+
+"Or water-colour--or etching?"
+
+His confident eyes grew dim, and his cheeks paled a little under their handsome sunburn.
+
+"Never think of it, my dear fellow--any more than if I'd never touched a brush."
+
+And his tone told me in a flash that he never thought of anything else.
+
+I moved away, instinctively embarrassed by my unexpected discovery; and as I turned, my eye fell on a small picture above the mantel-piece--the only object breaking the plain oak panelling of the room.
+
+"Oh, by Jove!" I said.
+
+It was a sketch of a donkey--an old tired donkey, standing in the rain under a wall.
+
+"By Jove--a Stroud!" I cried.
+
+He was silent; but I felt him close behind me, breathing a little quickly.
+
+"What a wonder! Made with a dozen lines--but on everlasting foundations. You lucky chap, where did you get it?"
+
+He answered slowly: "Mrs. Stroud gave it to me."
+
+"Ah--I didn't know you even knew the Strouds. He was such an inflexible hermit."
+
+"I didn't--till after. . . . She sent for me to paint him when he was dead."
+
+"When he was dead? You?"
+
+I must have let a little too much amazement escape through my surprise, for he answered with a deprecating laugh: "Yes--she's an awful simpleton, you know, Mrs. Stroud. Her only idea was to have him done by a fashionable painter--ah, poor Stroud! She thought it the surest way of proclaiming his greatness--of forcing it on a purblind public. And at the moment I was _the_ fashionable painter."
+
+"Ah, poor Stroud--as you say. Was _that_ his history?"
+
+"That was his history. She believed in him, gloried in him--or thought she did. But she couldn't bear not to have all the drawing-rooms with her. She couldn't bear the fact that, on varnishing days, one could always get near enough to see his pictures. Poor woman! She's just a fragment groping for other fragments. Stroud is the only whole I ever knew."
+
+"You ever knew? But you just said--"
+
+Gisburn had a curious smile in his eyes.
+
+"Oh, I knew him, and he knew me--only it happened after he was dead."
+
+I dropped my voice instinctively. "When she sent for you?"
+
+"Yes--quite insensible to the irony. She wanted him vindicated--and by me!"
+
+He laughed again, and threw back his head to look up at the sketch of the donkey. "There were days when I couldn't look at that thing--couldn't face it. But I forced myself to put it here; and now it's cured me--cured me. That's the reason why I don't dabble any more, my dear Rickham; or rather Stroud himself is the reason."
+
+For the first time my idle curiosity about my companion turned into a serious desire to understand him better.
+
+"I wish you'd tell me how it happened," I said.
+
+He stood looking up at the sketch, and twirling between his fingers a cigarette he had forgotten to light. Suddenly he turned toward me.
+
+"I'd rather like to tell you--because I've always suspected you of loathing my work."
+
+I made a deprecating gesture, which he negatived with a good-humoured shrug.
+
+"Oh, I didn't care a straw when I believed in myself--and now it's an added tie between us!"
+
+He laughed slightly, without bitterness, and pushed one of the deep arm-chairs forward. "There: make yourself comfortable--and here are the cigars you like."
+
+He placed them at my elbow and continued to wander up and down the room, stopping now and then beneath the picture.
+
+"How it happened? I can tell you in five minutes--and it didn't take much longer to happen. . . . I can remember now how surprised and pleased I was when I got Mrs. Stroud's note. Of course, deep down, I had always _felt_ there was no one like him--only I had gone with the stream, echoed the usual platitudes about him, till I half got to think he was a failure, one of the kind that are left behind. By Jove, and he _was_ left behind--because he had come to stay! The rest of us had to let ourselves be swept along or go under, but he was high above the current--on everlasting foundations, as you say.
+
+"Well, I went off to the house in my most egregious mood--rather moved, Lord forgive me, at the pathos of poor Stroud's career of failure being crowned by the glory of my painting him! Of course I meant to do the picture for nothing--I told Mrs. Stroud so when she began to stammer something about her poverty. I remember getting off a prodigious phrase about the honour being _mine_--oh, I was princely, my dear Rickham! I was posing to myself like one of my own sitters.
+
+"Then I was taken up and left alone with him. I had sent all my traps in advance, and I had only to set up the easel and get to work. He had been dead only twenty-four hours, and he died suddenly, of heart disease, so that there had been no preliminary work of destruction--his face was clear and untouched. I had met him once or twice, years before, and thought him insignificant and dingy. Now I saw that he was superb.
+
+"I was glad at first, with a merely aesthetic satisfaction: glad to have my hand on such a 'subject.' Then his strange life-likeness began to affect me queerly--as I blocked the head in I felt as if he were watching me do it. The sensation was followed by the thought: if he _were_ watching me, what would he say to my way of working? My strokes began to go a little wild--I felt nervous and uncertain.
+
+"Once, when I looked up, I seemed to see a smile behind his close grayish beard--as if he had the secret, and were amusing himself by holding it back from me. That exasperated me still more. The secret? Why, I had a secret worth twenty of his! I dashed at the canvas furiously, and tried some of my bravura tricks. But they failed me, they crumbled. I saw that he wasn't watching the showy bits--I couldn't distract his attention; he just kept his eyes on the hard passages between. Those were the ones I had always shirked, or covered up with some lying paint. And how he saw through my lies!
+
+"I looked up again, and caught sight of that sketch of the donkey hanging on the wall near his bed. His wife told me afterward it was the last thing he had done--just a note taken with a shaking hand, when he was down in Devonshire recovering from a previous heart attack. Just a note! But it tells his whole history. There are years of patient scornful persistence in every line. A man who had swum with the current could never have learned that mighty up-stream stroke. . . .
+
+"I turned back to my work, and went on groping and muddling; then I looked at the donkey again. I saw that, when Stroud laid in the first stroke, he knew just what the end would be. He had possessed his subject, absorbed it, recreated it. When had I done that with any of my things? They hadn't been born of me--I had just adopted them. . . .
+
+"Hang it, Rickham, with that face watching me I couldn't do another stroke. The plain truth was, I didn't know where to put it--_I had never known_. Only, with my sitters and my public, a showy splash of colour covered up the fact--I just threw paint into their faces. . . . Well, paint was the one medium those dead eyes could see through--see straight to the tottering foundations underneath. Don't you know how, in talking a foreign language, even fluently, one says half the time not what one wants to but what one can? Well--that was the way I painted; and as he lay there and watched me, the thing they called my 'technique' collapsed like a house of cards. He didn't sneer, you understand, poor Stroud--he just lay there quietly watching, and on his lips, through the gray beard, I seemed to hear the question: 'Are you sure you know where you're coming out?'
+
+"If I could have painted that face, with that question on it, I should have done a great thing. The next greatest thing was to see that I couldn't--and that grace was given me. But, oh, at that minute, Rickham, was there anything on earth I wouldn't have given to have Stroud alive before me, and to hear him say: 'It's not too late--I'll show you how'?
+
+"It _was_ too late--it would have been, even if he'd been alive. I packed up my traps, and went down and told Mrs. Stroud. Of course I didn't tell her _that_--it would have been Greek to her. I simply said I couldn't paint him, that I was too moved. She rather liked the idea--she's so romantic! It was that that made her give me the donkey. But she was terribly upset at not getting the portrait--she did so want him 'done' by some one showy! At first I was afraid she wouldn't let me off--and at my wits' end I suggested Grindle. Yes, it was I who started Grindle: I told Mrs. Stroud he was the 'coming' man, and she told somebody else, and so it got to be true. . . . And he painted Stroud without wincing; and she hung the picture among her husband's things. . . ."
+
+He flung himself down in the arm-chair near mine, laid back his head, and clasping his arms beneath it, looked up at the picture above the chimney-piece.
+
+"I like to fancy that Stroud himself would have given it to me, if he'd been able to say what he thought that day."
+
+And, in answer to a question I put half-mechanically--"Begin again?" he flashed out. "When the one thing that brings me anywhere near him is that I knew enough to leave off?"
+
+He stood up and laid his hand on my shoulder with a laugh. "Only the irony of it is that I _am_ still painting--since Grindle's doing it for me! The Strouds stand alone, and happen once--but there's no exterminating our kind of art."
\ No newline at end of file
diff --git a/ch05/README.md b/ch05/README.md
new file mode 100644
index 0000000..6b16bde
--- /dev/null
+++ b/ch05/README.md
@@ -0,0 +1,7 @@
+# Chapter 5: Pretraining on Unlabeled Data
+
+- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code
+- [02_alternative_weight_loading](02_alternative_weight_loading) contains code to download the GPT model weights from alternative sources, in case OpenAI stops making the weights publicly available
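+- [03_bonus_pretraining_on_gutenberg](03_bonus_pretraining_on_gutenberg) contains code to pretrain the model on the entire Project Gutenberg corpus
+- [04_learning_rate_schedulers](04_learning_rate_schedulers) implements a more sophisticated training function, including learning rate scheduling and gradient clipping
+- [05_hparam_tuning](05_hparam_tuning) contains an optional hyperparameter tuning script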
\ No newline at end of file