Merge branch 'datawhalechina:main' into main

This commit is contained in:
tan90º
2024-03-03 17:25:39 +08:00
committed by GitHub
2 changed files with 271 additions and 242 deletions
File diff suppressed because it is too large
@@ -5,7 +5,7 @@
"id": "51c9672d-8d0c-470d-ac2d-1271f8ec3f14", "id": "51c9672d-8d0c-470d-ac2d-1271f8ec3f14",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Chapter 3 Exercise solutions" "# Chapter 3 习题解答"
] ]
}, },
{ {
@@ -13,12 +13,12 @@
"id": "33dfa199-9aee-41d4-a64b-7e3811b9a616", "id": "33dfa199-9aee-41d4-a64b-7e3811b9a616",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Exercise 3.1" "# 3.1"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 5, "execution_count": 1,
"id": "5fee2cf5-61c3-4167-81b5-44ea155bbaf2", "id": "5fee2cf5-61c3-4167-81b5-44ea155bbaf2",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
@@ -39,7 +39,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 58, "execution_count": 2,
"id": "62ea289c-41cd-4416-89dd-dde6383a6f70", "id": "62ea289c-41cd-4416-89dd-dde6383a6f70",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
@@ -72,7 +72,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 59, "execution_count": 3,
"id": "7b035143-f4e8-45fb-b398-dec1bd5153d4", "id": "7b035143-f4e8-45fb-b398-dec1bd5153d4",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
@@ -103,7 +103,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 60, "execution_count": 4,
"id": "7591d79c-c30e-406d-adfd-20c12eb448f6", "id": "7591d79c-c30e-406d-adfd-20c12eb448f6",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
@@ -115,7 +115,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 61, "execution_count": 5,
"id": "ddd0f54f-6bce-46cc-a428-17c2a56557d0", "id": "ddd0f54f-6bce-46cc-a428-17c2a56557d0",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
@@ -130,7 +130,7 @@
" [-0.5299, -0.1081]], grad_fn=<MmBackward0>)" " [-0.5299, -0.1081]], grad_fn=<MmBackward0>)"
] ]
}, },
"execution_count": 61, "execution_count": 5,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@@ -141,7 +141,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 62, "execution_count": 6,
"id": "340908f8-1144-4ddd-a9e1-a1c5c3d592f5", "id": "340908f8-1144-4ddd-a9e1-a1c5c3d592f5",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
@@ -156,7 +156,7 @@
" [-0.5299, -0.1081]], grad_fn=<MmBackward0>)" " [-0.5299, -0.1081]], grad_fn=<MmBackward0>)"
] ]
}, },
"execution_count": 62, "execution_count": 6,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@@ -170,15 +170,15 @@
"id": "33543edb-46b5-4b01-8704-f7f101230544", "id": "33543edb-46b5-4b01-8704-f7f101230544",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Exercise 3.2" "# 3.2"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "0588e209-1644-496a-8dae-7630b4ef9083", "id": "1fc1a301",
"metadata": {}, "metadata": {},
"source": [ "source": [
"If we want to have an output dimension of 2, as earlier in single-head attention, we can have to change the projection dimension `d_out` to 1:" "如果我们想要多头注意力机制的输出和之前单头注意力机制一样为 2,我们可以将输出维度 `d_out` 设置为 1"
] ]
}, },
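The point above can be sketched with a minimal wrapper that runs independent causal-attention heads and concatenates their outputs, so the final dimension is `num_heads * d_out`. This is an illustrative sketch, not the notebook's exact implementation: the class names `CausalAttention` and `MultiHeadAttentionWrapper`, the bias-free projections, and the toy input sizes are assumptions here.

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    """Minimal single-head causal self-attention (illustrative sketch)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        queries, keys, values = self.W_query(x), self.W_key(x), self.W_value(x)
        # Scaled dot-product scores, masked so each token only attends backward
        scores = queries @ keys.transpose(-2, -1) / keys.shape[-1] ** 0.5
        seq_len = x.shape[-2]
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ values

class MultiHeadAttentionWrapper(nn.Module):
    """Runs num_heads independent heads and concatenates along the last
    dimension, so the output dimension is num_heads * d_out."""
    def __init__(self, d_in, d_out, num_heads):
        super().__init__()
        self.heads = nn.ModuleList(
            CausalAttention(d_in, d_out) for _ in range(num_heads)
        )

    def forward(self, x):
        return torch.cat([head(x) for head in self.heads], dim=-1)

torch.manual_seed(123)
x = torch.randn(2, 6, 3)  # (batch, seq_len, d_in=3); toy sizes
mha = MultiHeadAttentionWrapper(d_in=3, d_out=1, num_heads=2)
print(mha(x).shape)  # torch.Size([2, 6, 2])
```

With `d_out=1` per head and two heads, the concatenated output has dimension 2, matching the single-head case.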
{ {
@@ -227,7 +227,7 @@
"id": "92bdabcb-06cf-4576-b810-d883bbd313ba", "id": "92bdabcb-06cf-4576-b810-d883bbd313ba",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Exercise 3.3" "# 3.3"
] ]
}, },
{ {
@@ -249,7 +249,7 @@
"id": "375d5290-8e8b-4149-958e-1efb58a69191", "id": "375d5290-8e8b-4149-958e-1efb58a69191",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Optionally, the number of parameters is as follows:" "上述实现的参数量为:"
] ]
}, },
{ {
@@ -280,7 +280,9 @@
"id": "a56c1d47-9b95-4bd1-a517-580a6f779c52", "id": "a56c1d47-9b95-4bd1-a517-580a6f779c52",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The GPT-2 model has 117M parameters in total, but as we can see, most of its parameters are not in the multi-head attention module itself." "\n",
"\n",
"GPT-2 模型有 117M 的参数,但正如我们所见,绝大部分参数其实都不是来源于多头注意力机制(而是线性层)。"
] ]
} }
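The parameter-count claim can be checked with quick arithmetic. This is a hedged sketch assuming GPT-2 small's embedding dimension of 768, query/key/value projections without bias, and an output projection with bias; the exact breakdown depends on the notebook's `MultiHeadAttention` configuration.

```python
# Hypothetical parameter count for one multi-head attention module using
# GPT-2 small dimensions (assumed: emb_dim = 768, Q/K/V projections
# without bias, output projection Linear with bias).
d_in = d_out = 768

qkv_params = 3 * d_in * d_out             # W_query, W_key, W_value weights
out_proj_params = d_out * d_out + d_out   # output projection weight + bias
total = qkv_params + out_proj_params

print(f"{total:,}")  # 2,360,064
```

At roughly 2.36M parameters per attention module, even 12 such modules account for only a small fraction of GPT-2's 117M total.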
], ],
@@ -300,7 +302,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.11.5" "version": "3.9.18"
} }
}, },
"nbformat": 4, "nbformat": 4,