{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3.7 总结\n",
|
||
"\n",
|
||
"- 注意力(Attention)机制将输入元素转换为增强的上下文向量表示,这些表示融合了所有输入的信息。\n",
|
||
"\n",
|
||
"- 自注意力(Self Attention)机制通过对输入的加权求和来计算上下文向量表示。\n",
|
||
"\n",
|
||
"- 在简化的注意力机制中,注意力权重通过点积计算得出。\n",
|
||
"\n",
|
||
"- 点积是将两个向量的相应元素相乘然后求和的简洁方式。\n",
|
||
"\n",
|
||
"- 虽然不是绝对必要,但矩阵乘法通过替代嵌套的 for 循环,帮助我们更高效、紧凑地实施计算。\n",
|
||
"\n",
|
||
"- 用于大语言模型的自注意力机制,也称为缩放点积注意力,其中包含了可训练的权重矩阵来计算输入的中间转换向量:查询、值和键。\n",
|
||
"\n",
|
||
"- 在处理从左到右阅读和生成文本的大语言模型时,我们添加因果注意力遮蔽(CausalAttention Mask)以防止大语言模型访问后续的 Token 。\n",
|
||
"\n",
|
||
"- 除了使用因果注意力遮蔽将注意力权重归零外,我们还可以添加 Dropout 遮蔽来减少大语言模型中的过拟合问题。\n",
|
||
"\n",
|
||
"- 基于 Transformer 的大语言模型中的注意力模块涉及多个因果注意力(CausalAttention)实例,这称为多头注意力(MultiHeadAttention)。\n",
|
||
"\n",
|
||
"- 我们可以通过堆叠多个 CausalAttention 模块来创建一个 MultiHeadAttention 模块。\n",
|
||
"\n",
|
||
"- 创建 MultiHeadAttention 模块的更有效方式涉及到批量矩阵乘法。"
|
||
]
},
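{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below are a few code sketches that are not part of the original chapter; they illustrate the points above, assuming PyTorch is available in this kernel. First, the simplified self-attention: attention scores come from dot products, softmax turns them into attention weights, and each context vector is a weighted sum of the inputs. The nested-loop version and the matrix-multiplication version compute the same scores; the `inputs` tensor is an illustrative placeholder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"trusted": true
},
"outputs": [],
"source": [
"import torch\n",
"\n",
"# Toy input: 3 tokens, each a 4-dimensional embedding (illustrative values)\n",
"inputs = torch.tensor([[0.1, 0.2, 0.3, 0.4],\n",
"                       [0.5, 0.1, 0.0, 0.2],\n",
"                       [0.3, 0.3, 0.1, 0.6]])\n",
"\n",
"# Loop version: a dot product multiplies corresponding elements, then sums\n",
"scores_loop = torch.empty(3, 3)\n",
"for i, x_i in enumerate(inputs):\n",
"    for j, x_j in enumerate(inputs):\n",
"        scores_loop[i, j] = torch.sum(x_i * x_j)\n",
"\n",
"# Matrix multiplication replaces the nested for loops\n",
"scores = inputs @ inputs.T\n",
"assert torch.allclose(scores, scores_loop)\n",
"\n",
"# Normalize scores into attention weights, then take weighted sums\n",
"weights = torch.softmax(scores, dim=-1)\n",
"context = weights @ inputs  # each row is a context vector\n",
"print(context)"
]
},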
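{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of scaled dot-product attention with a causal mask and a dropout mask. The class name `CausalAttention` and the trainable query/key/value weight matrices follow the chapter's naming, but the implementation details here are illustrative rather than the chapter's definitive code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"trusted": true
},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"class CausalAttention(nn.Module):\n",
"    # Scaled dot-product attention with a causal mask and dropout (illustrative)\n",
"    def __init__(self, d_in, d_out, context_length, dropout):\n",
"        super().__init__()\n",
"        # Trainable weight matrices for queries, keys, and values\n",
"        self.W_query = nn.Linear(d_in, d_out, bias=False)\n",
"        self.W_key = nn.Linear(d_in, d_out, bias=False)\n",
"        self.W_value = nn.Linear(d_in, d_out, bias=False)\n",
"        self.dropout = nn.Dropout(dropout)\n",
"        # Upper-triangular mask marks the future positions\n",
"        self.register_buffer(\n",
"            \"mask\", torch.triu(torch.ones(context_length, context_length), diagonal=1)\n",
"        )\n",
"\n",
"    def forward(self, x):\n",
"        b, num_tokens, _ = x.shape\n",
"        queries = self.W_query(x)\n",
"        keys = self.W_key(x)\n",
"        values = self.W_value(x)\n",
"        scores = queries @ keys.transpose(1, 2)\n",
"        # Causal mask: block attention to later tokens before softmax\n",
"        scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], float(\"-inf\"))\n",
"        weights = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)\n",
"        weights = self.dropout(weights)  # dropout mask reduces overfitting\n",
"        return weights @ values\n",
"\n",
"x = torch.randn(2, 5, 4)  # (batch, tokens, d_in), random toy data\n",
"ca = CausalAttention(d_in=4, d_out=8, context_length=5, dropout=0.1)\n",
"print(ca(x).shape)  # torch.Size([2, 5, 8])"
]
},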
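{
"cell_type": "markdown",
"metadata": {},
"source": [
"Stacking several `CausalAttention` instances and concatenating their outputs yields a simple multi-head module. This sketch reuses the `CausalAttention` class and the tensor `x` from the previous cell; the wrapper name is illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"trusted": true
},
"outputs": [],
"source": [
"class MultiHeadAttentionWrapper(nn.Module):\n",
"    # Multi-head attention by stacking independent causal-attention heads\n",
"    def __init__(self, d_in, d_out, context_length, dropout, num_heads):\n",
"        super().__init__()\n",
"        self.heads = nn.ModuleList(\n",
"            [CausalAttention(d_in, d_out, context_length, dropout)\n",
"             for _ in range(num_heads)]\n",
"        )\n",
"\n",
"    def forward(self, x):\n",
"        # Each head returns (b, num_tokens, d_out); concatenate on the last dim\n",
"        return torch.cat([head(x) for head in self.heads], dim=-1)\n",
"\n",
"mha_stacked = MultiHeadAttentionWrapper(d_in=4, d_out=8, context_length=5,\n",
"                                        dropout=0.1, num_heads=2)\n",
"print(mha_stacked(x).shape)  # torch.Size([2, 5, 16])"
]
},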
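{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, a sketch of the more efficient `MultiHeadAttention` variant: one set of query/key/value projections is split into heads, so a single batched matrix multiplication computes the attention scores for all heads at once. It reuses the tensor `x` from above; the `out_proj` output layer is an assumption added here, not something stated in this summary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"trusted": true
},
"outputs": [],
"source": [
"class MultiHeadAttention(nn.Module):\n",
"    # Multi-head causal attention via one batched matrix multiplication\n",
"    def __init__(self, d_in, d_out, context_length, dropout, num_heads):\n",
"        super().__init__()\n",
"        assert d_out % num_heads == 0, \"d_out must be divisible by num_heads\"\n",
"        self.num_heads = num_heads\n",
"        self.head_dim = d_out // num_heads\n",
"        self.W_query = nn.Linear(d_in, d_out, bias=False)\n",
"        self.W_key = nn.Linear(d_in, d_out, bias=False)\n",
"        self.W_value = nn.Linear(d_in, d_out, bias=False)\n",
"        self.out_proj = nn.Linear(d_out, d_out)  # assumed output projection\n",
"        self.dropout = nn.Dropout(dropout)\n",
"        self.register_buffer(\n",
"            \"mask\", torch.triu(torch.ones(context_length, context_length), diagonal=1)\n",
"        )\n",
"\n",
"    def forward(self, x):\n",
"        b, num_tokens, _ = x.shape\n",
"        # Project once, then split the last dimension into heads\n",
"        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)\n",
"        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)\n",
"        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)\n",
"        # One batched matmul yields the scores for all heads at once\n",
"        scores = q @ k.transpose(2, 3)\n",
"        scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], float(\"-inf\"))\n",
"        weights = self.dropout(torch.softmax(scores / self.head_dim ** 0.5, dim=-1))\n",
"        context = (weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)\n",
"        return self.out_proj(context)\n",
"\n",
"mha = MultiHeadAttention(d_in=4, d_out=8, context_length=5, dropout=0.1, num_heads=2)\n",
"print(mha(x).shape)  # torch.Size([2, 5, 8])"
]
},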
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"trusted": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (Pyodide)",
"language": "python",
"name": "python"
},
"language_info": {
"codemirror_mode": {
"name": "python",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}