{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3.7 总结\n",
"\n",
"- 注意力Attention机制将输入元素转换为增强的上下文向量表示这些表示融合了所有输入的信息。\n",
"\n",
"- 自注意力Self Attention机制通过对输入的加权求和来计算上下文向量表示。\n",
"\n",
"- 在简化的注意力机制中,注意力权重通过点积计算得出。\n",
"\n",
"- 点积是将两个向量的相应元素相乘然后求和的简洁方式。\n",
"\n",
"- 虽然不是绝对必要,但矩阵乘法通过替代嵌套的 for 循环,帮助我们更高效、紧凑地实施计算。\n",
"\n",
"- 用于大语言模型的自注意力机制,也称为缩放点积注意力,其中包含了可训练的权重矩阵来计算输入的中间转换向量:查询、值和键。\n",
"\n",
"- 在处理从左到右阅读和生成文本的大语言模型时我们添加因果注意力遮蔽CausalAttention Mask以防止大语言模型访问后续的 Token 。\n",
"\n",
"- 除了使用因果注意力遮蔽将注意力权重归零外,我们还可以添加 Dropout 遮蔽来减少大语言模型中的过拟合问题。\n",
"\n",
"- 基于 Transformer 的大语言模型中的注意力模块涉及多个因果注意力CausalAttention实例这称为多头注意力MultiHeadAttention。\n",
"\n",
"- 我们可以通过堆叠多个 CausalAttention 模块来创建一个 MultiHeadAttention 模块。\n",
"\n",
"- 创建 MultiHeadAttention 模块的更有效方式涉及到批量矩阵乘法。"
]
},
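{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two code cells below are minimal sketches (not the chapter's exact listings; the class names, variable names, and toy input values here are illustrative assumptions) that tie the summary together. The first cell shows simplified self-attention: attention weights obtained from plain dot products, and context vectors computed as a weighted sum over the inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"trusted": true
},
"outputs": [],
"source": [
"import torch\n",
"\n",
"# Toy embeddings: 3 tokens, 3 dimensions (values are illustrative only)\n",
"inputs = torch.tensor(\n",
"    [[0.43, 0.15, 0.89],\n",
"     [0.55, 0.87, 0.66],\n",
"     [0.57, 0.85, 0.64]]\n",
")\n",
"\n",
"attn_scores = inputs @ inputs.T           # dot product of every token pair\n",
"attn_weights = torch.softmax(attn_scores, dim=-1)\n",
"context_vecs = attn_weights @ inputs      # weighted sum over all inputs\n",
"print(context_vecs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second cell sketches scaled dot-product attention with trainable query, key, and value weight matrices, a causal mask that hides future tokens, dropout applied to the attention weights, and multi-head attention built by stacking causal-attention instances. It is a sketch under these assumptions, not a definitive implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"trusted": true
},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"\n",
"class CausalAttention(nn.Module):\n",
"    \"\"\"Scaled dot-product attention with a causal mask and dropout.\"\"\"\n",
"\n",
"    def __init__(self, d_in, d_out, context_length, dropout=0.1):\n",
"        super().__init__()\n",
"        self.W_query = nn.Linear(d_in, d_out, bias=False)\n",
"        self.W_key = nn.Linear(d_in, d_out, bias=False)\n",
"        self.W_value = nn.Linear(d_in, d_out, bias=False)\n",
"        self.dropout = nn.Dropout(dropout)\n",
"        # Upper-triangular mask hides future tokens (causal attention)\n",
"        mask = torch.triu(torch.ones(context_length, context_length), diagonal=1)\n",
"        self.register_buffer(\"mask\", mask.bool())\n",
"\n",
"    def forward(self, x):\n",
"        num_tokens = x.shape[1]\n",
"        queries, keys, values = self.W_query(x), self.W_key(x), self.W_value(x)\n",
"        scores = queries @ keys.transpose(1, 2)  # dot products for all pairs\n",
"        scores = scores.masked_fill(self.mask[:num_tokens, :num_tokens], float(\"-inf\"))\n",
"        weights = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)  # scaled\n",
"        weights = self.dropout(weights)          # dropout mask on the weights\n",
"        return weights @ values                  # context vectors\n",
"\n",
"\n",
"class MultiHeadAttentionWrapper(nn.Module):\n",
"    \"\"\"Multi-head attention by stacking CausalAttention instances.\"\"\"\n",
"\n",
"    def __init__(self, d_in, d_out, context_length, num_heads, dropout=0.1):\n",
"        super().__init__()\n",
"        self.heads = nn.ModuleList(\n",
"            [CausalAttention(d_in, d_out, context_length, dropout)\n",
"             for _ in range(num_heads)]\n",
"        )\n",
"\n",
"    def forward(self, x):\n",
"        # Each head yields its own context vectors; concatenate along features\n",
"        return torch.cat([head(x) for head in self.heads], dim=-1)\n",
"\n",
"\n",
"torch.manual_seed(123)\n",
"batch = torch.randn(2, 6, 4)  # (batch_size, num_tokens, d_in)\n",
"mha = MultiHeadAttentionWrapper(d_in=4, d_out=3, context_length=6, num_heads=2)\n",
"print(mha(batch).shape)  # torch.Size([2, 6, 6]): 2 heads, d_out=3 each"
]
},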
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"trusted": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (Pyodide)",
"language": "python",
"name": "python"
},
"language_info": {
"codemirror_mode": {
"name": "python",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}