update readme

jingyaogong 2025-04-26 10:21:34 +08:00
parent 274483cb1b
commit 00d145c481
2 changed files with 5 additions and 8 deletions

@@ -130,7 +130,6 @@
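- Refactored the generate method to inherit from the GenerationMixin class.
- 🔥 Support for popular third-party ecosystems such as llama.cpp, vllm, and ollama.
- Standardized code and directory structure.
- 🔥 Added training code for PPO and GRPO implemented from scratch.
- Changed vocabulary tokens: `<s></s>` -> `<|im_start|><|im_end|>`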
```text
To stay compatible with the third-party inference frameworks llama.cpp and vllm, this update comes at a considerable cost.
@@ -510,12 +509,12 @@ the quality is admittedly not yet "high"; there is no end to improving data quality
---
## Ⅷ Dataset Download
## Ⅷ MiniMind Training Dataset
> [!NOTE]
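> Since `2025-02-05`, all datasets used for MiniMind's final training have been open-sourced, so there is no need to preprocess large-scale datasets yourself or repeat the data-processing work.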
MiniMind Training Dataset ([ModelScope](https://www.modelscope.cn/datasets/gongjy/minimind_dataset/files) | [HuggingFace](https://huggingface.co/datasets/jingyaogong/minimind_dataset/tree/main))
MiniMind training dataset download link: [HuggingFace](https://huggingface.co/datasets/jingyaogong/minimind_dataset/tree/main)
> There is no need to clone everything; you can download only the files you need.

@@ -145,9 +145,7 @@ We hope this open-source project can help LLM beginners quickly get started!
- 🔥 Support for popular third-party ecosystems like llama.cpp, vllm, and ollama.
- Standardized code and directory structure.
- 🔥 New: Added training code for PPO and GRPO from scratch.
- Standardized code and directory structure.
- Updated vocabulary tokens: `<s></s>` -> `<|im_start|><|im_end|>`.
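To make the token change concrete, here is a minimal sketch of a ChatML-style layout built from `<|im_start|>` and `<|im_end|>`. The exact template MiniMind ships is an assumption here; in practice Hugging Face `transformers` tokenizers render the template stored with the tokenizer via `apply_chat_template`. The helper names `format_chatml` and `add_generation_prompt` are hypothetical and only illustrate the shape of the text.

```python
# Hypothetical sketch: render chat turns with ChatML-style special tokens.
# The exact template MiniMind uses is an assumption; check the tokenizer's
# chat_template in the repository before relying on this layout.


def format_chatml(messages):
    """Render a list of {"role": ..., "content": ...} dicts as one string."""
    parts = []
    for msg in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    return "".join(parts)


def add_generation_prompt(text):
    """Append an open assistant turn so the model continues from there."""
    return text + "<|im_start|>assistant\n"


if __name__ == "__main__":
    chat = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ]
    print(add_generation_prompt(format_chatml(chat)))
```

Compared with bare `<s>`/`</s>` markers, each turn here carries an explicit role name, which is the layout the README ties to compatibility with llama.cpp and vllm.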
@@ -559,7 +557,7 @@ Big respect!
---
## Ⅷ Dataset Download
## Ⅷ MiniMind Dataset Download
> [!NOTE]
> After `2025-02-05`, MiniMind's open-source datasets for final training are provided, so there is no need for
@@ -567,7 +565,7 @@ Big respect!
MiniMind Training Datasets are available for download from:
Dataset ([ModelScope](https://www.modelscope.cn/datasets/gongjy/minimind_dataset/files) | [HuggingFace](https://huggingface.co/datasets/jingyaogong/minimind_dataset/tree/main))
MiniMind Dataset ([HuggingFace](https://huggingface.co/datasets/jingyaogong/minimind_dataset/tree/main))
> You don't need to clone everything; just download the files you need.