From e4807a52141ee01edf92383f680989ff3e6c5b3c Mon Sep 17 00:00:00 2001
From: jingyaogong
Date: Thu, 30 Oct 2025 23:27:15 +0800
Subject: [PATCH] [feat] update readme

---
 README.md    | 70 ++++++++++++++++++++++++++++++----------------------
 README_en.md | 63 +++++++++++++++++++++++++++++-----------------
 2 files changed, 80 insertions(+), 53 deletions(-)

diff --git a/README.md b/README.md
index 2ae0de7..6a8341c 100644
--- a/README.md
+++ b/README.md
@@ -126,7 +126,7 @@ ### 👉**Update Log**
-<summary> <b>2025-10-24 (newest🎉)</b> </summary>
+<summary> <b>2025-10-24</b> </summary>
 
 - 🔥 Added RLAIF training algorithms: PPO, GRPO, SPO (implemented natively from scratch)
 - Added checkpoint resume: supports automatic training recovery, recovery across different GPU counts, and wandb logging continuity
 
@@ -179,50 +179,28 @@ All legacy MiniMind2-series models were calibrated via weight mapping + (fine-tuning) of the QKVO linear layers
+<details close>
+<summary> <b>More...</b> </summary>
+
-<details close>
-<summary> <b>2024-10-05</b> </summary>
-
+**2024-10-05**
 
 - Extended MiniMind with multimodal capability: vision
 - See the twin project [minimind-v](https://github.com/jingyaogong/minimind-v) for details!
 
-</details>
-
-<details close>
-<summary> <b>2024-09-27</b> </summary>
-
+**2024-09-27**
 
 - Updated the pretrain dataset preprocessing: to preserve text integrity, preprocessing into a .bin training format was dropped (slightly sacrificing training speed).
 - The preprocessed pretrain file is now named pretrain_data.csv.
 - Removed some redundant code.
 
-</details>
-
-<details close>
-<summary> <b>2024-09-17</b> </summary>
-
+**2024-09-17**
 
 - Updated the minimind-v1-moe model
 - To avoid ambiguity, mistral_tokenizer is no longer used for tokenization; the custom minimind_tokenizer is used throughout.
 
-</details>
-
-<details close>
-<summary> <b>2024-09-01</b> </summary>
-
+**2024-09-01**
 
 - Updated the minimind-v1 (108M) model: minimind_tokenizer, 3 pretraining epochs + 10 SFT epochs, for more thorough training and stronger performance.
 - The project is deployed on ModelScope and can be tried online:
 - [🔗ModelScope Online Demo🔗](https://www.modelscope.cn/studios/gongjy/minimind)
 
-</details>
-
-<details close>
-<summary> <b>2024-08-27</b> </summary>
-
+**2024-08-27**
 
 - Project first open-sourced
 </details>
 
@@ -1897,6 +1875,38 @@ ollama run jingyaogong/minimind2 # other options: minimind2-r1 / minimind2-small /
 Star History Chart
 
+## 🎉 Awesome Work using MiniMind
+
+In a modest way, this model has helped bring some encouraging results to fruition. Thanks to the researchers for their recognition:
+
+- ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis [[arxiv](https://arxiv.org/pdf/2502.17475)]
+
+- Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models [[arxiv](https://arxiv.org/pdf/2502.15451)]
+
+- LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text [[arxiv](https://arxiv.org/pdf/2505.24826)]
+
+- On the Generalization Ability of Next-Token-Prediction Pretraining [[ICML 2025](https://openreview.net/forum?id=hLGJ1qZPdu)]
+
+- Building Large Models from Scratch: From Neural Networks to Transformer by Wang Shuang, Mou Chen, Wang Haoyi - Tsinghua University Press
+
+- FedBRB: A Solution to the Small-to-Large Scenario in Device-Heterogeneity Federated Learning [[TMC 2025](https://ieeexplore.ieee.org/abstract/document/11168259)]
+
+- More to come...
+
+
+# 🎓 Citation
+
+If you find MiniMind helpful in your research or work, please cite:
+
+```bibtex
+@misc{minimind,
+  title={MiniMind: Train a Tiny LLM from scratch},
+  author={Jingyao Gong},
+  year={2024},
+  howpublished={\url{https://github.com/jingyaogong/minimind}}
+}
+```
+
 # License
 
 This repository is licensed under the [Apache-2.0 License](LICENSE).
diff --git a/README_en.md b/README_en.md
index dbe626a..e84d9dc 100644
--- a/README_en.md
+++ b/README_en.md
@@ -131,7 +131,7 @@ We hope this open-source project can help LLM beginners get started quickly!
 ### 👉**Update Log**
-<summary> <b>2025-10-24 (newest🎉)</b> </summary>
+<summary> <b>2025-10-24</b> </summary>
 
 - 🔥 Added RLAIF training algorithms: PPO, GRPO, SPO (implemented natively from scratch)
 - Added checkpoint resume: supports automatic training recovery, recovery across different GPU counts, and wandb logging continuity
 
@@ -184,43 +184,28 @@ After this update, maintenance of the entire minimind-v1 series will be abandoned
-<details close>
-<summary> <b>2024-10-05</b> </summary>
+<details close>
+<summary> <b>More...</b> </summary>
+
+**2024-10-05**
 
 - Extended MiniMind with multimodal capability: vision
 - Check out the twin project [minimind-v](https://github.com/jingyaogong/minimind-v) for details!
 
-</details>
-
-<details close>
-<summary> <b>2024-09-27</b> </summary>
-
+**2024-09-27**
 
 - Updated the pretrain dataset preprocessing: to preserve text integrity, preprocessing into a .bin training format was dropped (slightly sacrificing training speed).
 - The preprocessed pretrain file is now named pretrain_data.csv.
 - Removed some redundant code.
 
-</details>
-
-<details close>
-<summary> <b>2024-09-17</b> </summary>
-
+**2024-09-17**
 
 - Updated the minimind-v1-moe model
 - To avoid ambiguity, mistral_tokenizer is no longer used for tokenization; the custom minimind_tokenizer is used throughout.
 
-</details>
-
-<details close>
-<summary> <b>2024-09-01</b> </summary>
-
+**2024-09-01**
 
 - Updated the minimind-v1 (108M) model: minimind_tokenizer, 3 pretraining epochs + 10 SFT epochs, for more thorough training and stronger performance.
 - The project is deployed on ModelScope and can be tried online:
 - [🔗ModelScope Online Demo🔗](https://www.modelscope.cn/studios/gongjy/minimind)
 
-</details>
-
-<details close>
-<summary> <b>2024-08-27</b> </summary>
-
+**2024-08-27**
 
 - Project first open-sourced
 </details>
 
@@ -1818,6 +1803,38 @@ I am a language model...
 Star History Chart
 
+## 🎉 Awesome Work using MiniMind
+
+This model has inspired some exciting research results. Thanks to all the researchers for their recognition:
+
+- ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis [[arxiv](https://arxiv.org/pdf/2502.17475)]
+
+- Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models [[arxiv](https://arxiv.org/pdf/2502.15451)]
+
+- LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text [[arxiv](https://arxiv.org/pdf/2505.24826)]
+
+- On the Generalization Ability of Next-Token-Prediction Pretraining [[ICML 2025](https://openreview.net/forum?id=hLGJ1qZPdu)]
+
+- Building Large Models from Scratch: From Neural Networks to Transformer by Wang Shuang, Mou Chen, Wang Haoyi - Tsinghua University Press
+
+- FedBRB: A Solution to the Small-to-Large Scenario in Device-Heterogeneity Federated Learning [[TMC 2025](https://ieeexplore.ieee.org/abstract/document/11168259)]
+
+- More to come...
+
+
+# 🎓 Citation
+
+If you find MiniMind helpful in your research or work, please cite:
+
+```bibtex
+@misc{minimind,
+  title={MiniMind: Train a Tiny LLM from scratch},
+  author={Jingyao Gong},
+  year={2024},
+  howpublished={\url{https://github.com/jingyaogong/minimind}}
+}
+```
+
 # License
 
 This repository is licensed under the [Apache-2.0 License](LICENSE).