From da865af63db58c70d4d72242bf3d96277667dff4 Mon Sep 17 00:00:00 2001
From: jingyaogong
Date: Tue, 28 Apr 2026 17:22:25 +0800
Subject: [PATCH] [update] readme

---
 README.md    | 8 ++++----
 README_en.md | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index ca04e75..eaf4e18 100644
--- a/README.md
+++ b/README.md
@@ -101,7 +101,7 @@
 | 模型 | 参数量 | Release |
 |------|--------|---------|
 | minimind-3 | 64M | 2026.04.01 |
-| minimind-3-moe | 198M / A64M | 2026.04.01 |
+| minimind-3-moe | 198M-A64M | 2026.04.01 |
 | minimind2-small | 26M | 2025.04.26 |
 | minimind2-moe | 145M | 2025.04.26 |
 | minimind2 | 104M | 2025.04.26 |
@@ -118,7 +118,7 @@
 🔥 2026-04-01
 
 - 发布 `minimind-3` / `minimind-3-moe`:结构、Tokenizer、训练链路、推理接口与默认配置全面更新
-- 结构主线对齐 `Qwen3 / Qwen3-MoE` 生态:Dense 约 `64M`,MoE 约 `198M / A64M`,并移除了 shared expert 设计
+- 结构主线对齐 `Qwen3 / Qwen3-MoE` 生态:Dense 约 `64M`,MoE 约 `198M-A64M`,并移除了 shared expert 设计
 - 默认训练数据切换为 `pretrain_t2t(_mini).jsonl`、`sft_t2t(_mini).jsonl`、`rlaif.jsonl`、`agent_rl.jsonl` 与 `agent_rl_math.jsonl`
 - 移除独立 `train_reason.py`;思考能力统一由 `chat_template + ` 与 `open_thinking` 自适应开关控制
 - `toolcall` 能力已混入 `sft_t2t / sft_t2t_mini` 主线数据,默认 `full_sft` 即具备基础 Tool Call 能力;同时新增 `scripts/chat_api.py` 等推理示例
@@ -575,7 +575,7 @@ MiniMind训练数据集下载地址: [ModelScope](https://www.modelscope.cn/da
 | Model Name | params | len_vocab | max_pos | rope_theta | n_layers | d_model | kv_heads | q_heads | note |
 |------------|--------|-----------|---------|------------|----------|---------|----------|---------|------|
 | minimind-3 | 64M | 6400 | 32768 | 1e6 | 8 | 768 | 4 | 8 | Dense |
-| minimind-3-moe | 198M / A64M | 6400 | 32768 | 1e6 | 8 | 768 | 4 | 8 | 4 experts / top-1 |
+| minimind-3-moe | 198M-A64M | 6400 | 32768 | 1e6 | 8 | 768 | 4 | 8 | 4 experts / top-1 |
 | minimind2-small | 26M | 6400 | 32768 | 1e6 | 8 | 512 | 2 | 8 | 历史版本 |
 | minimind2-moe | 145M | 6400 | 32768 | 1e6 | 8 | 640 | 2 | 8 | 历史版本 |
 | minimind2 | 104M | 6400 | 32768 | 1e6 | 16 | 768 | 2 | 8 | 历史版本 |
@@ -621,7 +621,7 @@ MobileLLM 的一个核心观察是:在参数量固定时,深度往往比宽
 | Model Name | params | pretrain_t2t_mini | sft_t2t_mini | toolcall | RLAIF |
 |------------|--------|-------------------|--------------|----------|-------|
 | minimind-3 | 64M | ≈1.21h<br/>≈1.57¥ | ≈1.10h<br/>≈1.43¥ | ≈0.9h<br/>≈1.17¥ | ≈1.1h<br/>≈1.43¥ |
-| minimind-3-moe | 198M / A64M | ≈1.69h<br/>≈2.20¥ | ≈1.54h<br/>≈2.00¥ | ≈1.26h<br/>≈1.64¥ | ≈1.54h<br/>≈2.00¥ |
+| minimind-3-moe | 198M-A64M | ≈1.69h<br/>≈2.20¥ | ≈1.54h<br/>≈2.00¥ | ≈1.26h<br/>≈1.64¥ | ≈1.54h<br/>≈2.00¥ |
 
 ---
 
diff --git a/README_en.md b/README_en.md
index a3d1535..784a303 100644
--- a/README_en.md
+++ b/README_en.md
@@ -101,7 +101,7 @@ Meanwhile, third-party large model frameworks and tool libraries, such as `trans
 | Model | Parameters | Release |
 |------|--------|---------|
 | minimind-3 | 64M | 2026.04.01 |
-| minimind-3-moe | 198M / A64M | 2026.04.01 |
+| minimind-3-moe | 198M-A64M | 2026.04.01 |
 | minimind2-small | 26M | 2025.04.26 |
 | minimind2-moe | 145M | 2025.04.26 |
 | minimind2 | 104M | 2025.04.26 |
@@ -117,7 +117,7 @@ Meanwhile, third-party large model frameworks and tool libraries, such as `trans
 🔥 2026-04-01
 
 - Released `minimind-3` / `minimind-3-moe`: comprehensive updates to structure, Tokenizer, training pipeline, inference interface, and default configuration
-- Main branch structure aligned with `Qwen3 / Qwen3-MoE` ecosystem: Dense approximately `64M`, MoE approximately `198M / A64M`, and removed shared expert design
+- Main branch structure aligned with `Qwen3 / Qwen3-MoE` ecosystem: Dense approximately `64M`, MoE approximately `198M-A64M`, and removed shared expert design
 - Default training data switched to `pretrain_t2t(_mini).jsonl`, `sft_t2t(_mini).jsonl`, `rlaif.jsonl`, `agent_rl.jsonl`, and `agent_rl_math.jsonl`
 - Removed standalone `train_reason.py`; thinking capability is now unified through `chat_template + ` and `open_thinking` adaptive switch control
 - `toolcall` capability has been merged into `sft_t2t / sft_t2t_mini` main branch data, default `full_sft` already has basic Tool Call capability; also added inference examples such as `scripts/chat_api.py`
@@ -574,7 +574,7 @@ To modify model configuration, see [./model/model_minimind.py](./model/model_min
 | Model Name | params | len_vocab | max_pos | rope_theta | n_layers | d_model | kv_heads | q_heads | note |
 |------------|--------|-----------|---------|------------|----------|---------|----------|---------|------|
 | minimind-3 | 64M | 6400 | 32768 | 1e6 | 8 | 768 | 4 | 8 | Dense |
-| minimind-3-moe | 198M / A64M | 6400 | 32768 | 1e6 | 8 | 768 | 4 | 8 | 4 experts / top-1 |
+| minimind-3-moe | 198M-A64M | 6400 | 32768 | 1e6 | 8 | 768 | 4 | 8 | 4 experts / top-1 |
 | minimind2-small | 26M | 6400 | 32768 | 1e6 | 8 | 512 | 2 | 8 | Historical version |
 | minimind2-moe | 145M | 6400 | 32768 | 1e6 | 8 | 640 | 2 | 8 | Historical version |
 | minimind2 | 104M | 6400 | 32768 | 1e6 | 16 | 768 | 2 | 8 | Historical version |
@@ -620,7 +620,7 @@ For reference, GPT-3's parameter settings are as follows:
 | Model Name | params | pretrain_t2t_mini | sft_t2t_mini | toolcall | RLAIF |
 |------------|--------|-------------------|--------------|----------|-------|
 | minimind-3 | 64M | ≈1.21h<br/>≈1.57¥ | ≈1.10h<br/>≈1.43¥ | ≈0.9h<br/>≈1.17¥ | ≈1.1h<br/>≈1.43¥ |
-| minimind-3-moe | 198M / A64M | ≈1.69h<br/>≈2.20¥ | ≈1.54h<br/>≈2.00¥ | ≈1.26h<br/>≈1.64¥ | ≈1.54h<br/>≈2.00¥ |
+| minimind-3-moe | 198M-A64M | ≈1.69h<br/>≈2.20¥ | ≈1.54h<br/>≈2.00¥ | ≈1.26h<br/>≈1.64¥ | ≈1.54h<br/>≈2.00¥ |
 
 ---