[update] readme

jingyaogong 2026-04-01 14:00:21 +08:00
parent b7e0ae21d6
commit 90cd275524
2 changed files with 24 additions and 14 deletions

@@ -1308,9 +1308,11 @@ python eval_toolcall.py --weight agent
Subjective/objective comparison based on `minimind-3 (64M)` under the same random seed and other hyperparameters, for reference:
**[A]** minimind-3 (64M, SFT)
**[B]** minimind-3 (64M, GRPO)
**[C]** minimind-3 (64M, Agent-CISPO)
[A] minimind-3 (64M, SFT)
[B] minimind-3 (64M, GRPO)
[C] minimind-3 (64M, Agent-CISPO)
### Test 1: Subjective Q&A Comparison
@@ -1400,10 +1402,13 @@ agent: 17/20 = 85.00%
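> Note: The following comparison is only for experiential reference, not a strict benchmark; sample size is limited and involves subjectivity.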
**[A]** minimind-3 (0.06B)
**[B]** minimind-3-moe (0.2B-A0.06B)
**[C]** [baby-llama2-chinese (0.2B)](https://github.com/DLLXW/baby-llama2-chinese)
**[D]** [chatlm-mini-chinese (0.2B)](https://github.com/charent/ChatLM-mini-Chinese)
[A] minimind-3 (0.06B)
[B] minimind-3-moe (0.2B-A0.06B)
[C] [baby-llama2-chinese (0.2B)](https://github.com/DLLXW/baby-llama2-chinese)
[D] [chatlm-mini-chinese (0.2B)](https://github.com/charent/ChatLM-mini-Chinese)
### Test 3: Q&A

@@ -1305,9 +1305,11 @@ Let us converge back to the "**unified framework**", reorganizing the table show
Subjective/objective comparison based on `minimind-3 (64M)` under the same random seed and other hyperparameters, for reference:
**[A]** minimind-3 (64M, SFT)
**[B]** minimind-3 (64M, GRPO)
**[C]** minimind-3 (64M, Agent-CISPO)
[A] minimind-3 (64M, SFT)
[B] minimind-3 (64M, GRPO)
[C] minimind-3 (64M, Agent-CISPO)
### Test 1: Subjective Q&A Comparison
@@ -1397,10 +1399,13 @@ So if the task objective is ToolUse, lightweight multi-step calling, and verifia
> Note: The following comparison is only for experiential reference, not a strict benchmark; sample size is limited and involves subjectivity.
**[A]** minimind-3 (0.06B)
**[B]** minimind-3-moe (0.2B-A0.06B)
**[C]** [baby-llama2-chinese (0.2B)](https://github.com/DLLXW/baby-llama2-chinese)
**[D]** [chatlm-mini-chinese (0.2B)](https://github.com/charent/ChatLM-mini-Chinese)
[A] minimind-3 (0.06B)
[B] minimind-3-moe (0.2B-A0.06B)
[C] [baby-llama2-chinese (0.2B)](https://github.com/DLLXW/baby-llama2-chinese)
[D] [chatlm-mini-chinese (0.2B)](https://github.com/charent/ChatLM-mini-Chinese)
### Test 3: Q&A