diff --git a/README.md b/README.md
index 914a947..93091a6 100644
--- a/README.md
+++ b/README.md
@@ -1308,9 +1308,11 @@ python eval_toolcall.py --weight agent
 
 基于`minimind-3 (64M)`在相同随机种子等超参下的主/客观对比,供参考:
 
-**[A]** minimind-3 (64M, SFT)
-**[B]** minimind-3 (64M, GRPO)
-**[C]** minimind-3 (64M, Agent-CISPO)
+[A] minimind-3 (64M, SFT)
+
+[B] minimind-3 (64M, GRPO)
+
+[C] minimind-3 (64M, Agent-CISPO)
 
 ### 测试1:主观问答对比
 
@@ -1400,10 +1402,13 @@ agent: 17/20 = 85.00%
 
 > 注:以下对比仅为体验参考,非严格 benchmark,样本量有限且带有主观性。
 
-**[A]** minimind-3 (0.06B)
-**[B]** minimind-3-moe (0.2B-A0.06B)
-**[C]** [baby-llama2-chinese (0.2B)](https://github.com/DLLXW/baby-llama2-chinese)
-**[D]** [chatlm-mini-chinese (0.2B)](https://github.com/charent/ChatLM-mini-Chinese)
+[A] minimind-3 (0.06B)
+
+[B] minimind-3-moe (0.2B-A0.06B)
+
+[C] [baby-llama2-chinese (0.2B)](https://github.com/DLLXW/baby-llama2-chinese)
+
+[D] [chatlm-mini-chinese (0.2B)](https://github.com/charent/ChatLM-mini-Chinese)
 
 ### 测试3:问答
 
diff --git a/README_en.md b/README_en.md
index 20b0089..3b872b5 100644
--- a/README_en.md
+++ b/README_en.md
@@ -1305,9 +1305,11 @@ Let us converge back to the "**unified framework**", reorganizing the table show
 
 Subjective/objective comparison based on `minimind-3 (64M)` under the same random seed and other hyperparameters, for reference:
 
-**[A]** minimind-3 (64M, SFT)
-**[B]** minimind-3 (64M, GRPO)
-**[C]** minimind-3 (64M, Agent-CISPO)
+[A] minimind-3 (64M, SFT)
+
+[B] minimind-3 (64M, GRPO)
+
+[C] minimind-3 (64M, Agent-CISPO)
 
 ### Test 1: Subjective Q&A Comparison
 
@@ -1397,10 +1399,13 @@ So if the task objective is ToolUse, lightweight multi-step calling, and verifia
 
 > Note: The following comparison is only for experiential reference, not a strict benchmark; sample size is limited and involves subjectivity.
 
-**[A]** minimind-3 (0.06B)
-**[B]** minimind-3-moe (0.2B-A0.06B)
-**[C]** [baby-llama2-chinese (0.2B)](https://github.com/DLLXW/baby-llama2-chinese)
-**[D]** [chatlm-mini-chinese (0.2B)](https://github.com/charent/ChatLM-mini-Chinese)
+[A] minimind-3 (0.06B)
+
+[B] minimind-3-moe (0.2B-A0.06B)
+
+[C] [baby-llama2-chinese (0.2B)](https://github.com/DLLXW/baby-llama2-chinese)
+
+[D] [chatlm-mini-chinese (0.2B)](https://github.com/charent/ChatLM-mini-Chinese)
 
 ### Test 3: Q&A