323 Commits

Author SHA1 Message Date
jingyaogong
9348fde743 [update] readme 2026-04-02 15:28:29 +08:00
jingyaogong
90cd275524 [update] readme 2026-04-01 14:00:21 +08:00
jingyaogong
b7e0ae21d6 [update] default model 2026-03-31 13:40:16 +08:00
jingyaogong
b1865f75c2 [update] random seed 2026-03-27 21:20:02 +08:00
jingyaogong
6b0b0c5e2f [update] fp16 inference 2026-03-27 16:29:46 +08:00
jingyaogong
88e675dc2c [update] image 2026-03-26 15:35:42 +08:00
jingyaogong
b8b3d35257 [update] change default seq_len 2026-03-26 10:09:06 +08:00
jingyaogong
101d7df2da [update] minimind-3 2026-03-25 23:57:45 +08:00
jingyaogong
83e52f6a27
Merge pull request #698 from readlnh/master
[fix] Fix the mixing of 1-indexed step and 0-indexed logic in the training scripts
2026-03-24 13:41:20 +08:00
readlnh
cf4b49a348 [fix] align log/save last-step check and ETA with 1-indexed step 2026-03-24 02:01:40 +01:00
readlnh
d25500d363 [fix] gradient accumulation step alignment 2026-03-24 01:45:04 +01:00
jingyaogong
349e74ec7b [update] empty_think_ratio 2026-02-06 19:15:21 +08:00
jingyaogong
288e1ac02a [update] empty_think_ratio 2026-02-06 01:36:02 +08:00
jingyaogong
ccc190da05 [feat] data process 2026-02-06 01:17:57 +08:00
jingyaogong
11a44340ba [update] save interval 2026-01-30 20:30:50 +08:00
jingyaogong
04616c41a5 [update] safe half 2026-01-30 20:29:31 +08:00
jingyaogong
fea69cf338 [fix] data skip 2026-01-18 16:56:29 +08:00
jingyaogong
f7ffdf1fdb [update] shuffle data 2026-01-18 16:39:34 +08:00
jingyaogong
3a5aba82db [fix] max length 2026-01-17 13:26:14 +08:00
jingyaogong
714abcf802 [update] pretrain load 2026-01-17 12:00:17 +08:00
jingyaogong
aa539a824a [update] align mask 2026-01-15 11:20:41 +08:00
jingyaogong
c090b69c4d [update] align loss 2026-01-15 00:56:32 +08:00
jingyaogong
e119db8478 [fix] compile unpack 2026-01-14 20:13:32 +08:00
jingyaogong
81d24a4f16 [feat] add compile 2026-01-14 14:42:30 +08:00
jingyaogong
1279a61681 [update] prompt prefill 2026-01-13 17:46:54 +08:00
jingyaogong
05d0b216f6 [update] show speed 2026-01-07 23:33:47 +08:00
jingyaogong
df89069362 [update] params log 2026-01-07 23:08:45 +08:00
jingyaogong
f55d4c32a0 [update] mask log 2026-01-07 22:12:26 +08:00
jingyaogong
20a43d7db0 [update] readme 2026-01-07 00:58:38 +08:00
jingyaogong
7641985d14 [update] simplify loader 2026-01-06 01:20:52 +08:00
jingyaogong
0b4a8ad4aa [update] readme 2026-01-06 01:18:10 +08:00
jingyaogong
07364c3fbe [update] rename train tokenizer 2026-01-06 01:17:33 +08:00
jingyaogong
9830915d87 [update] readme 2026-01-05 23:15:25 +08:00
jingyaogong
4e73f34823 [update] rename reason 2026-01-05 23:12:29 +08:00
jingyaogong
a8455ca8a3 [fix] messages num 2026-01-04 11:03:16 +08:00
jingyaogong
42a4e8c86a [fix] dist cleanup 2026-01-02 22:25:55 +08:00
jingyaogong
9d898576ac [update] aux loss 2026-01-01 22:41:46 +08:00
jingyaogong
c65335b56f [fix] experts unused 2025-12-31 21:47:04 +08:00
jingyaogong
bc8fd82166 [fix] layers set 8 2025-12-31 21:06:37 +08:00
jingyaogong
5dd4df7e18 [fix] moe unused 2025-12-31 21:00:06 +08:00
jingyaogong
9236260a4a [feat] get params 2025-12-31 20:46:59 +08:00
jingyaogong
288a1d7212 [feat] get params 2025-12-31 20:44:34 +08:00
jingyaogong
eead9538b2 [feat] update config 2025-12-31 10:29:13 +08:00
jingyaogong
6242980917 [feat] update lr 2025-12-31 10:27:09 +08:00
jingyaogong
936d105e9b [feat] compatible tokenizer 2025-12-31 10:26:46 +08:00
jingyaogong
4a5c9f5ece [feat] stream load data 2025-12-28 16:58:52 +08:00
jingyaogong
7eae14f3ce [feat] remove empty_cache 2025-12-27 07:14:36 +08:00
jingyaogong
11b962da06 [feat] explicit left padding 2025-12-23 18:59:48 +08:00
jingyaogong
a9c56b20e9 [fix] lora weight 2025-12-22 21:27:29 +08:00
jingyaogong
048d84abc7
Merge pull request #594 from whiteswordLI/fix/lora-load-ddp-weights
Fix: support loading DDP-saved LoRA weights for inference
2025-12-22 21:19:16 +08:00