jingyaogong
|
9348fde743
|
[update] readme
|
2026-04-02 15:28:29 +08:00 |
|
jingyaogong
|
90cd275524
|
[update] readme
|
2026-04-01 14:00:21 +08:00 |
|
jingyaogong
|
b7e0ae21d6
|
[update] default model
|
2026-03-31 13:40:16 +08:00 |
|
jingyaogong
|
b1865f75c2
|
[update] random seed
|
2026-03-27 21:20:02 +08:00 |
|
jingyaogong
|
6b0b0c5e2f
|
[update] fp16 inference
|
2026-03-27 16:29:46 +08:00 |
|
jingyaogong
|
88e675dc2c
|
[update] image
|
2026-03-26 15:35:42 +08:00 |
|
jingyaogong
|
b8b3d35257
|
[update] change default seq_len
|
2026-03-26 10:09:06 +08:00 |
|
jingyaogong
|
101d7df2da
|
[update] minimind-3
|
2026-03-25 23:57:45 +08:00 |
|
jingyaogong
|
83e52f6a27
|
Merge pull request #698 from readlnh/master
[fix] Fix the mixing of 1-indexed step and 0-indexed logic in the training scripts
|
2026-03-24 13:41:20 +08:00 |
|
readlnh
|
cf4b49a348
|
[fix] align log/save last-step check and ETA with 1-indexed step
|
2026-03-24 02:01:40 +01:00 |
|
readlnh
|
d25500d363
|
[fix] gradient accumulation step alignment
|
2026-03-24 01:45:04 +01:00 |
|
jingyaogong
|
349e74ec7b
|
[update] empty_think_ratio
|
2026-02-06 19:15:21 +08:00 |
|
jingyaogong
|
288e1ac02a
|
[update] empty_think_ratio
|
2026-02-06 01:36:02 +08:00 |
|
jingyaogong
|
ccc190da05
|
[feat] data process
|
2026-02-06 01:17:57 +08:00 |
|
jingyaogong
|
11a44340ba
|
[update] save interval
|
2026-01-30 20:30:50 +08:00 |
|
jingyaogong
|
04616c41a5
|
[update] safe half
|
2026-01-30 20:29:31 +08:00 |
|
jingyaogong
|
fea69cf338
|
[fix] data skip
|
2026-01-18 16:56:29 +08:00 |
|
jingyaogong
|
f7ffdf1fdb
|
[update] shuffle data
|
2026-01-18 16:39:34 +08:00 |
|
jingyaogong
|
3a5aba82db
|
[fix] max length
|
2026-01-17 13:26:14 +08:00 |
|
jingyaogong
|
714abcf802
|
[update] pretrain load
|
2026-01-17 12:00:17 +08:00 |
|
jingyaogong
|
aa539a824a
|
[update] align mask
|
2026-01-15 11:20:41 +08:00 |
|
jingyaogong
|
c090b69c4d
|
[update] align loss
|
2026-01-15 00:56:32 +08:00 |
|
jingyaogong
|
e119db8478
|
[fix] compile unpack
|
2026-01-14 20:13:32 +08:00 |
|
jingyaogong
|
81d24a4f16
|
[feat] add compile
|
2026-01-14 14:42:30 +08:00 |
|
jingyaogong
|
1279a61681
|
[update] prompt prefill
|
2026-01-13 17:46:54 +08:00 |
|
jingyaogong
|
05d0b216f6
|
[update] show speed
|
2026-01-07 23:33:47 +08:00 |
|
jingyaogong
|
df89069362
|
[update] params log
|
2026-01-07 23:08:45 +08:00 |
|
jingyaogong
|
f55d4c32a0
|
[update] mask log
|
2026-01-07 22:12:26 +08:00 |
|
jingyaogong
|
20a43d7db0
|
[update] readme
|
2026-01-07 00:58:38 +08:00 |
|
jingyaogong
|
7641985d14
|
[update] simplify loader
|
2026-01-06 01:20:52 +08:00 |
|
jingyaogong
|
0b4a8ad4aa
|
[update] readme
|
2026-01-06 01:18:10 +08:00 |
|
jingyaogong
|
07364c3fbe
|
[update] rename train tokenizer
|
2026-01-06 01:17:33 +08:00 |
|
jingyaogong
|
9830915d87
|
[update] readme
|
2026-01-05 23:15:25 +08:00 |
|
jingyaogong
|
4e73f34823
|
[update] rename reason
|
2026-01-05 23:12:29 +08:00 |
|
jingyaogong
|
a8455ca8a3
|
[fix] messages num
|
2026-01-04 11:03:16 +08:00 |
|
jingyaogong
|
42a4e8c86a
|
[fix] dist cleanup
|
2026-01-02 22:25:55 +08:00 |
|
jingyaogong
|
9d898576ac
|
[update] aux loss
|
2026-01-01 22:41:46 +08:00 |
|
jingyaogong
|
c65335b56f
|
[fix] experts unused
|
2025-12-31 21:47:04 +08:00 |
|
jingyaogong
|
bc8fd82166
|
[fix] layers set 8
|
2025-12-31 21:06:37 +08:00 |
|
jingyaogong
|
5dd4df7e18
|
[fix] moe unused
|
2025-12-31 21:00:06 +08:00 |
|
jingyaogong
|
9236260a4a
|
[feat] get params
|
2025-12-31 20:46:59 +08:00 |
|
jingyaogong
|
288a1d7212
|
[feat] get params
|
2025-12-31 20:44:34 +08:00 |
|
jingyaogong
|
eead9538b2
|
[feat] update config
|
2025-12-31 10:29:13 +08:00 |
|
jingyaogong
|
6242980917
|
[feat] update lr
|
2025-12-31 10:27:09 +08:00 |
|
jingyaogong
|
936d105e9b
|
[feat] compatible tokenizer
|
2025-12-31 10:26:46 +08:00 |
|
jingyaogong
|
4a5c9f5ece
|
[feat] stream load data
|
2025-12-28 16:58:52 +08:00 |
|
jingyaogong
|
7eae14f3ce
|
[feat] remove empty_cache
|
2025-12-27 07:14:36 +08:00 |
|
jingyaogong
|
11b962da06
|
[feat] explicit left padding
|
2025-12-23 18:59:48 +08:00 |
|
jingyaogong
|
a9c56b20e9
|
[fix] lora weight
|
2025-12-22 21:27:29 +08:00 |
|
jingyaogong
|
048d84abc7
|
Merge pull request #594 from whiteswordLI/fix/lora-load-ddp-weights
Fix: support loading DDP-saved LoRA weights for inference
|
2025-12-22 21:19:16 +08:00 |
|