351 Commits

Author SHA1 Message Date
jingyaogong dddedc6881 [fix] repetition_penalty 2026-05-07 19:08:52 +08:00
jingyaogong 802c15b2b4 [feat] reduce RL memory 2026-05-06 15:07:28 +08:00
jingyaogong e73a407f7a Merge pull request #759 from TKiteRunner/fix/grpo-oom-reorder-rewards (fix: resolve CUDA OOM in train_grpo.py on GPUs with <=8GB VRAM) 2026-05-06 15:00:17 +08:00
TKiteRunner 10776417aa fix: resolve CUDA OOM in train_grpo.py on GPUs with <=8GB VRAM 2026-05-06 14:14:15 +08:00
jingyaogong 5020dc9dd4 [update] readme 2026-05-06 13:41:46 +08:00
jingyaogong bdee223036 [fix] inference bug 2026-05-03 20:48:48 +08:00
jingyaogong 06d882e4ef [update] readme 2026-05-01 12:06:47 +08:00
jingyaogong da865af63d [update] readme 2026-04-28 17:22:25 +08:00
jingyaogong 773e451b11 [fix] bugs 2026-04-27 19:16:08 +08:00
jingyaogong 6361510016 [fix] rollout bugs 2026-04-27 17:54:09 +08:00
jingyaogong d4c6bc5c7e [update] readme 2026-04-27 10:59:41 +08:00
jingyaogong 24896cd2c4 [update] readme 2026-04-24 20:16:06 +08:00
jingyaogong e2fc397176 [update] readme 2026-04-24 20:14:10 +08:00
jingyaogong 693fb1ccf1 [update] readme 2026-04-21 14:34:46 +08:00
jingyaogong 5416a44471 [fix] bugs 2026-04-21 13:03:34 +08:00
jingyaogong 1718e9a44d [fix] transformers-5.x 2026-04-19 23:48:54 +08:00
jingyaogong 5704766352 [update] tie embedding 2026-04-19 21:57:28 +08:00
jingyaogong 1ea113ea2c [update] readme 2026-04-19 14:51:47 +08:00
jingyaogong 487f78754d [update] readme 2026-04-10 10:55:24 +08:00
jingyaogong e796b8028a [fix] lora compile 2026-04-10 00:01:21 +08:00
jingyaogong b2488e6440 [update] readme 2026-04-09 19:00:41 +08:00
jingyaogong 939dc8ff42 [update] readme 2026-04-09 18:54:17 +08:00
jingyaogong cadacabecb [update] bench 2026-04-09 17:12:48 +08:00
jingyaogong 5351424bf0 [fix] lora moe 2026-04-09 16:36:48 +08:00
jingyaogong aa3e6affa1 [update] add a comment 2026-04-09 15:09:58 +08:00
jingyaogong 25a7edcd6f [update] image 2026-04-04 14:54:13 +08:00
jingyaogong cacf1d4cd0 [update] readme 2026-04-04 11:25:21 +08:00
jingyaogong 2ab6455d9d [update] open causal 2026-04-02 15:28:58 +08:00
jingyaogong 9348fde743 [update] readme 2026-04-02 15:28:29 +08:00
jingyaogong 90cd275524 [update] readme 2026-04-01 14:00:21 +08:00
jingyaogong b7e0ae21d6 [update] default model 2026-03-31 13:40:16 +08:00
jingyaogong b1865f75c2 [update] random seed 2026-03-27 21:20:02 +08:00
jingyaogong 6b0b0c5e2f [update] fp16 inference 2026-03-27 16:29:46 +08:00
jingyaogong 88e675dc2c [update] image 2026-03-26 15:35:42 +08:00
jingyaogong b8b3d35257 [update] change default seq_len 2026-03-26 10:09:06 +08:00
jingyaogong 101d7df2da [update] minimind-3 2026-03-25 23:57:45 +08:00
jingyaogong 83e52f6a27 Merge pull request #698 from readlnh/master ([fix] Fix mixed use of 1-indexed step and 0-indexed logic in the training scripts) 2026-03-24 13:41:20 +08:00
readlnh cf4b49a348 [fix] align log/save last-step check and ETA with 1-indexed step 2026-03-24 02:01:40 +01:00
readlnh d25500d363 [fix] gradient accumulation step alignment 2026-03-24 01:45:04 +01:00
jingyaogong 349e74ec7b [update] empty_think_ratio 2026-02-06 19:15:21 +08:00
jingyaogong 288e1ac02a [update] empty_think_ratio 2026-02-06 01:36:02 +08:00
jingyaogong ccc190da05 [feat] data process 2026-02-06 01:17:57 +08:00
jingyaogong 11a44340ba [update] save interval 2026-01-30 20:30:50 +08:00
jingyaogong 04616c41a5 [update] safe half 2026-01-30 20:29:31 +08:00
jingyaogong fea69cf338 [fix] data skip 2026-01-18 16:56:29 +08:00
jingyaogong f7ffdf1fdb [update] shuffle data 2026-01-18 16:39:34 +08:00
jingyaogong 3a5aba82db [fix] max length 2026-01-17 13:26:14 +08:00
jingyaogong 714abcf802 [update] pretrain load 2026-01-17 12:00:17 +08:00
jingyaogong aa539a824a [update] align mask 2026-01-15 11:20:41 +08:00
jingyaogong c090b69c4d [update] align loss 2026-01-15 00:56:32 +08:00