Commit Graph

299 Commits

Author SHA1 Message Date
jingyaogong
1279a61681 [update] prompt prefill 2026-01-13 17:46:54 +08:00
jingyaogong
05d0b216f6 [update] show speed 2026-01-07 23:33:47 +08:00
jingyaogong
df89069362 [update] params log 2026-01-07 23:08:45 +08:00
jingyaogong
f55d4c32a0 [update] mask log 2026-01-07 22:12:26 +08:00
jingyaogong
20a43d7db0 [update] readme 2026-01-07 00:58:38 +08:00
jingyaogong
7641985d14 [update] simplify loader 2026-01-06 01:20:52 +08:00
jingyaogong
0b4a8ad4aa [update] readme 2026-01-06 01:18:10 +08:00
jingyaogong
07364c3fbe [update] rename train tokenizer 2026-01-06 01:17:33 +08:00
jingyaogong
9830915d87 [update] readme 2026-01-05 23:15:25 +08:00
jingyaogong
4e73f34823 [update] rename reason 2026-01-05 23:12:29 +08:00
jingyaogong
a8455ca8a3 [fix] messages num 2026-01-04 11:03:16 +08:00
jingyaogong
42a4e8c86a [fix] dist cleanup 2026-01-02 22:25:55 +08:00
jingyaogong
9d898576ac [update] aux loss 2026-01-01 22:41:46 +08:00
jingyaogong
c65335b56f [fix] experts unused 2025-12-31 21:47:04 +08:00
jingyaogong
bc8fd82166 [fix] layers set 8 2025-12-31 21:06:37 +08:00
jingyaogong
5dd4df7e18 [fix] moe unused 2025-12-31 21:00:06 +08:00
jingyaogong
9236260a4a [feat] get params 2025-12-31 20:46:59 +08:00
jingyaogong
288a1d7212 [feat] get params 2025-12-31 20:44:34 +08:00
jingyaogong
eead9538b2 [feat] update config 2025-12-31 10:29:13 +08:00
jingyaogong
6242980917 [feat] update lr 2025-12-31 10:27:09 +08:00
jingyaogong
936d105e9b [feat] compatible tokenizer 2025-12-31 10:26:46 +08:00
jingyaogong
4a5c9f5ece [feat] stream load data 2025-12-28 16:58:52 +08:00
jingyaogong
7eae14f3ce [feat] remove empty_cache 2025-12-27 07:14:36 +08:00
jingyaogong
11b962da06 [feat] explicit left padding 2025-12-23 18:59:48 +08:00
jingyaogong
a9c56b20e9 [fix] lora weight 2025-12-22 21:27:29 +08:00
jingyaogong
048d84abc7
Merge pull request #594 from whiteswordLI/fix/lora-load-ddp-weights
Fix: support loading DDP-saved LoRA weights for inference
2025-12-22 21:19:16 +08:00
whitesword
3a18fdd666 Fix: support loading DDP-saved LoRA weights for inference 2025-12-22 20:50:25 +08:00
jingyaogong
fe24501602 [feat] adjust seq length 2025-12-14 20:41:58 +08:00
jingyaogong
fa82707c9c [feat] update readme 2025-12-11 15:45:50 +08:00
jingyaogong
5129f0e2a2 [fix] dtype & lr 2025-12-09 13:01:38 +08:00
jingyaogong
aa7dc0f61e
Merge pull request #571 from dyhuachi/dyhuachi-patch-1
[fix] Refactor get_lr function to include min_lr calculation
2025-12-09 12:59:11 +08:00
dyhuachi
bf3878ace8
[fix] Refactor get_lr function to include min_lr calculation
The annealing schedule here makes the starting lr 1.1 times the lr set in the arguments; modified as follows.
2025-12-06 17:09:51 +08:00
jingyaogong
ecd1ae1563 [fix] reduce aux_loss_alpha 2025-12-05 23:08:29 +08:00
jingyaogong
5e1447b913 [fix] cuda memory #559 2025-12-01 16:17:43 +08:00
jingyaogong
151fdf7e76 [feat] update yarn 2025-12-01 16:15:05 +08:00
jingyaogong
6b86ea399a [feat] release memory 2025-11-27 19:39:49 +08:00
jingyaogong
d7f4f4eab8 [fix] ppo mask 2025-11-19 23:39:02 +08:00
jingyaogong
f5374dc87f [fix] model attn_mask 2025-11-19 22:26:53 +08:00
jingyaogong
a044578d73 [fix] update model 2025-11-18 13:07:20 +08:00
jingyaogong
ce9394670b
Merge pull request #536 from yuyu5333/fix/attn_forward
fix: attn_forwad when is_causal=True assert attn_mask is None
2025-11-18 13:02:46 +08:00
yuyu5333
7d02ce673c fix: attn_forwad when is_causal=True assert attn_mask is None 2025-11-18 03:17:17 +00:00
jingyaogong
9c98cabc9a [fix] prompt length calculate 2025-11-15 18:25:37 +08:00
jingyaogong
f3441b0078
Merge pull request #528 from wangzhaode/feat/add_mnn_support
[feat] add MNN support to README.
2025-11-10 22:46:15 +08:00
yanxing
5959396096 [feat] add MNN support to README. 2025-11-10 21:59:22 +08:00
jingyaogong
bf60bde8fb [fix] model-name 2025-11-07 19:38:20 +08:00
jingyaogong
81e869fc3e [fix] harmonize template 2025-11-06 13:14:08 +08:00
jingyaogong
509d8dacf1 [feat] clear cache 2025-11-06 13:12:28 +08:00
jingyaogong
8a0b04ed82 [fix] harmonize template 2025-11-02 23:18:11 +08:00
jingyaogong
0323815729 [feat] update import 2025-10-31 23:45:55 +08:00
jingyaogong
8d71754e05 [feat] update readme 2025-10-30 23:39:25 +08:00
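
Note on bf3878ace8 (get_lr / min_lr): the commit message says the annealing schedule started at 1.1 times the configured lr. The diff itself is not shown in this log, so the following is only a sketch under assumptions: it supposes the earlier schedule added a fixed floor of lr / 10 to a full-amplitude cosine term, which would account for the 1.1 factor. The function names `get_lr_old` and `get_lr_fixed`, the `min_lr` parameter, and its default of lr / 10 are hypothetical and may not match the repository's code.

```python
import math


def get_lr_old(step, total_steps, lr):
    # Assumed pre-fix form: a floor of lr/10 plus a cosine term whose
    # amplitude is the full lr, so step 0 yields lr/10 + lr = 1.1 * lr.
    return lr / 10 + 0.5 * lr * (1 + math.cos(math.pi * step / total_steps))


def get_lr_fixed(step, total_steps, lr, min_lr=None):
    # Hypothetical fix: scale the cosine term by (lr - min_lr) so the
    # schedule starts exactly at lr and decays to min_lr.
    if min_lr is None:
        min_lr = lr / 10  # assumed default floor
    cosine = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return min_lr + (lr - min_lr) * cosine


if __name__ == "__main__":
    base_lr, total = 1e-3, 100
    for step in (0, 50, 100):
        print(step, get_lr_old(step, total, base_lr), get_lr_fixed(step, total, base_lr))
```

With this form the cosine factor runs from 1 to 0 over training, so the learning rate interpolates between lr and min_lr instead of overshooting the configured value at the first step.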