jingyaogong
|
1279a61681
|
[update] prompt prefill
|
2026-01-13 17:46:54 +08:00 |
|
jingyaogong
|
05d0b216f6
|
[update] show speed
|
2026-01-07 23:33:47 +08:00 |
|
jingyaogong
|
df89069362
|
[update] params log
|
2026-01-07 23:08:45 +08:00 |
|
jingyaogong
|
f55d4c32a0
|
[update] mask log
|
2026-01-07 22:12:26 +08:00 |
|
jingyaogong
|
20a43d7db0
|
[update] readme
|
2026-01-07 00:58:38 +08:00 |
|
jingyaogong
|
7641985d14
|
[update] simplify loader
|
2026-01-06 01:20:52 +08:00 |
|
jingyaogong
|
0b4a8ad4aa
|
[update] readme
|
2026-01-06 01:18:10 +08:00 |
|
jingyaogong
|
07364c3fbe
|
[update] rename train tokenizer
|
2026-01-06 01:17:33 +08:00 |
|
jingyaogong
|
9830915d87
|
[update] readme
|
2026-01-05 23:15:25 +08:00 |
|
jingyaogong
|
4e73f34823
|
[update] rename reason
|
2026-01-05 23:12:29 +08:00 |
|
jingyaogong
|
a8455ca8a3
|
[fix] messages num
|
2026-01-04 11:03:16 +08:00 |
|
jingyaogong
|
42a4e8c86a
|
[fix] dist cleanup
|
2026-01-02 22:25:55 +08:00 |
|
jingyaogong
|
9d898576ac
|
[update] aux loss
|
2026-01-01 22:41:46 +08:00 |
|
jingyaogong
|
c65335b56f
|
[fix] experts unused
|
2025-12-31 21:47:04 +08:00 |
|
jingyaogong
|
bc8fd82166
|
[fix] layers set 8
|
2025-12-31 21:06:37 +08:00 |
|
jingyaogong
|
5dd4df7e18
|
[fix] moe unused
|
2025-12-31 21:00:06 +08:00 |
|
jingyaogong
|
9236260a4a
|
[feat] get params
|
2025-12-31 20:46:59 +08:00 |
|
jingyaogong
|
288a1d7212
|
[feat] get params
|
2025-12-31 20:44:34 +08:00 |
|
jingyaogong
|
eead9538b2
|
[feat] update config
|
2025-12-31 10:29:13 +08:00 |
|
jingyaogong
|
6242980917
|
[feat] update lr
|
2025-12-31 10:27:09 +08:00 |
|
jingyaogong
|
936d105e9b
|
[feat] compatible tokenizer
|
2025-12-31 10:26:46 +08:00 |
|
jingyaogong
|
4a5c9f5ece
|
[feat] stream load data
|
2025-12-28 16:58:52 +08:00 |
|
jingyaogong
|
7eae14f3ce
|
[feat] remove empty_cache
|
2025-12-27 07:14:36 +08:00 |
|
jingyaogong
|
11b962da06
|
[feat] explicit left padding
|
2025-12-23 18:59:48 +08:00 |
|
jingyaogong
|
a9c56b20e9
|
[fix] lora weight
|
2025-12-22 21:27:29 +08:00 |
|
jingyaogong
|
048d84abc7
|
Merge pull request #594 from whiteswordLI/fix/lora-load-ddp-weights
Fix: support loading DDP-saved LoRA weights for inference
|
2025-12-22 21:19:16 +08:00 |
|
whitesword
|
3a18fdd666
|
Fix: support loading DDP-saved LoRA weights for inference
|
2025-12-22 20:50:25 +08:00 |
|
jingyaogong
|
fe24501602
|
[feat] adjust seq length
|
2025-12-14 20:41:58 +08:00 |
|
jingyaogong
|
fa82707c9c
|
[feat] update readme
|
2025-12-11 15:45:50 +08:00 |
|
jingyaogong
|
5129f0e2a2
|
[fix] dtype & lr
|
2025-12-09 13:01:38 +08:00 |
|
jingyaogong
|
aa7dc0f61e
|
Merge pull request #571 from dyhuachi/dyhuachi-patch-1
[fix] Refactor get_lr function to include min_lr calculation
|
2025-12-09 12:59:11 +08:00 |
|
dyhuachi
|
bf3878ace8
|
[fix] Refactor get_lr function to include min_lr calculation
The annealing schedule here makes the starting lr 1.1 times the lr passed in the arguments, so the following change is made.
|
2025-12-06 17:09:51 +08:00 |
|
jingyaogong
|
ecd1ae1563
|
[fix] reduce aux_loss_alpha
|
2025-12-05 23:08:29 +08:00 |
|
jingyaogong
|
5e1447b913
|
[fix] cuda memory #559
|
2025-12-01 16:17:43 +08:00 |
|
jingyaogong
|
151fdf7e76
|
[feat] update yarn
|
2025-12-01 16:15:05 +08:00 |
|
jingyaogong
|
6b86ea399a
|
[feat] release memory
|
2025-11-27 19:39:49 +08:00 |
|
jingyaogong
|
d7f4f4eab8
|
[fix] ppo mask
|
2025-11-19 23:39:02 +08:00 |
|
jingyaogong
|
f5374dc87f
|
[fix] model attn_mask
|
2025-11-19 22:26:53 +08:00 |
|
jingyaogong
|
a044578d73
|
[fix] update model
|
2025-11-18 13:07:20 +08:00 |
|
jingyaogong
|
ce9394670b
|
Merge pull request #536 from yuyu5333/fix/attn_forward
fix: attn_forward when is_causal=True assert attn_mask is None
|
2025-11-18 13:02:46 +08:00 |
|
yuyu5333
|
7d02ce673c
|
fix: attn_forward when is_causal=True assert attn_mask is None
|
2025-11-18 03:17:17 +00:00 |
|
jingyaogong
|
9c98cabc9a
|
[fix] prompt length calculate
|
2025-11-15 18:25:37 +08:00 |
|
jingyaogong
|
f3441b0078
|
Merge pull request #528 from wangzhaode/feat/add_mnn_support
[feat] add MNN support to README.
|
2025-11-10 22:46:15 +08:00 |
|
yanxing
|
5959396096
|
[feat] add MNN support to README.
|
2025-11-10 21:59:22 +08:00 |
|
jingyaogong
|
bf60bde8fb
|
[fix] model-name
|
2025-11-07 19:38:20 +08:00 |
|
jingyaogong
|
81e869fc3e
|
[fix] harmonize template
|
2025-11-06 13:14:08 +08:00 |
|
jingyaogong
|
509d8dacf1
|
[feat] clear cache
|
2025-11-06 13:12:28 +08:00 |
|
jingyaogong
|
8a0b04ed82
|
[fix] harmonize template
|
2025-11-02 23:18:11 +08:00 |
|
jingyaogong
|
0323815729
|
[feat] update import
|
2025-10-31 23:45:55 +08:00 |
|
jingyaogong
|
8d71754e05
|
[feat] update readme
|
2025-10-30 23:39:25 +08:00 |
|