jingyaogong
|
1279a61681
|
[update] prompt prefill
|
2026-01-13 17:46:54 +08:00 |
|
jingyaogong
|
9d898576ac
|
[update] aux loss
|
2026-01-01 22:41:46 +08:00 |
|
jingyaogong
|
c65335b56f
|
[fix] experts unused
|
2025-12-31 21:47:04 +08:00 |
|
jingyaogong
|
a9c56b20e9
|
[fix] lora weight
|
2025-12-22 21:27:29 +08:00 |
|
whitesword
|
3a18fdd666
|
Fix: support loading DDP-saved LoRA weights for inference
|
2025-12-22 20:50:25 +08:00 |
|
jingyaogong
|
5129f0e2a2
|
[fix] dtype & lr
|
2025-12-09 13:01:38 +08:00 |
|
jingyaogong
|
ecd1ae1563
|
[fix] reduce aux_loss_alpha
|
2025-12-05 23:08:29 +08:00 |
|
jingyaogong
|
151fdf7e76
|
[feat] update yarn
|
2025-12-01 16:15:05 +08:00 |
|
jingyaogong
|
6b86ea399a
|
[feat] release memory
|
2025-11-27 19:39:49 +08:00 |
|
jingyaogong
|
f5374dc87f
|
[fix] model attn_mask
|
2025-11-19 22:26:53 +08:00 |
|
jingyaogong
|
a044578d73
|
[fix] update model
|
2025-11-18 13:07:20 +08:00 |
|
yuyu5333
|
7d02ce673c
|
fix: attn_forwad when is_causal=True assert attn_mask is None
|
2025-11-18 03:17:17 +00:00 |
|
jingyaogong
|
4014e62cdf
|
[fix] restore
|
2025-10-23 19:00:06 +08:00 |
|
jingyaogong
|
557bcc018d
|
[fix] issue-431
|
2025-10-23 18:56:30 +08:00 |
|
jingyaogong
|
4e35fb9da8
|
[fix] update model
|
2025-10-17 00:09:32 +08:00 |
|
jingyaogong
|
5ffde04b7c
|
update lora
|
2025-04-27 15:45:06 +08:00 |
|
jingyaogong
|
29454c31af
|
fix bugs
|
2025-04-27 09:56:49 +08:00 |
|
jingyaogong
|
274483cb1b
|
250426
|
2025-04-26 10:07:55 +08:00 |
|
jingyaogong
|
a62faf34bd
|
250426
|
2025-04-26 10:05:47 +08:00 |
|
jingyaogong
|
d9453ed9a3
|
update moe note
|
2025-04-09 17:38:31 +08:00 |
|
jingyaogong
|
4a7c1c49e8
|
update rlaif
|
2025-04-05 16:06:08 +08:00 |
|
jingyaogong
|
9e67798397
|
update generate
|
2025-04-05 15:53:55 +08:00 |
|
jingyaogong
|
399d526fbd
|
add hidden state
|
2025-04-05 14:39:56 +08:00 |
|
jingyaogong
|
ed01c5d84a
|
update inference
|
2025-04-05 12:03:04 +08:00 |
|
jingyaogong
|
bf81fd5f5e
|
rmsnorm float convert
|
2025-04-01 16:03:44 +08:00 |
|
jingyaogong
|
e369b33265
|
fix chat mask bug
|
2025-04-01 13:44:55 +08:00 |
|
jingyaogong
|
258507ff89
|
delete __pycache__
|
2025-04-01 11:51:54 +08:00 |
|
gongjy
|
844e79148c
|
update generate args
|
2025-02-15 23:56:09 +08:00 |
|
gongjy
|
19b388cd87
|
update generate args
|
2025-02-15 23:55:10 +08:00 |
|
gongjy
|
5b65bc767e
|
update cis init
|
2025-02-15 20:26:34 +08:00 |
|
gongjy
|
58e3af0359
|
add minimind2
|
2025-02-09 23:49:47 +08:00 |
|
gongjy
|
3ff66f7221
|
update model
|
2024-10-20 15:13:58 +08:00 |
|
gongjy
|
772834148e
|
update readme
|
2024-10-08 23:40:29 +08:00 |
|
gongjy
|
a87f628400
|
update model (fix loss bug)
|
2024-09-29 16:58:48 +08:00 |
|
gongjy
|
75753ea765
|
Update data preprocessing methods
|
2024-09-27 17:19:03 +08:00 |
|
gongjy
|
a8ae342775
|
Update data preprocessing methods
|
2024-09-27 16:19:30 +08:00 |
|
gongjy
|
6759da45c1
|
update model mask
|
2024-09-21 20:00:25 +08:00 |
|
gongjy
|
02297df3c1
|
Efficient implementation of Inference KV cache
|
2024-09-21 00:01:05 +08:00 |
|
gongjy
|
9093519c37
|
Updated some explanations
|
2024-09-20 17:07:51 +08:00 |
|
gongjy
|
ee218402cd
|
update some explain of the code
|
2024-09-20 17:04:16 +08:00 |
|
Ben
|
2dceaf4a92
|
Add comments to help learners understand quickly
|
2024-09-18 21:53:39 +08:00 |
|
gongjy
|
61cb61a46a
|
update minimind-v1-moe
|
2024-09-17 11:33:31 +08:00 |
|
gongjy
|
8c18b324d0
|
update model
|
2024-09-16 16:59:52 +08:00 |
|
gongjy
|
e4ad822c40
|
update model
|
2024-09-16 15:29:57 +08:00 |
|
gongjy
|
16928c1231
|
update some config
|
2024-09-15 15:12:47 +08:00 |
|
gongjy
|
aa5d70321f
|
update config
|
2024-09-15 15:09:21 +08:00 |
|
gongjy
|
f3f1cc5fac
|
update config
|
2024-09-15 15:08:04 +08:00 |
|
gongjy
|
3068e5efcc
|
update model/dataset.py
|
2024-09-14 16:09:42 +08:00 |
|
gongjy
|
ecf6d44133
|
update model/dataset.py
|
2024-09-14 14:05:41 +08:00 |
|
gongjy
|
8be42693f6
|
MiniMind first open source
|
2024-08-28 16:41:44 +08:00 |
|