58 Commits

Author SHA1 Message Date
jingyaogong 4a68da72d5 [fix] lora dims 2026-05-31 13:38:51 +08:00
jingyaogong dddedc6881 [fix] repetition_penalty 2026-05-07 19:08:52 +08:00
jingyaogong 1718e9a44d [fix] transformers-5.x 2026-04-19 23:48:54 +08:00
jingyaogong 5704766352 [update] tie embedding 2026-04-19 21:57:28 +08:00
jingyaogong 2ab6455d9d [update] open causal 2026-04-02 15:28:58 +08:00
jingyaogong 101d7df2da [update] minimind-3 2026-03-25 23:57:45 +08:00
jingyaogong c090b69c4d [update] align loss 2026-01-15 00:56:32 +08:00
jingyaogong e119db8478 [fix] compile unpack 2026-01-14 20:13:32 +08:00
jingyaogong 1279a61681 [update] prompt prefill 2026-01-13 17:46:54 +08:00
jingyaogong 9d898576ac [update] aux loss 2026-01-01 22:41:46 +08:00
jingyaogong c65335b56f [fix] experts unused 2025-12-31 21:47:04 +08:00
jingyaogong a9c56b20e9 [fix] lora weight 2025-12-22 21:27:29 +08:00
whitesword 3a18fdd666 Fix: support loading DDP-saved LoRA weights for inference 2025-12-22 20:50:25 +08:00
jingyaogong 5129f0e2a2 [fix] dtype & lr 2025-12-09 13:01:38 +08:00
jingyaogong ecd1ae1563 [fix] reduce aux_loss_alpha 2025-12-05 23:08:29 +08:00
jingyaogong 151fdf7e76 [feat] update yarn 2025-12-01 16:15:05 +08:00
jingyaogong 6b86ea399a [feat] release memory 2025-11-27 19:39:49 +08:00
jingyaogong f5374dc87f [fix] model attn_mask 2025-11-19 22:26:53 +08:00
jingyaogong a044578d73 [fix] update model 2025-11-18 13:07:20 +08:00
yuyu5333 7d02ce673c fix: attn_forwad when is_causal=True assert attn_mask is None 2025-11-18 03:17:17 +00:00
jingyaogong 4014e62cdf [fix] restore 2025-10-23 19:00:06 +08:00
jingyaogong 557bcc018d [fix] issue-431 2025-10-23 18:56:30 +08:00
jingyaogong 4e35fb9da8 [fix] update model 2025-10-17 00:09:32 +08:00
jingyaogong 5ffde04b7c update lora 2025-04-27 15:45:06 +08:00
jingyaogong 29454c31af fix bugs 2025-04-27 09:56:49 +08:00
jingyaogong 274483cb1b 250426 2025-04-26 10:07:55 +08:00
jingyaogong a62faf34bd 250426 2025-04-26 10:05:47 +08:00
jingyaogong d9453ed9a3 update moe note 2025-04-09 17:38:31 +08:00
jingyaogong 4a7c1c49e8 update rlaif 2025-04-05 16:06:08 +08:00
jingyaogong 9e67798397 update generate 2025-04-05 15:53:55 +08:00
jingyaogong 399d526fbd add hidden state 2025-04-05 14:39:56 +08:00
jingyaogong ed01c5d84a update inference 2025-04-05 12:03:04 +08:00
jingyaogong bf81fd5f5e rmsnorm float convert 2025-04-01 16:03:44 +08:00
jingyaogong e369b33265 fix chat mask bug 2025-04-01 13:44:55 +08:00
jingyaogong 258507ff89 delete __pycache__ 2025-04-01 11:51:54 +08:00
gongjy 844e79148c update generate args 2025-02-15 23:56:09 +08:00
gongjy 19b388cd87 update generate args 2025-02-15 23:55:10 +08:00
gongjy 5b65bc767e update cis init 2025-02-15 20:26:34 +08:00
gongjy 58e3af0359 add minimind2 2025-02-09 23:49:47 +08:00
gongjy 3ff66f7221 update model 2024-10-20 15:13:58 +08:00
gongjy 772834148e update readme 2024-10-08 23:40:29 +08:00
gongjy a87f628400 update model (fix loss bug) 2024-09-29 16:58:48 +08:00
gongjy 75753ea765 Update data preprocessing methods 2024-09-27 17:19:03 +08:00
gongjy a8ae342775 Update data preprocessing methods 2024-09-27 16:19:30 +08:00
gongjy 6759da45c1 update model mask 2024-09-21 20:00:25 +08:00
gongjy 02297df3c1 Efficient implementation of Inference KV cache 2024-09-21 00:01:05 +08:00
gongjy 9093519c37 Updated some explanations 2024-09-20 17:07:51 +08:00
gongjy ee218402cd update some explain of the code 2024-09-20 17:04:16 +08:00
Ben 2dceaf4a92 Add comments to help learners understand quickly 2024-09-18 21:53:39 +08:00
gongjy 61cb61a46a update minimind-v1-moe 2024-09-17 11:33:31 +08:00