TensorRT-LLM/cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/cubin
zhhuang-nv 97bc680cd8
feat: support kv cache reuse for MLA (#3571)
* support kv cache reuse for MLA

Load compressed_kv and k_pe and apply the up-projection.
Use the 192/128 head-size MLA context kernel.
Supported on Blackwell and Hopper for now.

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
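The reuse path described above loads the compressed KV tensor plus the shared rotary part (k_pe) from the cache and up-projects them into per-head K/V before running the 192/128 head-size context kernel. A minimal NumPy sketch of that shape flow, with illustrative dimensions modeled on DeepSeek-style MLA (the weight names `w_uk`/`w_uv` and all sizes are assumptions, and the random weights are stand-ins for trained projections):

```python
import numpy as np

# Illustrative MLA dimensions (assumed, not taken from the kernel source).
kv_lora_rank, rope_dim = 512, 64   # compressed-KV width, decoupled RoPE width
num_heads, head_dim = 8, 128       # per-head width after up-projection
tokens = 4

rng = np.random.default_rng(0)
compressed_kv = rng.standard_normal((tokens, kv_lora_rank))  # reused from KV cache
k_pe = rng.standard_normal((tokens, rope_dim))               # rotary part, single head

# Up-projection weights (W_UK / W_UV in MLA formulations); random stand-ins here.
w_uk = rng.standard_normal((kv_lora_rank, num_heads * head_dim))
w_uv = rng.standard_normal((kv_lora_rank, num_heads * head_dim))

k_nope = (compressed_kv @ w_uk).reshape(tokens, num_heads, head_dim)
v = (compressed_kv @ w_uv).reshape(tokens, num_heads, head_dim)

# Broadcast the single k_pe head across all K heads and concatenate, so
# K has head size 128 + 64 = 192 while V stays at 128 -- the 192/128
# head-size split the commit's context kernel refers to.
k_pe_all = np.broadcast_to(k_pe[:, None, :], (tokens, num_heads, rope_dim))
k = np.concatenate([k_nope, k_pe_all], axis=-1)
```

Attention then runs with a 192-wide K and a 128-wide V per head, which is why a dedicated 192/128 context kernel is needed rather than a uniform head size.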

* add CI test

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2

Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>

* resolve comments

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* use GPTJ style RoPE for MLA

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
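GPT-J style RoPE rotates adjacent (even, odd) element pairs, whereas NeoX-style RoPE pairs element i with element i + dim/2; the two layouts are not interchangeable, so a cache written with one convention must be read with the same one. A minimal NumPy sketch of the interleaved (GPT-J style) rotation, with illustrative sizes:

```python
import numpy as np

def rope_gptj(x, positions, base=10000.0):
    """Apply GPT-J style RoPE: rotate interleaved (even, odd) pairs of x."""
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # [dim/2]
    angles = positions[:, None] * inv_freq[None, :]          # [seq, dim/2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    # Each (x[2i], x[2i+1]) pair is rotated by the i-th frequency.
    out[..., 0::2] = x[..., 0::2] * cos - x[..., 1::2] * sin
    out[..., 1::2] = x[..., 0::2] * sin + x[..., 1::2] * cos
    return out

x = np.random.default_rng(0).standard_normal((3, 64))
y = rope_gptj(x, np.arange(3))
```

Because the transform is a pure rotation, it preserves vector norms, and position 0 leaves the input unchanged.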

* fix rebase error and some docs

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix kv_lens

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* tiny fix

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix torch compile

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix: use normal device memory instead of pinned memory for unit test

Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>

* fix L0 tests

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix torch compile after rebase

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* resolve comments

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* resolve comments again

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

---------

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com>
Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-05-15 15:22:21 +08:00
fmha_cubin.h feat: support kv cache reuse for MLA (#3571) 2025-05-15 15:22:21 +08:00
fmha_v2_bf16_64_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_bf16_64_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_bf16_128_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_bf16_128_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_bf16_256_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_bf16_256_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_bf16_384_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_bf16_384_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_bf16_512_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_bf16_512_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_160_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_160_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_160_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_192_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_192_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_192_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_256_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_256_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_256_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_256_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_256_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_256_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_q_paged_kv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_160_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_160_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_160_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_160_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_192_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_192_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_192_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_192_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_softcapping_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_16_S_qkv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_40_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_40_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_40_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_48_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_48_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_48_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_64_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_64_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_64_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_72_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_72_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_72_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_80_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_80_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_80_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_96_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_96_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_96_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_104_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_104_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_104_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_128_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_128_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_128_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_128_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_128_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_q_paged_kv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_40_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_40_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_40_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_40_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_48_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_48_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_48_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_48_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_64_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_72_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_80_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_80_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_80_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_80_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_96_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_96_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_96_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_96_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_104_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_104_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_104_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_104_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_softcapping_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_32_S_qkv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_kv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_kv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_kv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_16_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_16_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_32_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_32_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_32_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_160_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_160_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_192_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_192_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_256_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_256_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_256_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_576x512_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_576x512_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_576x512_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_576x512_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_576x512_sm100.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_q_paged_kv_576x512_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_16_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_16_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_16_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_16_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_32_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_160_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_160_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_192_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_192_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_256_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_256_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_64_S_qkv_256_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_72_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_72_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_128_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_kv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_72_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_72_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_72_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_72_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_72_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_80_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_80_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_80_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_80_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_80_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_96_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_96_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_96_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_96_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_96_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_104_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_104_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_104_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_104_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_104_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_160_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_160_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_160_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192x128_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192x128_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192x128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192x128_sm100.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192x128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_192x128_tma_ws_sm90.cubin.cpp feat: support kv cache reuse for MLA (#3571) 2025-05-15 15:22:21 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_256_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_256_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_256_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_256_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_256_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_256_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_q_paged_kv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_72_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_72_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_72_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_80_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_80_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_80_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_80_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_80_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_80_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_96_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_96_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_96_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_96_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_96_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_96_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_104_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_104_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_104_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_104_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_104_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_104_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_softcapping_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_160_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_160_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_160_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_160_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192x128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192x128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192x128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192x128_sm100.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192x128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_192x128_tma_ws_sm90.cubin.cpp feat: support kv cache reuse for MLA (#3571) 2025-05-15 15:22:21 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_softcapping_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_128_S_qkv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_kv_32_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_kv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_kv_64_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_kv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_paged_kv_32_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_paged_kv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_paged_kv_40_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_paged_kv_40_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_paged_kv_48_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_paged_kv_48_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_paged_kv_64_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_q_paged_kv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_qkv_32_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_qkv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_qkv_40_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_qkv_40_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_qkv_48_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_qkv_48_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_qkv_64_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_64_256_S_qkv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_kv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_kv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_kv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_kv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_kv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_kv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_16_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_16_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_32_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_32_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_32_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_40_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_40_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_40_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_48_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_48_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_48_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_64_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_64_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_64_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_q_paged_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_16_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_16_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_16_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_16_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_32_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_40_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_40_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_40_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_40_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_48_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_48_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_48_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_48_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_64_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_bf16_128_128_S_qkv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_q_paged_kv_160_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_q_paged_kv_160_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_q_paged_kv_192_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_q_paged_kv_192_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_q_paged_kv_256_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_q_paged_kv_256_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_q_paged_kv_256_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_qkv_160_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_qkv_160_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_qkv_192_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_qkv_192_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_qkv_256_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_qkv_256_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_128_S_qkv_256_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_kv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_kv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_kv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_32_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_40_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_40_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_48_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_48_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_64_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_80_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_80_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_96_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_96_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_104_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_104_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_128_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_128_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_q_paged_kv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_32_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_40_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_40_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_48_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_48_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_64_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_80_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_80_sage_64_64_256_output_bf16_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_80_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_96_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_96_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_104_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_104_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_128_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_128_sage_64_64_256_output_bf16_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_128_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_64_256_S_qkv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_kv_72_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_kv_128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_72_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_80_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_96_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_104_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_160_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_192_output_bf16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_192_output_bf16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_192_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_256_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_q_paged_kv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_72_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_80_sage_64_32_32_output_bf16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_80_sage_64_32_32_output_fp16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_80_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_96_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_104_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_128_sage_64_32_32_output_bf16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_128_sage_64_32_32_output_fp16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_160_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_192_output_bf16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_192_output_bf16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_192_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_256_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_32_S_qkv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_q_paged_kv_192x128_output_bf16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_q_paged_kv_192x128_output_bf16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_q_paged_kv_192x128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_q_paged_kv_192x128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_q_paged_kv_576x512_output_bf16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_q_paged_kv_576x512_output_bf16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_q_paged_kv_576x512_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_q_paged_kv_576x512_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_qkv_192x128_output_bf16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_qkv_192x128_output_bf16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_qkv_192x128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_64_64_S_qkv_192x128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_kv_32_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_kv_64_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_paged_kv_32_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_paged_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_paged_kv_40_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_paged_kv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_paged_kv_48_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_paged_kv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_paged_kv_64_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_q_paged_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_qkv_32_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_qkv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_qkv_40_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_qkv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_qkv_48_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_qkv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_qkv_64_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_e4m3_fp32_128_128_S_qkv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_160_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_160_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_160_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_192_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_192_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_192_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_256_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_256_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_256_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_256_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_256_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_256_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_q_paged_kv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_160_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_160_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_160_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_160_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_192_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_192_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_192_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_192_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_softcapping_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_16_S_qkv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_40_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_40_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_40_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_48_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_48_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_48_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_64_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_64_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_64_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_72_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_72_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_72_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_80_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_80_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_80_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_96_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_96_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_96_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_104_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_104_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_104_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_128_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_128_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_128_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_128_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_128_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_q_paged_kv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_40_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_40_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_40_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_40_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_48_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_48_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_48_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_48_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_64_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_72_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_80_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_80_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_80_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_80_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_96_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_96_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_96_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_96_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_104_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_104_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_104_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_104_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_softcapping_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_32_S_qkv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_kv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_kv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_kv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_16_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_16_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_32_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_32_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_32_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_160_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_160_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_192_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_192_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_256_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_256_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_q_paged_kv_256_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_16_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_16_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_16_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_16_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_32_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_160_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_160_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_192_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_192_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_256_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_256_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_64_S_qkv_256_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_72_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_72_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_128_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_kv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_72_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_72_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_72_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_72_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_72_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_80_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_80_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_80_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_80_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_80_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_96_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_96_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_96_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_96_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_96_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_104_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_104_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_104_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_104_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_104_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_160_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_160_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_160_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_192_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_192_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_192_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_256_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_256_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_256_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_256_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_256_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_256_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_q_paged_kv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_72_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_72_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_72_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_80_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_80_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_80_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_80_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_80_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_80_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_96_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_96_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_96_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_96_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_96_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_96_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_104_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_104_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_104_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_104_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_104_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_104_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_softcapping_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_160_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_160_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_160_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_160_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_192_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_192_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_192_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_192_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_softcapping_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_64_128_S_qkv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_kv_32_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_kv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_kv_64_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_kv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_paged_kv_32_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_paged_kv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_paged_kv_40_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_paged_kv_40_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_paged_kv_48_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_paged_kv_48_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_paged_kv_64_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_q_paged_kv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_qkv_32_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_qkv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_qkv_40_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_qkv_40_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_qkv_48_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_qkv_48_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_qkv_64_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_64_256_S_qkv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_kv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_kv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_kv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_kv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_kv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_kv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_16_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_16_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_32_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_32_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_32_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_40_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_40_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_40_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_48_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_48_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_48_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_64_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_64_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_64_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_q_paged_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_16_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_16_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_16_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_16_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_32_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_40_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_40_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_40_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_40_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_48_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_48_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_48_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_48_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_64_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_128_128_S_qkv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_160_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_160_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_160_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_192_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_192_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_192_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_256_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_256_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_256_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_256_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_256_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_256_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_q_paged_kv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_160_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_160_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_160_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_160_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_192_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_192_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_192_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_192_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_softcapping_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_16_S_qkv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_40_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_40_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_40_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_48_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_48_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_48_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_64_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_64_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_64_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_72_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_72_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_72_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_80_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_80_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_80_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_96_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_96_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_96_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_104_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_104_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_104_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_128_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_128_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_128_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_128_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_128_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_q_paged_kv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_40_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_40_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_40_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_40_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_48_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_48_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_48_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_48_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_64_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_72_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_80_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_80_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_80_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_80_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_96_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_96_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_96_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_96_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_104_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_104_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_104_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_104_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_softcapping_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_32_S_qkv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_kv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_kv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_kv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_16_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_16_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_32_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_32_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_32_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_160_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_160_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_192_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_192_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_256_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_256_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_q_paged_kv_256_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_16_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_16_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_16_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_16_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_32_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_160_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_160_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_192_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_192_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_256_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_256_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_64_S_qkv_256_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_72_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_72_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_128_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_kv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_72_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_72_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_72_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_72_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_72_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_80_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_80_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_80_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_80_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_80_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_96_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_96_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_96_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_96_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_96_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_104_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_104_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_104_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_104_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_104_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_160_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_160_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_160_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_192_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_192_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_192_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_256_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_256_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_256_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_256_softcapping_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_256_softcapping_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_256_softcapping_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_q_paged_kv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_72_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_72_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_72_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_72_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_72_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_72_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_72_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_80_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_80_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_80_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_80_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_80_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_80_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_80_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_96_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_96_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_96_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_96_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_96_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_96_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_96_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_104_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_104_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_104_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_104_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_104_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_104_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_104_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_softcapping_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_softcapping_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_128_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_160_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_160_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_160_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_160_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_160_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_192_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_192_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_192_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_192_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_192_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_softcapping_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_softcapping_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_softcapping_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_softcapping_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_128_S_qkv_256_softcapping_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_kv_32_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_kv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_kv_64_softmax_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_kv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_paged_kv_32_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_paged_kv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_paged_kv_40_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_paged_kv_40_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_paged_kv_48_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_paged_kv_48_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_paged_kv_64_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_q_paged_kv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_qkv_32_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_qkv_32_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_qkv_40_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_qkv_40_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_qkv_48_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_qkv_48_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_qkv_64_alibi_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_64_256_S_qkv_64_tma_ws_sm90.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_kv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_kv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_kv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_kv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_kv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_kv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_16_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_16_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_16_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_32_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_32_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_32_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_40_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_40_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_40_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_48_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_48_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_48_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_64_sm80.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_64_sm86.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_64_sm89.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_q_paged_kv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_16_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_16_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_16_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_16_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_16_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_32_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_32_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_32_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_32_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_32_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_40_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_40_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_40_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_40_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_40_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_48_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_48_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_48_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_48_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_48_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_64_sm80.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_64_sm86.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_64_sm89.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_64_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_flash_attention_fp16_fp32_128_128_S_qkv_64_sm120.cubin.cpp fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
fmha_v2_fp16_64_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_64_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_128_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_128_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_256_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_256_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_384_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_384_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_512_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_512_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_64_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_64_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_128_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_128_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_256_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_256_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_384_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_384_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_512_32_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00
fmha_v2_fp16_fp32_512_64_ldgsts_sm90.cubin.cpp Update (#2978) 2025-03-23 16:39:35 +08:00