TensorRT-LLMs/cpp/kernels/fmha_v2/src
Perkz Zheng 1c5b0d6a13
[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)
* update cubins

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

* add mtp for fmha_v2 MLA kernels and add chunked-attention support for hopper fmha kernels

---------

Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-05-19 09:57:10 -07:00
fmha [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
convert.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_demo_bert_params.h [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
fused_multihead_attention_kernel_1xN_multi_cta.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_2x2.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_utils.h [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
fused_multihead_attention.cpp [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
fused_multihead_attention.h [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
fused_multihead_cross_attention_kernel_1xN_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention_kernel_1xN.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.cpp infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_flash_attention_kernel_noloop_tiled.h [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
fused_multihead_flash_attention_kernel_noloop.h [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
fused_multihead_flash_attention_kernel.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_bf16.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_fp8.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_fp16.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_fp32.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_impl.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_int8.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00