TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-09 20:43:50 +08:00

Perkz Zheng 1c5b0d6a13 [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
* update cubins
* add mtp for fmha_v2 MLA kernels and add chunked-attention support for hopper fmha kernels
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>

fmha	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
convert.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_demo_bert_params.h	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
fused_multihead_attention_kernel_1xN_multi_cta.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN_noloop.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_2x2.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper_noloop.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper_noloop.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_utils.h	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
fused_multihead_attention.cpp	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
fused_multihead_attention.h	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
fused_multihead_cross_attention_kernel_1xN_noloop.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention_kernel_1xN.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.cpp	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_flash_attention_kernel_noloop_tiled.h	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
fused_multihead_flash_attention_kernel_noloop.h	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
fused_multihead_flash_attention_kernel.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_bf16.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_fp8.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_fp16.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_fp32.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_impl.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_int8.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00