TensorRT-LLMs/cpp/kernels/fmha_v2/src
jmydurant 7deefb3d2b
[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-09-15 21:43:49 +08:00
Name | Last commit | Date
fmha | [TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477) | 2025-09-15 21:43:49 +08:00
convert.cu | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_demo_bert_params.h | [None] [feat] Add model gpt-oss (#6645) | 2025-08-07 03:04:18 -04:00
fused_multihead_attention_kernel_1xN_multi_cta.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN_noloop.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_2x2.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper_noloop.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper_noloop.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_attention_utils.h | [None][feat] Add support for Hopper MLA chunked prefill (#6655) | 2025-08-14 10:39:26 +08:00
fused_multihead_attention.cpp | [TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477) | 2025-09-15 21:43:49 +08:00
fused_multihead_attention.h | [None][feat] Add support for Hopper MLA chunked prefill (#6655) | 2025-08-14 10:39:26 +08:00
fused_multihead_cross_attention_kernel_1xN_noloop.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention_kernel_1xN.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.cpp | [None] [feat] Add model gpt-oss (#6645) | 2025-08-07 03:04:18 -04:00
fused_multihead_cross_attention.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
fused_multihead_flash_attention_kernel_noloop_tiled.h | [TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379) | 2025-08-05 07:47:41 +00:00
fused_multihead_flash_attention_kernel_noloop.h | [TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379) | 2025-08-05 07:47:41 +00:00
fused_multihead_flash_attention_kernel.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
softmax_bf16.cu | [None] [feat] Add model gpt-oss (#6645) | 2025-08-07 03:04:18 -04:00
softmax_fp8.cu | [None] [feat] Add model gpt-oss (#6645) | 2025-08-07 03:04:18 -04:00
softmax_fp16.cu | [None] [feat] Add model gpt-oss (#6645) | 2025-08-07 03:04:18 -04:00
softmax_fp32.cu | [None] [feat] Add model gpt-oss (#6645) | 2025-08-07 03:04:18 -04:00
softmax_impl.h | [None][feat] Add support for Hopper MLA chunked prefill (#6655) | 2025-08-14 10:39:26 +08:00
softmax_int8.cu | [None] [feat] Add model gpt-oss (#6645) | 2025-08-07 03:04:18 -04:00