TensorRT-LLM/cpp/kernels/fmha_v2/src
Latest commit: 6a35c599ef Clean: fmha codes (#4496) by Perkz Zheng, 2025-05-21 11:45:47 +08:00
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| fmha/ | Clean: fmha codes (#4496) | 2025-05-21 11:45:47 +08:00 |
| convert.cu | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_demo_bert_params.h | [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) | 2025-05-19 09:57:10 -07:00 |
| fused_multihead_attention_kernel_1xN_multi_cta.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_kernel_1xN_noloop.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_kernel_1xN.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_kernel_2x2.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_kernel_4x1_hopper_noloop.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_kernel_4x1_hopper.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_kernel_4xN_hopper_noloop.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_kernel_4xN_hopper.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_kernel.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_attention_utils.h | [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) | 2025-05-19 09:57:10 -07:00 |
| fused_multihead_attention.cpp | [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) | 2025-05-19 09:57:10 -07:00 |
| fused_multihead_attention.h | [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) | 2025-05-19 09:57:10 -07:00 |
| fused_multihead_cross_attention_kernel_1xN_noloop.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_cross_attention_kernel_1xN.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_cross_attention.cpp | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_cross_attention.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| fused_multihead_flash_attention_kernel_noloop_tiled.h | [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) | 2025-05-19 09:57:10 -07:00 |
| fused_multihead_flash_attention_kernel_noloop.h | [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) | 2025-05-19 09:57:10 -07:00 |
| fused_multihead_flash_attention_kernel.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| softmax_bf16.cu | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| softmax_fp8.cu | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| softmax_fp16.cu | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| softmax_fp32.cu | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| softmax_impl.h | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| softmax_int8.cu | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |