TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

qsang-nv 180b91f957 update fmha_v2 (#4895) Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>		2025-06-05 22:14:28 +08:00
fmha	update fmha_v2 (#4895)	2025-06-05 22:14:28 +08:00
convert.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_demo_bert_params.h	update fmha_v2 (#4895)	2025-06-05 22:14:28 +08:00
fused_multihead_attention_kernel_1xN_multi_cta.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN_noloop.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_2x2.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper_noloop.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper_noloop.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_attention_utils.h	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
fused_multihead_attention.cpp	fix: fmha_v2 compilation (#4659)	2025-05-27 17:39:39 +08:00
fused_multihead_attention.h	update fmha_v2 (#4895)	2025-06-05 22:14:28 +08:00
fused_multihead_cross_attention_kernel_1xN_noloop.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention_kernel_1xN.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.cpp	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
fused_multihead_flash_attention_kernel_noloop_tiled.h	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
fused_multihead_flash_attention_kernel_noloop.h	[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)	2025-05-19 09:57:10 -07:00
fused_multihead_flash_attention_kernel.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_bf16.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_fp8.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_fp16.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_fp32.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_impl.h	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00
softmax_int8.cu	infra: open source fmha v2 kernels (#4185)	2025-05-15 10:56:34 +08:00