TensorRT-LLM/cpp/kernels/fmha_v2/src

Latest commit 6bddaf6df6 by Julien Debache, 2025-07-05 22:25:27 +02:00:
chore: Improve documentation of Kv_block_array (#5765)
Signed-off-by: Julien Debache <julien.debache@hotmail.com>
fmha/ chore: Improve documentation of Kv_block_array (#5765) 2025-07-05 22:25:27 +02:00
convert.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_demo_bert_params.h use cu for fmha_v2 (#4694) 2025-06-15 18:40:44 +08:00
fused_multihead_attention_kernel_1xN_multi_cta.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_2x2.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_utils.h [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
fused_multihead_attention.cpp fix: fmha_v2 compilation (#4659) 2025-05-27 17:39:39 +08:00
fused_multihead_attention.h use cu for fmha_v2 (#4694) 2025-06-15 18:40:44 +08:00
fused_multihead_cross_attention_kernel_1xN_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention_kernel_1xN.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.cpp infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_flash_attention_kernel_noloop_tiled.h [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
fused_multihead_flash_attention_kernel_noloop.h [Feat] add chunked-attention kernels on Hopper (for llama4) (#4291) 2025-05-19 09:57:10 -07:00
fused_multihead_flash_attention_kernel.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_bf16.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_fp8.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_fp16.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_fp32.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_impl.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_int8.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
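The softmax_*.cu files above pair one translation unit per element type (fp16, bf16, fp32, fp8, int8) with a single shared softmax_impl.h. That is a common CUDA layout: the kernel lives in the header as a template, and each .cu file instantiates it for one precision, keeping compilation and binary size split per dtype. Below is a minimal sketch of that pattern, assuming hypothetical names (softmax_1pass, launch_softmax_fp16); the repo's actual kernels and signatures are more involved.

```cpp
// Hypothetical sketch, not the repo's API: illustrates the softmax_impl.h /
// softmax_<dtype>.cu split, with a deliberately naive kernel for brevity.
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <math.h>

// --- what would live in softmax_impl.h: one templated kernel --------------
template <typename T>
__global__ void softmax_1pass(const T* in, T* out, int cols)
{
    // One block per row; thread 0 walks the whole row (no parallel
    // reduction, purely to keep the sketch short).
    const T* row_in  = in  + blockIdx.x * cols;
    T*       row_out = out + blockIdx.x * cols;
    if (threadIdx.x == 0)
    {
        float max_val = -INFINITY;                    // row max for stability
        for (int i = 0; i < cols; ++i)
            max_val = fmaxf(max_val, (float) row_in[i]);
        float sum = 0.f;                              // softmax normalizer
        for (int i = 0; i < cols; ++i)
            sum += expf((float) row_in[i] - max_val);
        for (int i = 0; i < cols; ++i)
            row_out[i] = (T) (expf((float) row_in[i] - max_val) / sum);
    }
}

// --- what would live in softmax_fp16.cu: the per-dtype instantiation ------
void launch_softmax_fp16(half const* in, half* out, int rows, int cols, cudaStream_t stream)
{
    softmax_1pass<half><<<rows, 32, 0, stream>>>(in, out, cols);
}
```

Under the same assumption, a softmax_bf16.cu counterpart would differ only in including <cuda_bf16.h> and instantiating the template with __nv_bfloat16; an int8 variant would additionally need quantization scales, which is presumably why softmax_int8.cu exists as its own translation unit.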