TensorRT-LLM/cpp/kernels/fmha_v2/src
Latest commit: 27a5091fcb by Faraz, 2025-10-06 16:59:06 -04:00
[None][feat] GPT-OSS Sm120/Sm121 Support (#7937)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Vincent Huang <vincenth@nvidia.com>
Name    Last commit message    Last commit date
fmha [None][feat] GPT-OSS Sm120/Sm121 Support (#7937) 2025-10-06 16:59:06 -04:00
convert.cu infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_demo_bert_params.h [None] [feat] Add model gpt-oss (#6645) 2025-08-07 03:04:18 -04:00
fused_multihead_attention_kernel_1xN_multi_cta.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_1xN.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_2x2.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4x1_hopper.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel_4xN_hopper.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_kernel.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_attention_utils.h [None][feat] Add support for Hopper MLA chunked prefill (#6655) 2025-08-14 10:39:26 +08:00
fused_multihead_attention.cpp [TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477) 2025-09-15 21:43:49 +08:00
fused_multihead_attention.h [None][feat] Add support for Hopper MLA chunked prefill (#6655) 2025-08-14 10:39:26 +08:00
fused_multihead_cross_attention_kernel_1xN_noloop.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention_kernel_1xN.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_cross_attention.cpp [None] [feat] Add model gpt-oss (#6645) 2025-08-07 03:04:18 -04:00
fused_multihead_cross_attention.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
fused_multihead_flash_attention_kernel_noloop_tiled.h [None][feat] GPT-OSS Sm120/Sm121 Support (#7937) 2025-10-06 16:59:06 -04:00
fused_multihead_flash_attention_kernel_noloop.h [None][feat] GPT-OSS Sm120/Sm121 Support (#7937) 2025-10-06 16:59:06 -04:00
fused_multihead_flash_attention_kernel.h infra: open source fmha v2 kernels (#4185) 2025-05-15 10:56:34 +08:00
softmax_bf16.cu [None] [feat] Add model gpt-oss (#6645) 2025-08-07 03:04:18 -04:00
softmax_fp8.cu [None] [feat] Add model gpt-oss (#6645) 2025-08-07 03:04:18 -04:00
softmax_fp16.cu [None] [feat] Add model gpt-oss (#6645) 2025-08-07 03:04:18 -04:00
softmax_fp32.cu [None] [feat] Add model gpt-oss (#6645) 2025-08-07 03:04:18 -04:00
softmax_impl.h [None][feat] Add support for Hopper MLA chunked prefill (#6655) 2025-08-14 10:39:26 +08:00
softmax_int8.cu [None] [feat] Add model gpt-oss (#6645) 2025-08-07 03:04:18 -04:00