TensorRT-LLM/cpp/tensorrt_llm/kernels
Latest commit 4d711be8f4 by Perkz Zheng, 2025-05-26 09:06:33 +08:00
Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564)

* move cubins to LFS
* update cubins
* add sliding-window-attention generation-phase kernels on Blackwell
* address comments

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
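The headline feature, sliding-window attention (SWA), caps how far back a token can attend: in the generation (decode) phase, the single new query attends only to the most recent `window` entries of the KV cache rather than the full sequence. The shipped Blackwell kernels are prebuilt cubins, so the sketch below is purely conceptual (all names, such as `swaDecodeAttention`, are hypothetical, and the paged/cyclic cache layout is omitted); it shows that the windowing itself reduces to clamping the start of the visible key range before the usual score/softmax/value steps:

```cuda
// Conceptual sketch of generation-phase sliding-window attention, one head
// per block and one new query token (NOT the shipped cubin-based kernel).
// Launch: swaDecodeAttention<<<numHeads, 128, visible * sizeof(float)>>>(...)
// with visible = min(seqLen, window).
#include <cuda_runtime.h>
#include <math.h>

__global__ void swaDecodeAttention(
    float const* q,      // [numHeads, headDim]         query for the new token
    float const* kCache, // [numHeads, seqLen, headDim] cached keys
    float const* vCache, // [numHeads, seqLen, headDim] cached values
    float* out,          // [numHeads, headDim]         attention output
    int seqLen, int headDim, int window)
{
    int const head = blockIdx.x;
    // Sliding window: only keys at positions [start, seqLen) are visible.
    int const start = max(0, seqLen - window);
    int const visible = seqLen - start;

    extern __shared__ float scores[]; // one score per visible key

    // Scaled dot products q . k_t for the visible positions.
    float const scale = rsqrtf(static_cast<float>(headDim));
    for (int t = start + threadIdx.x; t < seqLen; t += blockDim.x)
    {
        float s = 0.f;
        for (int d = 0; d < headDim; ++d)
        {
            s += q[head * headDim + d] * kCache[(head * seqLen + t) * headDim + d];
        }
        scores[t - start] = s * scale;
    }
    __syncthreads();

    // Softmax over the window (one thread, for clarity rather than speed).
    if (threadIdx.x == 0)
    {
        float m = -INFINITY;
        for (int i = 0; i < visible; ++i) m = fmaxf(m, scores[i]);
        float z = 0.f;
        for (int i = 0; i < visible; ++i)
        {
            scores[i] = expf(scores[i] - m);
            z += scores[i];
        }
        for (int i = 0; i < visible; ++i) scores[i] /= z;
    }
    __syncthreads();

    // Weighted sum of the visible values.
    for (int d = threadIdx.x; d < headDim; d += blockDim.x)
    {
        float acc = 0.f;
        for (int t = start; t < seqLen; ++t)
        {
            acc += scores[t - start] * vCache[(head * seqLen + t) * headDim + d];
        }
        out[head * headDim + d] = acc;
    }
}
```

Beyond correctness of the mask, the practical win is bandwidth: bounding `start` also bounds how much of the KV cache a decode step has to read at all.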
Name | Last commit | Last updated
beamSearchKernels/ | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00
communicationKernels/ | Adding two-shot allreduce kernel and mnnvl multicasting buffer (#4216) | 2025-05-22 03:42:36 +08:00
contextFusedMultiHeadAttention/ | Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564) | 2025-05-26 09:06:33 +08:00
cutlass_kernels/ | [feat] support fp8 blockscale gemm on sm89 (#4481) | 2025-05-23 10:39:10 +08:00
decoderMaskedMultiheadAttention/ | Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564) | 2025-05-26 09:06:33 +08:00
flashMLA/
fusedLayernormKernels/
groupRmsNormKernels/ | feat: Add heuristic for GroupRMSNorm kernel selection. (#4047) | 2025-05-13 08:52:53 +08:00
internal_cutlass_kernels/ | perf: Fuse gemm setup function for SM90/SM100 MOE plugin path (#4146) | 2025-05-21 10:00:36 +08:00
lora/
moeLoadBalance/ | feat: large-scale EP(part 2: MoE Load Balancer - core utilities) (#4384) | 2025-05-20 17:53:48 +08:00
selectiveScan/
speculativeDecoding/ | fix: Eagle decoding in TRT flow (#4229) | 2025-05-14 16:10:49 +02:00
trtllmGenKernels/ | Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564) | 2025-05-26 09:06:33 +08:00
unfusedAttentionKernels/ | feat: add CGA reduction fmha kernels on Blackwell. (#3763) | 2025-04-29 10:43:54 +08:00
userbuffers/
weightOnlyBatchedGemv/
attentionMask.cu
attentionMask.h
banBadWords.cu
banBadWords.h
banRepeatNgram.cu
banRepeatNgram.h
beamSearchKernels.cu | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00
beamSearchKernels.h | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00
buildRelativeAttentionBiasKernel.cu
buildRelativeAttentionBiasKernel.h
CMakeLists.txt | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00
cumsumLastDim.cu
cumsumLastDim.h
customAllReduceKernels.cu | chore: bump version to 0.19.0 (#3598) (#3841) | 2025-04-29 16:57:22 +08:00
customAllReduceKernels.h | feat: Low Precision Allreduce for PCIe based GPU (#4344) | 2025-05-20 06:53:46 +08:00
decoderMaskedMultiheadAttention.cu
decoderMaskedMultiheadAttention.h | fix: [https://nvbugspro.nvidia.com/bug/5238626] illegal memory address when running llama 4 with cuda graph enabled (#4101) | 2025-05-13 14:58:54 +08:00
decoderMaskedMultiheadAttentionUtils.h
decodingCommon.cu
decodingKernels.cu | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00
decodingKernels.h
delayStream.cu
delayStream.h
doraScaling.cu
doraScaling.h
fmhaDispatcher.cpp | Feat: add chunked-attention kernels on Blackwell (#4394) | 2025-05-21 10:16:46 +08:00
fmhaDispatcher.h
fusedQKNormRopeKernel.cu | perf: Add fused q_norm/k_norm/RoPE for Qwen3. (#4482) | 2025-05-23 15:31:04 +08:00
fusedQKNormRopeKernel.h | perf: Add fused q_norm/k_norm/RoPE for Qwen3. (#4482) | 2025-05-23 15:31:04 +08:00
gptKernels.cu
gptKernels.h | feat: add CGA reduction fmha kernels on Blackwell. (#3763) | 2025-04-29 10:43:54 +08:00
groupGemm.cu
groupGemm.h
kvCachePartialCopy.cu
kvCacheUtils.h
layernormKernels.cu
layernormKernels.h
logitsBitmask.cu
logitsBitmask.h
lookupKernels.cu
lookupKernels.h
lruKernel.cu
lruKernel.h
mambaConv1dKernels.cu
mambaConv1dKernels.h
mlaKernels.cu | [TRTLLM-5070][feat] Support FP8 KV Cache Reuse for MLA (#4535) | 2025-05-23 19:47:50 +08:00
mlaKernels.h | [TRTLLM-5070][feat] Support FP8 KV Cache Reuse for MLA (#4535) | 2025-05-23 19:47:50 +08:00
moeCommKernels.cu | feat: Add MNNVL MoE A2A support (#3504) | 2025-04-25 17:29:08 +08:00
moeCommKernels.h | feat: Add MNNVL MoE A2A support (#3504) | 2025-04-25 17:29:08 +08:00
multiHeadAttentionCommon.h
noAuxTcKernels.cu
noAuxTcKernels.h
penaltyKernels.cu
penaltyKernels.h
penaltyTypes.h
preQuantScaleKernel.cu | [TRTLLM-3330][feat] Support DeepSeek-R1 W4A8 on Hopper (#4123) | 2025-05-14 15:48:07 +08:00
preQuantScaleKernel.h | [TRTLLM-3330][feat] Support DeepSeek-R1 W4A8 on Hopper (#4123) | 2025-05-14 15:48:07 +08:00
qserveGemm.h
qserveGemmPerChannel.cu
qserveGemmPerGroup.cu
quantization.cu
quantization.cuh
quantization.h
recoverFromRingAtten.cu | Support RingAttention in the BertAttention plugin and the DiT model (#3661) | 2025-05-09 08:06:54 +08:00
recoverFromRingAtten.h | Support RingAttention in the BertAttention plugin and the DiT model (#3661) | 2025-05-09 08:06:54 +08:00
rmsnormKernels.cu
rmsnormKernels.h
sageAttentionKernels.cu
sageAttentionKernels.h
samplingAirTopPKernels.cu
samplingTopKKernels.cu
samplingTopKKernels.h
samplingTopPKernels.cu
samplingTopPKernels.h
splitkGroupGemm.cu
splitkGroupGemm.h
stopCriteriaKernels.cu
stopCriteriaKernels.h
topkLastDim.cu
topkLastDim.h
unfusedAttentionKernels.cu
unfusedAttentionKernels.h
xqaDispatcher.cpp | Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564) | 2025-05-26 09:06:33 +08:00
xqaDispatcher.h
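A second recurring theme in the listing is rotary position embedding: the fusedQKNormRopeKernel entries above fuse q/k RMSNorm with RoPE for Qwen3. As a refresher on the rotary part alone, here is a standalone sketch (the `applyRope` kernel name and signature are hypothetical; this uses the interleaved-pair convention, and real implementations differ on pairing layout and fuse the norms in):

```cuda
// Hypothetical standalone RoPE kernel, rotary part only. One thread handles
// one (token, dimension-pair): rotate (x[2i], x[2i+1]) by the angle
// pos * thetaBase^(-2i/headDim).
#include <cuda_runtime.h>
#include <math.h>

__global__ void applyRope(
    float* x,             // [numTokens, headDim] q or k vectors, rotated in place
    int const* positions, // [numTokens] absolute position of each token
    int numTokens, int headDim, float thetaBase /* typically 10000.f */)
{
    int const idx = blockIdx.x * blockDim.x + threadIdx.x;
    int const pairsPerToken = headDim / 2;
    if (idx >= numTokens * pairsPerToken) return;

    int const token = idx / pairsPerToken;
    int const pair = idx % pairsPerToken;

    // Rotation angle for this dimension pair at this token's position.
    float const freq = powf(thetaBase, -2.f * pair / headDim);
    float const angle = positions[token] * freq;
    float const c = cosf(angle), s = sinf(angle);

    // Standard 2-D rotation of the (even, odd) component pair.
    float* v = x + token * headDim + 2 * pair;
    float const x0 = v[0], x1 = v[1];
    v[0] = x0 * c - x1 * s;
    v[1] = x0 * s + x1 * c;
}
```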