TensorRT-LLMs/cpp/tensorrt_llm/kernels
Tian Zheng cfebfbb505
[https://nvbugs/5783509][fix] Fix a hang issue when enabling skip softmax on Blackwell (#10490)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-16 18:59:54 +08:00
beamSearchKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
causalConv1d [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
communicationKernels [TRTLLM-9849][infra] Update dependencies to 25.12 (#9818) 2026-01-14 21:54:04 +08:00
contextFusedMultiHeadAttention [TRTLLM-9805][feat] Skip Softmax Attention. (#9821) 2025-12-21 02:52:42 -05:00
cuteDslKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cutlass_kernels [None][feat] add fp4 gemm + allreduce (#9729) 2026-01-13 21:11:13 +08:00
decoderMaskedMultiheadAttention [None][feat] Use XQA JIT impl by default and mitigate perf loss with sliding window (#10335) 2026-01-15 15:47:00 +08:00
dsv3MinLatencyKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
flashMLA feat: reduce unnecessary kernel generation (#5476) 2025-07-04 14:37:49 +08:00
fusedLayernormKernels [None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905) 2026-01-15 07:29:15 +08:00
groupRmsNormKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
internal_cutlass_kernels [None][chore] Update internal_cutlass_kernels artifacts (#9992) 2025-12-15 21:15:25 -08:00
llama4MinLatencyKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
lora [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeLoadBalance [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
selectiveScan [https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881) 2025-12-13 08:35:31 +08:00
speculativeDecoding [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
tinygemm2 [https://nvbugs/5772396][fix] WAR: Disable TinyGEMM PDL due to accuracy issues (#10619) 2026-01-13 12:40:11 -05:00
trtllmGenKernels [https://nvbugs/5783509][fix] Fix a hang issue when enabling skip softmax on Blackwell (#10490) 2026-01-16 18:59:54 +08:00
unfusedAttentionKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
userbuffers [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
weightOnlyBatchedGemv [None][feat] sm100 weight-only kernel (#10190) 2026-01-05 09:44:36 +08:00
attentionMask.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
attentionMask.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banBadWords.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banBadWords.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banRepeatNgram.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banRepeatNgram.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
beamSearchKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
beamSearchKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
buildRelativeAttentionBiasKernel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
buildRelativeAttentionBiasKernel.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
CMakeLists.txt [None][fix] Fix regex pattern for cubin filtering (#9914) 2025-12-15 10:02:48 +08:00
cudaAsyncOps.cuh [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
cumsumLastDim.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cumsumLastDim.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customAllReduceKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customAllReduceKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customMoeRoutingKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customMoeRoutingKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decoderMaskedMultiheadAttention.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decoderMaskedMultiheadAttention.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decoderMaskedMultiheadAttentionUtils.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decodingCommon.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decodingKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decodingKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
delayStream.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
delayStream.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
doraScaling.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
doraScaling.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fmhaDispatcher.cpp [None][feat] update trtllm-gen to support groupsTokensHeadsQ (#10261) 2026-01-15 02:24:25 -05:00
fmhaDispatcher.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fusedMoeCommKernels.cu [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
fusedMoeCommKernels.h [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
fusedQKNormRopeKernel.cu [None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) 2025-12-14 10:47:24 +08:00
fusedQKNormRopeKernel.h [None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) 2025-12-14 10:47:24 +08:00
gptKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
gptKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
groupGemm.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
groupGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
helixAllToAll.cu [TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986) 2025-12-23 18:14:30 -08:00
helixAllToAll.h [TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986) 2025-12-23 18:14:30 -08:00
helixKernels.cu [TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924) 2025-12-12 16:49:25 -08:00
helixKernels.h [TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924) 2025-12-12 16:49:25 -08:00
indexerKCacheScatter.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
IndexerKCacheScatter.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
indexerTopK.cu [https://nvbugs/5720357][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case (#10027) 2025-12-19 14:58:01 -05:00
IndexerTopK.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
kvCachePartialCopy.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
kvCacheUtils.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
layernormKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
layernormKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
ll128Proto.cuh [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
logitsBitmask.cu [https://nvbugs/5669671][fix] Support GuidedDecoder with sharded logits (#10698) 2026-01-16 11:04:26 +08:00
logitsBitmask.h [https://nvbugs/5669671][fix] Support GuidedDecoder with sharded logits (#10698) 2026-01-16 11:04:26 +08:00
lookupKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lookupKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lruKernel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lruKernel.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mambaConv1dKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mambaConv1dKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaChunkedPrefill.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaChunkedPrefill.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaKernels.cu [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
mlaKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moe_utils.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlignKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlignKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeCommKernelsCommon.h [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
moePrepareKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moePrepareKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeTopKFuncs.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
multiHeadAttentionCommon.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
noAuxTcKernels.cu [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
noAuxTcKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
penaltyKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
penaltyKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
penaltyTypes.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
preQuantScaleKernel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
preQuantScaleKernel.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
qserveGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
qserveGemmPerChannel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
qserveGemmPerGroup.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
quantization.cu [https://nvbugs/5772396][fix] WAR: Disable TinyGEMM PDL due to accuracy issues (#10619) 2026-01-13 12:40:11 -05:00
quantization.cuh [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
quantization.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
recoverFromRingAtten.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
recoverFromRingAtten.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
rmsnormKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
rmsnormKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sageAttentionKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sageAttentionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingAirTopPKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopKKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopKKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopPKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopPKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sparseAttentionKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sparseAttentionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
splitkGroupGemm.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
splitkGroupGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
stopCriteriaKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
stopCriteriaKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
topkLastDim.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
topkLastDim.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
unfusedAttentionKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
unfusedAttentionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
xqaDispatcher.cpp [None][feat] update trtllm-gen to support groupsTokensHeadsQ (#10261) 2026-01-15 02:24:25 -05:00
xqaDispatcher.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00