TensorRT-LLMs/cpp/tensorrt_llm/kernels
benzh-2025 6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
Signed-off-by: benzh 
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
beamSearchKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
causalConv1d [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
communicationKernels [https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499) 2026-01-13 17:16:22 +08:00
contextFusedMultiHeadAttention [TRTLLM-9805][feat] Skip Softmax Attention. (#9821) 2025-12-21 02:52:42 -05:00
cuteDslKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cutlass_kernels [None][feat] add fp4 gemm + allreduce (#9729) 2026-01-13 21:11:13 +08:00
decoderMaskedMultiheadAttention [TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention (#10264) 2026-01-11 19:26:10 -05:00
dsv3MinLatencyKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
flashMLA feat: reduce unnecessary kernel generation (#5476) 2025-07-04 14:37:49 +08:00
fusedLayernormKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
groupRmsNormKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
internal_cutlass_kernels [None][chore] Update internal_cutlass_kernels artifacts (#9992) 2025-12-15 21:15:25 -08:00
llama4MinLatencyKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
lora [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeLoadBalance [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
selectiveScan [https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881) 2025-12-13 08:35:31 +08:00
speculativeDecoding [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
tinygemm2 [None][chore] Update tinygemm kernel name (#10248) 2025-12-24 02:33:25 -05:00
trtllmGenKernels [https://nvbugs/5503479][fix] update trtllm-gen kernels to address few bugs (#10089) 2025-12-22 04:45:33 -05:00
unfusedAttentionKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
userbuffers [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
weightOnlyBatchedGemv [None][feat] sm100 weight-only kernel (#10190) 2026-01-05 09:44:36 +08:00
attentionMask.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
attentionMask.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banBadWords.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banBadWords.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banRepeatNgram.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banRepeatNgram.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
beamSearchKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
beamSearchKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
buildRelativeAttentionBiasKernel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
buildRelativeAttentionBiasKernel.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
CMakeLists.txt [None][fix] Fix regex pattern for cubin filtering (#9914) 2025-12-15 10:02:48 +08:00
cudaAsyncOps.cuh [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
cumsumLastDim.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cumsumLastDim.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customAllReduceKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customAllReduceKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customMoeRoutingKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customMoeRoutingKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decoderMaskedMultiheadAttention.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decoderMaskedMultiheadAttention.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decoderMaskedMultiheadAttentionUtils.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decodingCommon.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decodingKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decodingKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
delayStream.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
delayStream.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
doraScaling.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
doraScaling.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fmhaDispatcher.cpp [TRTLLM-9805][feat] Skip Softmax Attention. (#9821) 2025-12-21 02:52:42 -05:00
fmhaDispatcher.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fusedMoeCommKernels.cu [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
fusedMoeCommKernels.h [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
fusedQKNormRopeKernel.cu [None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) 2025-12-14 10:47:24 +08:00
fusedQKNormRopeKernel.h [None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) 2025-12-14 10:47:24 +08:00
gptKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
gptKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
groupGemm.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
groupGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
helixAllToAll.cu [TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986) 2025-12-23 18:14:30 -08:00
helixAllToAll.h [TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986) 2025-12-23 18:14:30 -08:00
helixKernels.cu [TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924) 2025-12-12 16:49:25 -08:00
helixKernels.h [TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924) 2025-12-12 16:49:25 -08:00
indexerKCacheScatter.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
IndexerKCacheScatter.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
indexerTopK.cu [https://nvbugs/5720357][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case (#10027) 2025-12-19 14:58:01 -05:00
IndexerTopK.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
kvCachePartialCopy.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
kvCacheUtils.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
layernormKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
layernormKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
ll128Proto.cuh [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
logitsBitmask.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
logitsBitmask.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lookupKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lookupKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lruKernel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lruKernel.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mambaConv1dKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mambaConv1dKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaChunkedPrefill.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaChunkedPrefill.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaKernels.cu [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
mlaKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moe_utils.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlignKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlignKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeCommKernelsCommon.h [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
moePrepareKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moePrepareKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeTopKFuncs.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
multiHeadAttentionCommon.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
noAuxTcKernels.cu [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
noAuxTcKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
penaltyKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
penaltyKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
penaltyTypes.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
preQuantScaleKernel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
preQuantScaleKernel.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
qserveGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
qserveGemmPerChannel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
qserveGemmPerGroup.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
quantization.cu [TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues (#10285) 2026-01-03 14:34:55 +08:00
quantization.cuh [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
quantization.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
recoverFromRingAtten.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
recoverFromRingAtten.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
rmsnormKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
rmsnormKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sageAttentionKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sageAttentionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingAirTopPKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopKKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopKKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopPKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopPKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sparseAttentionKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sparseAttentionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
splitkGroupGemm.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
splitkGroupGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
stopCriteriaKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
stopCriteriaKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
topkLastDim.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
topkLastDim.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
unfusedAttentionKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
unfusedAttentionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
xqaDispatcher.cpp [TRTLLM-9805][feat] Skip Softmax Attention. (#9821) 2025-12-21 02:52:42 -05:00
xqaDispatcher.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00