TensorRT-LLMs/cpp/tensorrt_llm/kernels
CarstyYou 0b279f4ad4
[https://nvbugs/5456493][feat] Add fp8 bmm on sm120 (#9687)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-12-18 22:57:20 +08:00
..
beamSearchKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
causalConv1d [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
communicationKernels [https://nvbugs/5655885][fix] fix invalid instruction error in 2shot ar kernel on Ampere (#9394) 2025-12-15 14:22:56 +08:00
contextFusedMultiHeadAttention [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cuteDslKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cutlass_kernels [https://nvbugs/5456493][feat] Add fp8 bmm on sm120 (#9687) 2025-12-18 22:57:20 +08:00
decoderMaskedMultiheadAttention [https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881) 2025-12-13 08:35:31 +08:00
dsv3MinLatencyKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
flashMLA feat: reduce unnecessary kernel generation (#5476) 2025-07-04 14:37:49 +08:00
fusedLayernormKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
groupRmsNormKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
internal_cutlass_kernels [None][chore] Update internal_cutlass_kernels artifacts (#9992) 2025-12-15 21:15:25 -08:00
llama4MinLatencyKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lora [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeLoadBalance [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
selectiveScan [https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881) 2025-12-13 08:35:31 +08:00
speculativeDecoding [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
tinygemm2 [TRTLLM-7775][feat] Integrate tinygemm2 for gpt-oss (#7916) 2025-10-02 10:47:04 -07:00
trtllmGenKernels [None][feat] update TRT-LLM Gen MoE for NvFp4 + bias with tileN=256 (#9734) 2025-12-18 11:58:23 +01:00
unfusedAttentionKernels [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
userbuffers [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
weightOnlyBatchedGemv [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
attentionMask.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
attentionMask.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banBadWords.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banBadWords.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banRepeatNgram.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
banRepeatNgram.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
beamSearchKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
beamSearchKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
buildRelativeAttentionBiasKernel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
buildRelativeAttentionBiasKernel.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
CMakeLists.txt [None][fix] Fix regex pattern for cubin filtering (#9914) 2025-12-15 10:02:48 +08:00
cudaAsyncOps.cuh [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
cumsumLastDim.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cumsumLastDim.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customAllReduceKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customAllReduceKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customMoeRoutingKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customMoeRoutingKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decoderMaskedMultiheadAttention.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decoderMaskedMultiheadAttention.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decoderMaskedMultiheadAttentionUtils.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decodingCommon.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decodingKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
decodingKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
delayStream.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
delayStream.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
doraScaling.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
doraScaling.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fmhaDispatcher.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fmhaDispatcher.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fusedMoeCommKernels.cu [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
fusedMoeCommKernels.h [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
fusedQKNormRopeKernel.cu [None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) 2025-12-14 10:47:24 +08:00
fusedQKNormRopeKernel.h [None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) 2025-12-14 10:47:24 +08:00
gptKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
gptKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
groupGemm.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
groupGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
helixKernels.cu [TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924) 2025-12-12 16:49:25 -08:00
helixKernels.h [TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924) 2025-12-12 16:49:25 -08:00
indexerKCacheScatter.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
IndexerKCacheScatter.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
indexerTopK.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
IndexerTopK.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
kvCachePartialCopy.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
kvCacheUtils.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
layernormKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
layernormKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
ll128Proto.cuh [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
logitsBitmask.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
logitsBitmask.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lookupKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lookupKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lruKernel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
lruKernel.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mambaConv1dKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mambaConv1dKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaChunkedPrefill.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaChunkedPrefill.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moe_utils.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlignKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlignKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeCommKernelsCommon.h [TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922) 2025-12-14 11:29:30 -08:00
moePrepareKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moePrepareKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeTopKFuncs.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
multiHeadAttentionCommon.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
noAuxTcKernels.cu [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
noAuxTcKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
penaltyKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
penaltyKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
penaltyTypes.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
preQuantScaleKernel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
preQuantScaleKernel.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
qserveGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
qserveGemmPerChannel.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
qserveGemmPerGroup.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
quantization.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
quantization.cuh [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
quantization.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
recoverFromRingAtten.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
recoverFromRingAtten.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
rmsnormKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
rmsnormKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sageAttentionKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sageAttentionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingAirTopPKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopKKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopKKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopPKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
samplingTopPKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sparseAttentionKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
sparseAttentionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
splitkGroupGemm.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
splitkGroupGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
stopCriteriaKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
stopCriteriaKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
topkLastDim.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
topkLastDim.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
unfusedAttentionKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
unfusedAttentionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
xqaDispatcher.cpp [https://nvbugs/5727952][fix] a pdl bug in trtllm-gen fmha kernels (#9913) 2025-12-16 00:34:37 -08:00
xqaDispatcher.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00