TensorRT-LLM/cpp/tensorrt_llm/kernels
Latest commit c0e25e5418 by Pengbo Wang: [TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention (#10264), 2026-01-11 19:26:10 -05:00
beamSearchKernels
causalConv1d
communicationKernels [TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… (#10229) 2025-12-27 22:48:10 +08:00
contextFusedMultiHeadAttention [TRTLLM-9805][feat] Skip Softmax Attention. (#9821) 2025-12-21 02:52:42 -05:00
cuteDslKernels
cutlass_kernels [https://nvbugs/5726962][feat] Apply fusion for W4AFP8_AWQ MoE (#9838) 2026-01-06 10:16:41 +08:00
decoderMaskedMultiheadAttention [TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention (#10264) 2026-01-11 19:26:10 -05:00
dsv3MinLatencyKernels
flashMLA
fusedLayernormKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
groupRmsNormKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
internal_cutlass_kernels [None][chore] Update internal_cutlass_kernels artifacts (#9992) 2025-12-15 21:15:25 -08:00
llama4MinLatencyKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
lora
moeLoadBalance
selectiveScan
speculativeDecoding
tinygemm2 [None][chore] Update tinygemm kernel name (#10248) 2025-12-24 02:33:25 -05:00
trtllmGenKernels [https://nvbugs/5503479][fix] update trtllm-gen kernels to address few bugs (#10089) 2025-12-22 04:45:33 -05:00
unfusedAttentionKernels [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
userbuffers
weightOnlyBatchedGemv [None][feat] sm100 weight-only kernel (#10190) 2026-01-05 09:44:36 +08:00
attentionMask.cu
attentionMask.h
banBadWords.cu
banBadWords.h
banRepeatNgram.cu
banRepeatNgram.h
beamSearchKernels.cu
beamSearchKernels.h
buildRelativeAttentionBiasKernel.cu
buildRelativeAttentionBiasKernel.h
CMakeLists.txt
cudaAsyncOps.cuh
cumsumLastDim.cu
cumsumLastDim.h
customAllReduceKernels.cu
customAllReduceKernels.h
customMoeRoutingKernels.cu
customMoeRoutingKernels.h
decoderMaskedMultiheadAttention.cu
decoderMaskedMultiheadAttention.h
decoderMaskedMultiheadAttentionUtils.h
decodingCommon.cu
decodingKernels.cu
decodingKernels.h
delayStream.cu
delayStream.h
doraScaling.cu
doraScaling.h
fmhaDispatcher.cpp [TRTLLM-9805][feat] Skip Softmax Attention. (#9821) 2025-12-21 02:52:42 -05:00
fmhaDispatcher.h
fusedMoeCommKernels.cu
fusedMoeCommKernels.h
fusedQKNormRopeKernel.cu
fusedQKNormRopeKernel.h
gptKernels.cu
gptKernels.h
groupGemm.cu
groupGemm.h
helixAllToAll.cu [TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986) 2025-12-23 18:14:30 -08:00
helixAllToAll.h [TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986) 2025-12-23 18:14:30 -08:00
helixKernels.cu
helixKernels.h
indexerKCacheScatter.cu
IndexerKCacheScatter.h
indexerTopK.cu [https://nvbugs/5720357][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case (#10027) 2025-12-19 14:58:01 -05:00
IndexerTopK.h
kvCachePartialCopy.cu
kvCacheUtils.h
layernormKernels.cu
layernormKernels.h
ll128Proto.cuh
logitsBitmask.cu
logitsBitmask.h
lookupKernels.cu
lookupKernels.h
lruKernel.cu
lruKernel.h
mambaConv1dKernels.cu
mambaConv1dKernels.h
mlaChunkedPrefill.cu
mlaChunkedPrefill.cuh
mlaKernels.cu [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
mlaKernels.h
moe_utils.cuh
moeAlignKernels.cu
moeAlignKernels.h
moeCommKernelsCommon.h
moePrepareKernels.cu
moePrepareKernels.h
moeTopKFuncs.cuh
multiHeadAttentionCommon.h
noAuxTcKernels.cu [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
noAuxTcKernels.h
penaltyKernels.cu
penaltyKernels.h
penaltyTypes.h
preQuantScaleKernel.cu
preQuantScaleKernel.h
qserveGemm.h
qserveGemmPerChannel.cu
qserveGemmPerGroup.cu
quantization.cu [TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues (#10285) 2026-01-03 14:34:55 +08:00
quantization.cuh [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
quantization.h
recoverFromRingAtten.cu
recoverFromRingAtten.h
rmsnormKernels.cu
rmsnormKernels.h
sageAttentionKernels.cu
sageAttentionKernels.h
samplingAirTopPKernels.cu
samplingTopKKernels.cu
samplingTopKKernels.h
samplingTopPKernels.cu
samplingTopPKernels.h
sparseAttentionKernels.cu
sparseAttentionKernels.h
splitkGroupGemm.cu
splitkGroupGemm.h
stopCriteriaKernels.cu
stopCriteriaKernels.h
topkLastDim.cu
topkLastDim.h
unfusedAttentionKernels.cu
unfusedAttentionKernels.h
xqaDispatcher.cpp [TRTLLM-9805][feat] Skip Softmax Attention. (#9821) 2025-12-21 02:52:42 -05:00
xqaDispatcher.h