| Name | Last commit | Last commit date |
| --- | --- | --- |
| beamSearchKernels | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00 |
| communicationKernels | Adding two-shot allreduce kernel and mnnvl multicasting buffer (#4216) | 2025-05-22 03:42:36 +08:00 |
| contextFusedMultiHeadAttention | Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564) | 2025-05-26 09:06:33 +08:00 |
| cutlass_kernels | [feat] support fp8 blockscale gemm on sm89 (#4481) | 2025-05-23 10:39:10 +08:00 |
| decoderMaskedMultiheadAttention | Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564) | 2025-05-26 09:06:33 +08:00 |
| flashMLA | | |
| fusedLayernormKernels | | |
| groupRmsNormKernels | feat: Add heuristic for GroupRMSNorm kernel selection. (#4047) | 2025-05-13 08:52:53 +08:00 |
| internal_cutlass_kernels | perf: Fuse gemm setup function for SM90/SM100 MOE plugin path (#4146) | 2025-05-21 10:00:36 +08:00 |
| lora | | |
| moeLoadBalance | feat: large-scale EP(part 2: MoE Load Balancer - core utilities) (#4384) | 2025-05-20 17:53:48 +08:00 |
| selectiveScan | | |
| speculativeDecoding | fix: Eagle decoding in TRT flow (#4229) | 2025-05-14 16:10:49 +02:00 |
| trtllmGenKernels | Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564) | 2025-05-26 09:06:33 +08:00 |
| unfusedAttentionKernels | feat: add CGA reduction fmha kernels on Blackwell. (#3763) | 2025-04-29 10:43:54 +08:00 |
| userbuffers | | |
| weightOnlyBatchedGemv | | |
| attentionMask.cu | | |
| attentionMask.h | | |
| banBadWords.cu | | |
| banBadWords.h | | |
| banRepeatNgram.cu | | |
| banRepeatNgram.h | | |
| beamSearchKernels.cu | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00 |
| beamSearchKernels.h | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00 |
| buildRelativeAttentionBiasKernel.cu | | |
| buildRelativeAttentionBiasKernel.h | | |
| CMakeLists.txt | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| cumsumLastDim.cu | | |
| cumsumLastDim.h | | |
| customAllReduceKernels.cu | chore: bump version to 0.19.0 (#3598) (#3841) | 2025-04-29 16:57:22 +08:00 |
| customAllReduceKernels.h | feat: Low Precision Allreduce for PCIe based GPU (#4344) | 2025-05-20 06:53:46 +08:00 |
| decoderMaskedMultiheadAttention.cu | | |
| decoderMaskedMultiheadAttention.h | fix: [https://nvbugspro.nvidia.com/bug/5238626] illegal memory address when running llama 4 with cuda graph enabled (#4101) | 2025-05-13 14:58:54 +08:00 |
| decoderMaskedMultiheadAttentionUtils.h | | |
| decodingCommon.cu | | |
| decodingKernels.cu | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00 |
| decodingKernels.h | | |
| delayStream.cu | | |
| delayStream.h | | |
| doraScaling.cu | | |
| doraScaling.h | | |
| fmhaDispatcher.cpp | Feat: add chunked-attention kernels on Blackwell (#4394) | 2025-05-21 10:16:46 +08:00 |
| fmhaDispatcher.h | | |
| fusedQKNormRopeKernel.cu | perf: Add fused q_norm/k_norm/RoPE for Qwen3. (#4482) | 2025-05-23 15:31:04 +08:00 |
| fusedQKNormRopeKernel.h | perf: Add fused q_norm/k_norm/RoPE for Qwen3. (#4482) | 2025-05-23 15:31:04 +08:00 |
| gptKernels.cu | | |
| gptKernels.h | feat: add CGA reduction fmha kernels on Blackwell. (#3763) | 2025-04-29 10:43:54 +08:00 |
| groupGemm.cu | | |
| groupGemm.h | | |
| kvCachePartialCopy.cu | | |
| kvCacheUtils.h | | |
| layernormKernels.cu | | |
| layernormKernels.h | | |
| logitsBitmask.cu | | |
| logitsBitmask.h | | |
| lookupKernels.cu | | |
| lookupKernels.h | | |
| lruKernel.cu | | |
| lruKernel.h | | |
| mambaConv1dKernels.cu | | |
| mambaConv1dKernels.h | | |
| mlaKernels.cu | [TRTLLM-5070][feat] Support FP8 KV Cache Reuse for MLA (#4535) | 2025-05-23 19:47:50 +08:00 |
| mlaKernels.h | [TRTLLM-5070][feat] Support FP8 KV Cache Reuse for MLA (#4535) | 2025-05-23 19:47:50 +08:00 |
| moeCommKernels.cu | feat: Add MNNVL MoE A2A support (#3504) | 2025-04-25 17:29:08 +08:00 |
| moeCommKernels.h | feat: Add MNNVL MoE A2A support (#3504) | 2025-04-25 17:29:08 +08:00 |
| multiHeadAttentionCommon.h | | |
| noAuxTcKernels.cu | | |
| noAuxTcKernels.h | | |
| penaltyKernels.cu | | |
| penaltyKernels.h | | |
| penaltyTypes.h | | |
| preQuantScaleKernel.cu | [TRTLLM-3330][feat] Support DeepSeek-R1 W4A8 on Hopper (#4123) | 2025-05-14 15:48:07 +08:00 |
| preQuantScaleKernel.h | [TRTLLM-3330][feat] Support DeepSeek-R1 W4A8 on Hopper (#4123) | 2025-05-14 15:48:07 +08:00 |
| qserveGemm.h | | |
| qserveGemmPerChannel.cu | | |
| qserveGemmPerGroup.cu | | |
| quantization.cu | | |
| quantization.cuh | | |
| quantization.h | | |
| recoverFromRingAtten.cu | Support RingAttention in the BertAttention plugin and the DiT model (#3661) | 2025-05-09 08:06:54 +08:00 |
| recoverFromRingAtten.h | Support RingAttention in the BertAttention plugin and the DiT model (#3661) | 2025-05-09 08:06:54 +08:00 |
| rmsnormKernels.cu | | |
| rmsnormKernels.h | | |
| sageAttentionKernels.cu | | |
| sageAttentionKernels.h | | |
| samplingAirTopPKernels.cu | | |
| samplingTopKKernels.cu | | |
| samplingTopKKernels.h | | |
| samplingTopPKernels.cu | | |
| samplingTopPKernels.h | | |
| splitkGroupGemm.cu | | |
| splitkGroupGemm.h | | |
| stopCriteriaKernels.cu | | |
| stopCriteriaKernels.h | | |
| topkLastDim.cu | | |
| topkLastDim.h | | |
| unfusedAttentionKernels.cu | | |
| unfusedAttentionKernels.h | | |
| xqaDispatcher.cpp | Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564) | 2025-05-26 09:06:33 +08:00 |
| xqaDispatcher.h | | |