| Name | Last commit | Last updated |
| --- | --- | --- |
| beamSearchKernels | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00 |
| causalConv1d | fix: fix license bug (#5200) | 2025-06-13 18:58:15 +08:00 |
| communicationKernels | Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL (#4560) | 2025-06-14 17:36:22 +08:00 |
| contextFusedMultiHeadAttention | keep sm90 headsize 128 cubins (#5320) | 2025-06-26 12:14:01 +08:00 |
| cutlass_kernels | Fix : fix build for sm120 (#5265) | 2025-06-27 20:42:47 +08:00 |
| decoderMaskedMultiheadAttention | [chore] Allow configuring linking of NVRTC wrapper (#5189) | 2025-06-26 07:26:10 +02:00 |
| dsv3MinLatencyKernels | Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL (#4560) | 2025-06-14 17:36:22 +08:00 |
| flashMLA | fix: fix license bug (#5200) | 2025-06-13 18:58:15 +08:00 |
| fusedLayernormKernels | opensource: Opensource MOE MXFP8-MXFP4 implementation (#5222) | 2025-06-26 12:18:19 +08:00 |
| groupRmsNormKernels | feat: Add heuristic for GroupRMSNorm kernel selection. (#4047) | 2025-05-13 08:52:53 +08:00 |
| internal_cutlass_kernels | opensource: Opensource MOE MXFP8-MXFP4 implementation (#5222) | 2025-06-26 12:18:19 +08:00 |
| llama4MinLatencyKernels | [fix] Fix Llama4 guradwords failures (#4844) | 2025-06-02 13:43:42 -07:00 |
| lora | chore: Stabilize ABI boundary for internal kernel library (#3117) | 2025-04-11 15:07:50 +08:00 |
| moeLoadBalance | feat: Misc Opt for large scale EP (#5374) | 2025-06-20 13:11:31 +08:00 |
| selectiveScan | fix: fix license bug (#5200) | 2025-06-13 18:58:15 +08:00 |
| speculativeDecoding | fix: refactor and fix mtp vanilla (#4762) | 2025-06-20 05:23:39 +08:00 |
| trtllmGenKernels | Fix mPtrExpertCounts allocation in MoE TRT-LLM backend (nvfp4) (#5519) | 2025-06-27 20:17:40 +08:00 |
| unfusedAttentionKernels | feat: Add Mixture of Experts FP8xMXFP4 support (#4750) | 2025-06-09 13:25:04 +08:00 |
| userbuffers | feat: Add Mixture of Experts FP8xMXFP4 support (#4750) | 2025-06-09 13:25:04 +08:00 |
| weightOnlyBatchedGemv | feat: Add FP8 support for SM 120 (#3248) | 2025-04-14 16:05:41 -07:00 |
| attentionMask.cu | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| attentionMask.h | Update TensorRT-LLM (#2363) | 2024-10-22 20:27:35 +08:00 |
| banBadWords.cu | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| banBadWords.h | Update TensorRT-LLM (#2008) | 2024-07-23 23:05:09 +08:00 |
| banRepeatNgram.cu | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| banRepeatNgram.h | Update TensorRT-LLM (#1598) | 2024-05-14 16:43:41 +08:00 |
| beamSearchKernels.cu | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00 |
| beamSearchKernels.h | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00 |
| buildRelativeAttentionBiasKernel.cu | Update TensorRT-LLM (#1763) | 2024-06-11 16:59:02 +08:00 |
| buildRelativeAttentionBiasKernel.h | Update TensorRT-LLM (#1763) | 2024-06-11 16:59:02 +08:00 |
| CMakeLists.txt | Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL (#4560) | 2025-06-14 17:36:22 +08:00 |
| cumsumLastDim.cu | open source 7f370deb0090d885d7518c2b146399ba3933c004 (#2273) | 2024-09-30 13:51:19 +02:00 |
| cumsumLastDim.h | Update TensorRT-LLM (#1725) | 2024-06-04 20:26:32 +08:00 |
| customAllReduceKernels.cu | Cherry pick feat/llama4 to main (#4739) | 2025-05-30 05:28:40 +08:00 |
| customAllReduceKernels.h | [TRTLLM-3927] [feat] Finalize + Allreduce + add + rmsnorm fusion (#4756) | 2025-06-10 19:55:16 +08:00 |
| decoderMaskedMultiheadAttention.cu | Update TensorRT-LLM (#2502) | 2024-11-26 16:51:34 +08:00 |
| decoderMaskedMultiheadAttention.h | [https://nvbugspro.nvidia.com/bug/5300080] Fix the bug of setting attention_chunk_size and enable chunked-attention in the generation-phase by default (#4693) | 2025-06-03 19:02:57 -04:00 |
| decoderMaskedMultiheadAttentionUtils.h | Update TensorRT-LLM (#2363) | 2024-10-22 20:27:35 +08:00 |
| decodingCommon.cu | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| decodingKernels.cu | Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) | 2025-05-12 22:32:29 +02:00 |
| decodingKernels.h | refactor: Improve decoder finalize function (#3077) | 2025-03-28 14:33:59 +08:00 |
| delayStream.cu | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| delayStream.h | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| doraScaling.cu | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| doraScaling.h | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| fmhaDispatcher.cpp | feat: chunked prefill for MLA (Blackwell) (#4651) | 2025-06-26 09:01:00 +08:00 |
| fmhaDispatcher.h | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| fusedQKNormRopeKernel.cu | perf: Add fused q_norm/k_norm/RoPE for Qwen3. (#4482) | 2025-05-23 15:31:04 +08:00 |
| fusedQKNormRopeKernel.h | perf: Add fused q_norm/k_norm/RoPE for Qwen3. (#4482) | 2025-05-23 15:31:04 +08:00 |
| gptKernels.cu | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| gptKernels.h | feat: add CGA reduction fmha kernels on Blackwell. (#3763) | 2025-04-29 10:43:54 +08:00 |
| groupGemm.cu | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| groupGemm.h | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| kvCachePartialCopy.cu | [fix] Fix illegal mem access and possible accuracy lose. Cherry-pick … (#5017) | 2025-06-09 17:50:57 +08:00 |
| kvCacheUtils.h | Update TensorRT-LLM (#2582) | 2024-12-16 21:50:47 -08:00 |
| layernormKernels.cu | feat: Add support for fp8 rowwise quantization (#4876) | 2025-06-14 06:37:48 -07:00 |
| layernormKernels.h | feat: Add support for fp8 rowwise quantization (#4876) | 2025-06-14 06:37:48 -07:00 |
| logitsBitmask.cu | bitmask v3 (#3009) | 2025-03-26 15:21:29 +08:00 |
| logitsBitmask.h | Update TensorRT-LLM (#2532) | 2024-12-04 21:16:56 +08:00 |
| lookupKernels.cu | Update TensorRT-LLM (#1639) | 2024-05-21 17:51:02 +08:00 |
| lookupKernels.h | Update TensorRT-LLM (#1639) | 2024-05-21 17:51:02 +08:00 |
| lruKernel.cu | Update TensorRT-LLM (#1688) | 2024-05-28 20:07:49 +08:00 |
| lruKernel.h | Update TensorRT-LLM (#1688) | 2024-05-28 20:07:49 +08:00 |
| mambaConv1dKernels.cu | feat: Add FP8 support for SM 120 (#3248) | 2025-04-14 16:05:41 -07:00 |
| mambaConv1dKernels.h | Update TensorRT-LLM (#1954) | 2024-07-16 15:30:25 +08:00 |
| mlaChunkedPrefill.cu | [TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475) | 2025-06-26 22:18:08 +08:00 |
| mlaChunkedPrefill.cuh | [TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475) | 2025-06-26 22:18:08 +08:00 |
| mlaKernels.cu | [TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475) | 2025-06-26 22:18:08 +08:00 |
| mlaKernels.h | feat: chunked prefill for MLA (Blackwell) (#4651) | 2025-06-26 09:01:00 +08:00 |
| moeCommKernels.cu | optimize memset before alltoall communication (#5188) | 2025-06-14 10:49:47 +08:00 |
| moeCommKernels.h | feat: Add MNNVL MoE A2A support (#3504) | 2025-04-25 17:29:08 +08:00 |
| multiHeadAttentionCommon.h | chore: Stabilize ABI boundary for internal kernel library (#3117) | 2025-04-11 15:07:50 +08:00 |
| noAuxTcKernels.cu | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| noAuxTcKernels.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| penaltyKernels.cu | Update TensorRT-LLM (#2849) | 2025-03-04 18:44:00 +08:00 |
| penaltyKernels.h | Update TensorRT-LLM (#2502) | 2024-11-26 16:51:34 +08:00 |
| penaltyTypes.h | Update TensorRT-LLM (#1554) | 2024-05-07 23:34:28 +08:00 |
| preQuantScaleKernel.cu | chore: Mass integration of release/0.20. (#4871) | 2025-06-04 14:12:27 +08:00 |
| preQuantScaleKernel.h | chore: Mass integration of release/0.20. (#4871) | 2025-06-04 14:12:27 +08:00 |
| qserveGemm.h | Update TensorRT-LLM (#2436) | 2024-11-12 15:27:49 +08:00 |
| qserveGemmPerChannel.cu | Update TensorRT-LLM (#2532) | 2024-12-04 21:16:56 +08:00 |
| qserveGemmPerGroup.cu | Update TensorRT-LLM (#2502) | 2024-11-26 16:51:34 +08:00 |
| quantization.cu | perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318) | 2025-06-26 14:03:56 +08:00 |
| quantization.cuh | feat: Add Mixture of Experts FP8xMXFP4 support (#4750) | 2025-06-09 13:25:04 +08:00 |
| quantization.h | perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318) | 2025-06-26 14:03:56 +08:00 |
| recoverFromRingAtten.cu | Support RingAttention in the BertAttention plugin and the DiT model (#3661) | 2025-05-09 08:06:54 +08:00 |
| recoverFromRingAtten.h | Support RingAttention in the BertAttention plugin and the DiT model (#3661) | 2025-05-09 08:06:54 +08:00 |
| renormMoeRoutingKernels.cu | Add customized renormalized moe routing kernel for moe cutlass backend (#4955) | 2025-06-09 17:38:50 +08:00 |
| renormMoeRoutingKernels.h | Add customized renormalized moe routing kernel for moe cutlass backend (#4955) | 2025-06-09 17:38:50 +08:00 |
| rmsnormKernels.cu | Update TensorRT-LLM (#2436) | 2024-11-12 15:27:49 +08:00 |
| rmsnormKernels.h | Update TensorRT-LLM (#2436) | 2024-11-12 15:27:49 +08:00 |
| sageAttentionKernels.cu | Update TensorRT-LLM (#2849) | 2025-03-04 18:44:00 +08:00 |
| sageAttentionKernels.h | Update TensorRT-LLM (#2849) | 2025-03-04 18:44:00 +08:00 |
| samplingAirTopPKernels.cu | Update TensorRT-LLM (#2783) | 2025-02-13 18:40:22 +08:00 |
| samplingTopKKernels.cu | Update TensorRT-LLM (#2849) | 2025-03-04 18:44:00 +08:00 |
| samplingTopKKernels.h | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| samplingTopPKernels.cu | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |
| samplingTopPKernels.h | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| splitkGroupGemm.cu | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| splitkGroupGemm.h | Update TensorRT-LLM (#2792) | 2025-02-18 21:27:39 +08:00 |
| stopCriteriaKernels.cu | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| stopCriteriaKernels.h | open source 4dbf696ae9b74a26829d120b67ab8443d70c8e58 (#2297) | 2024-10-08 12:19:19 +02:00 |
| topkLastDim.cu | Update TensorRT-LLM (#2436) | 2024-11-12 15:27:49 +08:00 |
| topkLastDim.h | Update TensorRT-LLM (#2436) | 2024-11-12 15:27:49 +08:00 |
| unfusedAttentionKernels.cu | fix: fix for cp > kvHeadNum (#3002) | 2025-03-26 12:39:02 +08:00 |
| unfusedAttentionKernels.h | fix: fix for cp > kvHeadNum (#3002) | 2025-03-26 12:39:02 +08:00 |
| xqaDispatcher.cpp | [feat] Support XQA-based MLA on SM120 (#4858) | 2025-06-06 22:32:49 +08:00 |
| xqaDispatcher.h | [feat] Support XQA-based MLA on SM120 (#4858) | 2025-06-06 22:32:49 +08:00 |