TensorRT-LLMs/cpp/tensorrt_llm/thop
Enwei Zhu 6fe89ea00f
[TRTLLM-9819][perf] Reuse alltoall workspace for CuteDSL MoE output (#9840)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-18 10:36:38 -08:00
..
allgatherOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
allreduceOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
alltoallOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
attentionOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
attentionOp.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
causalConv1dOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
CMakeLists.txt [TRTLLM-8160][feat] Add draft token tree runtime on CDL (#8586) 2025-11-25 09:40:55 -05:00
convertSpecDecodingMaskToPackedMaskOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cublasFp4ScaledMM.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cublasScaledMM.cpp [None][feat] spark cublas LUT table for llama-8b-bf16 perf (#9811) 2025-12-12 22:37:56 -05:00
cublasScaledMM.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cublasScaledMMLut.h [None][chore] Add namespace to header to fix tot failure (#9973) 2025-12-13 12:18:10 -05:00
cudaNvfp4MM.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cudaScaledMM.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customMoeRoutingOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cuteDslMoeUtilsOp.cpp [TRTLLM-9819][perf] Reuse alltoall workspace for CuteDSL MoE output (#9840) 2025-12-18 10:36:38 -08:00
cutlassScaledMM.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dsv3FusedAGemmOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dsv3RopeOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dsv3RouterGemmOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dynamicDecodeOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dynamicDecodeOp.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
finegrained_mixed_dtype_gemm_thop.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
finegrained_mixed_dtype_gemm_thop.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fmhaPackMaskOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4BatchedQuantize.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4BlockScaleMoe.cpp [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
fp4Gemm.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4GemmTrtllmGen.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4Op.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4Quantize.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4Quantize.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4xFp8GemmTrtllmGen.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8BatchedGemmTrtllmGen.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8BlockScaleMoe.cpp [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
fp8BlockScalingGemm.cpp [https://nvbugs/5456493][feat] Add fp8 bmm on sm120 (#9687) 2025-12-18 22:57:20 +08:00
fp8Op.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8Op.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8PerTensorScaleMoe.cpp [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
fp8PerTensorScalingTrtllmGenGemm.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8Quantize.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8RowwiseGemm.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fusedQKNormRopeOp.cpp [None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) 2025-12-14 10:47:24 +08:00
fusedTopkSoftmax.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
gatherTreeOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
groupRmsNormOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
helixPostProcessOp.cpp [TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924) 2025-12-12 16:49:25 -08:00
IndexerKCacheScatterOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
IndexerTopKOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
llama4MinLatency.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
logitsBitmaskOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
loraOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mambaConv1dOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaPreprocessOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlignOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlltoAllMeta.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlltoAllOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeCommOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeLoadBalanceOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeUtilOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mxFp4BlockScaleMoe.cpp [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
mxFp8Quantize.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
ncclCommunicatorOp.cpp [None][feat] Async pp send for PPCommTorch. (#9976) 2025-12-15 14:03:46 +08:00
ncclCommunicatorOp.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
noAuxTcOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
parallelDecodeKVCacheUpdateOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
redrafterCurandOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
reducescatterOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
relativeAttentionBiasOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
selectiveScanOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
specDecOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
thUtils.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
thUtils.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
tinygemm2.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
userbuffersFinalizeOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
userbuffersTensor.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
userbuffersTensor.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
virtualMemoryAllocator.cpp [TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034) 2025-08-04 13:51:01 +08:00
weightOnlyQuantGemm.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
weightOnlyQuantGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
weightOnlyQuantOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00