TensorRT-LLMs/cpp/tensorrt_llm/thop
benzh-2025 6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
Signed-off-by: benzh 
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
allgatherOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
allreduceOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
alltoallOp.cpp [TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986) 2025-12-23 18:14:30 -08:00
attentionOp.cpp [TRTLLM-7735][feat] Attention NVFP4 out support for torch compile (#9740) 2025-12-27 00:07:20 +08:00
attentionOp.h [TRTLLM-7735][feat] Attention NVFP4 out support for torch compile (#9740) 2025-12-27 00:07:20 +08:00
causalConv1dOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
CMakeLists.txt [None][feat] add fp4 gemm + allreduce (#9729) 2026-01-13 21:11:13 +08:00
convertSpecDecodingMaskToPackedMaskOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cublasFp4ScaledMM.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cublasScaledMM.cpp [None][feat] spark cublas LUT table for llama-8b-bf16 perf (#9811) 2025-12-12 22:37:56 -05:00
cublasScaledMM.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cublasScaledMMLut.h [None][chore] Add namespace to header to fix tot failure (#9973) 2025-12-13 12:18:10 -05:00
cudaNvfp4MM.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cudaScaledMM.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customMoeRoutingOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
cuteDslMoeUtilsOp.cpp [TRTLLM-9819][perf] Reuse alltoall workspace for CuteDSL MoE output (#9840) 2025-12-18 10:36:38 -08:00
cutlassScaledMM.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dsv3FusedAGemmOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dsv3RopeOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dsv3RouterGemmOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dynamicDecodeOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
dynamicDecodeOp.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
finegrained_mixed_dtype_gemm_thop.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
finegrained_mixed_dtype_gemm_thop.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fmhaPackMaskOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4BatchedQuantize.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4BlockScaleMoe.cpp [None][feat] Support nvfp4 for gptoss (#8956) 2026-01-04 08:57:44 -05:00
fp4Gemm.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4GemmTrtllmGen.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4Op.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4Quantize.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4Quantize.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp4xFp8GemmTrtllmGen.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8BatchedGemmTrtllmGen.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8BlockScaleMoe.cpp [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
fp8BlockScalingGemm.cpp [None][fix] Cherry-pick conflict changes for PR 7999 PR 8515 (#9446) 2025-12-25 10:23:04 -05:00
fp8Op.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8Op.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8PerTensorScaleMoe.cpp [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
fp8PerTensorScalingTrtllmGenGemm.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8Quantize.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fp8RowwiseGemm.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
fusedGemmAllreduceOp.cpp [None][feat] add fp4 gemm + allreduce (#9729) 2026-01-13 21:11:13 +08:00
fusedQKNormRopeOp.cpp [None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) 2025-12-14 10:47:24 +08:00
fusedTopkSoftmax.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
gatherTreeOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
groupRmsNormOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
helixPostProcessOp.cpp [TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924) 2025-12-12 16:49:25 -08:00
IndexerKCacheScatterOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
IndexerTopKOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
llama4MinLatency.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
logitsBitmaskOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
loraOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mambaConv1dOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mlaPreprocessOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlignOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlltoAllMeta.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlltoAllOp.cpp [None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. (#9822) 2025-12-22 05:57:12 -05:00
moeCommOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeLoadBalanceOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeOp.cpp [https://nvbugs/5726962][feat] Apply fusion for W4AFP8_AWQ MoE (#9838) 2026-01-06 10:16:41 +08:00
moeUtilOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mxFp4BlockScaleMoe.cpp [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792) 2025-12-15 19:59:08 -08:00
mxFp8Quantize.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
ncclCommunicatorOp.cpp [None][feat] Async pp send for PPCommTorch. (#9976) 2025-12-15 14:03:46 +08:00
ncclCommunicatorOp.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
noAuxTcOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
parallelDecodeKVCacheUpdateOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
redrafterCurandOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
reducescatterOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
relativeAttentionBiasOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
selectiveScanOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
specDecOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
thUtils.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
thUtils.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
tinygemm2.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
userbuffersFinalizeOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
userbuffersTensor.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
userbuffersTensor.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
virtualMemoryAllocator.cpp [TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034) 2025-08-04 13:51:01 +08:00
weightOnlyQuantGemm.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
weightOnlyQuantGemm.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
weightOnlyQuantOp.cpp [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00