TensorRT-LLM/tensorrt_llm/_torch/auto_deploy
Neta Zmora 34dc6869f3
[#8732][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 (#9011)
Update TRTLLM Cutlass MoE kernels with ReLU2 activation.

Nemotron-6 requires ReLU2 (i.e. squared ReLU) as its MoE activation function.
This PR adds ReLU2 support and, more generally, an API for setting the activation function.
The ReLU2 changes are based on this FlashInfer PR: https://github.com/flashinfer-ai/flashinfer/pull/1954.
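For reference, the ReLU2 activation named above is simply ReLU followed by squaring, relu2(x) = relu(x)^2. A minimal pure-Python sketch of its semantics (the actual implementation lives in the Cutlass/FlashInfer CUDA kernels, not here):

```python
def relu2(x: float) -> float:
    """Squared ReLU: relu2(x) = max(x, 0) ** 2.

    Reference semantics only; the fused MoE kernels compute this
    element-wise inside the expert GEMM epilogue on GPU.
    """
    r = max(x, 0.0)
    return r * r
```

Note that ReLU2 is smooth at zero (both the value and the first derivative vanish there), which is part of its appeal over plain ReLU as an MoE expert activation.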

The PR also updates the Auto Deploy MoE backend for 16-bit and FP8 from
Triton (`torch.ops.auto_deploy.triton_moe_fused`, `torch.ops.auto_deploy.triton_quant_fp8_moe`) to TRTLLM/Cutlass (`torch.ops.auto_deploy.trtllm_moe_fused`, `torch.ops.auto_deploy.trtllm_quant_fp8_moe_fused`).
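Conceptually, the backend switch replaces each Triton MoE op with its TRTLLM/Cutlass counterpart during graph transformation. A hypothetical sketch of that mapping, using only the op names stated above (the `select_moe_op` helper and the lookup-table approach are illustrative assumptions, not the actual transform code):

```python
# Old Triton op name -> new TRTLLM/Cutlass op name, per the commit message.
TRITON_TO_TRTLLM_MOE_OPS = {
    "torch.ops.auto_deploy.triton_moe_fused":
        "torch.ops.auto_deploy.trtllm_moe_fused",
    "torch.ops.auto_deploy.triton_quant_fp8_moe":
        "torch.ops.auto_deploy.trtllm_quant_fp8_moe_fused",
}


def select_moe_op(op_name: str) -> str:
    """Return the TRTLLM/Cutlass replacement for a Triton MoE op.

    Hypothetical helper for illustration: ops without a replacement
    pass through unchanged.
    """
    return TRITON_TO_TRTLLM_MOE_OPS.get(op_name, op_name)
```

The real transform operates on FX graph nodes rather than strings, but the substitution it performs follows this table.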

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-13 16:54:45 -08:00
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| compile | [https://nvbugs/5606166][fix] AutoDeploy: use tuples for cudagraph shape lookup (#8658) | 2025-10-28 10:52:43 -07:00 |
| config | [None][autodeploy] minor refactor to rmsnorm transforms (#8657) | 2025-11-13 13:13:58 -08:00 |
| custom_ops | [#8732][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 (#9011) | 2025-11-13 16:54:45 -08:00 |
| distributed | [None][fix] Switch AD AllReduce strategy to NCCL (#8979) | 2025-11-07 06:49:44 +02:00 |
| export | [#8924][fix] Fix AutoDeploy pattern matcher for torch 2.9 (#8920) | 2025-11-05 13:29:20 -08:00 |
| models | [None][fix] AutoDeploy: update nano3 accuracy test (#9061) | 2025-11-11 12:26:31 -08:00 |
| shim | [TRTLLM-9065][chore] remove PyTorchConfig completely (#8856) | 2025-11-06 22:37:03 -08:00 |
| transform | [#8732][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 (#9011) | 2025-11-13 16:54:45 -08:00 |
| utils | [None][autodeploy] fix weight extraction for graph based quantized checkpoints (#9109) | 2025-11-13 13:14:24 -08:00 |
| __init__.py | [AutoDeploy] merge feat/ad-2025-07-07 (#6196) | 2025-07-23 05:11:04 +08:00 |
| llm_args.py | [TRTLLM-9065][chore] remove PyTorchConfig completely (#8856) | 2025-11-06 22:37:03 -08:00 |
| llm.py | [TRTLLM-9065][chore] remove PyTorchConfig completely (#8856) | 2025-11-06 22:37:03 -08:00 |