TensorRT-LLMs/cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe
Enwei Zhu 7cd5a67e25
[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (#9592)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-05 22:08:52 -08:00
CMakeLists.txt
DevKernel.cu [None][feat] TRT-LLM Gen MoE optimize DeepSeek Fp8 activation kernel (#9175) 2025-11-21 15:35:00 +01:00
DevKernel.h [None][feat] TRT-LLM Gen MoE optimize DeepSeek Fp8 activation kernel (#9175) 2025-11-21 15:35:00 +01:00
IntFastDiv.h
RoutingDeepSeek.cu [TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (#9592) 2025-12-05 22:08:52 -08:00
RoutingKernel.cuh [TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880) 2025-11-18 17:40:12 -08:00
RoutingKernel.h [TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880) 2025-11-18 17:40:12 -08:00
RoutingKernelTopK.cuh
RoutingLlama4.cu [TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880) 2025-11-18 17:40:12 -08:00
RoutingRenormalize.cu [TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880) 2025-11-18 17:40:12 -08:00
runner.cu [TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880) 2025-11-18 17:40:12 -08:00
runner.h [None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound (#9025) 2025-11-17 10:04:29 +08:00