6 Commits

Author: Bo Li
SHA1: bf1b958f1a
Message: [TRTLLM-7319][perf] Fuse slicing into MoE. (#6728)
    Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
    Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>
    Co-authored-by: Sergey Klevtsov <sklevtsov@nvidia.com>
Date: 2025-08-25 16:52:30 -04:00

Author: NVJiangShao
SHA1: 2f2f5cc72c
Message: [TRTLLM-6744][feat] Remove input_sf swizzle for module WideEPMoE (#6231)
    Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
Date: 2025-08-08 11:13:42 +08:00

Author: Daniel Stokes
SHA1: ec6c7dff1a
Message: feat: Add support for MXFP8xMXFP4 in pytorch (#5535)
    Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
Date: 2025-07-06 15:32:06 -07:00

Author: Li Min
SHA1: 16fc99391f
Message: refactor: [TRTLLM-6150] Refactor moe permute and finalize op by removing duplicated code (#5557)
    Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Date: 2025-06-30 08:48:04 -07:00

Author: Enwei Zhu
SHA1: b4dab23e7b
Message: [TRTLLM-5965] perf: Optimize MoE sort kernels for large-scale EP (#5435)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Date: 2025-06-30 01:02:07 +08:00

Author: Li Min
SHA1: 6021a439ab
Message: Make moe permute and final as custom op (#5412)
    Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Date: 2025-06-27 15:48:33 -07:00