TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, last synced 2026-01-30 07:33:48 +08:00


ZhichenJiang 46e4af5688 [TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201)	2025-12-25 09:04:20 -05:00
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
run_blockscaled_contiguous_grouped_gemm_finalize_fusion.py	[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201)	2025-12-25 09:04:20 -05:00
run_blockscaled_contiguous_grouped_gemm.py	[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201)	2025-12-25 09:04:20 -05:00
run_dense_blockscaled_gemm_persistent.py	[TRTLLM-6222][feat] Several perf opt for cuteDSL nvf4 gemm (#9428)	2025-12-01 18:10:45 +08:00
testing.py	[https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881)	2025-12-13 08:35:31 +08:00