TensorRT-LLMs/tests/scripts/cute_dsl_kernels
ZhichenJiang 46e4af5688
[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201)
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 09:04:20 -05:00
run_blockscaled_contiguous_grouped_gemm_finalize_fusion.py  [TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201)  2025-12-25 09:04:20 -05:00
run_blockscaled_contiguous_grouped_gemm.py                  [TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201)  2025-12-25 09:04:20 -05:00
run_dense_blockscaled_gemm_persistent.py                    [TRTLLM-6222][feat] Several perf opt for cuteDSL nvf4 gemm (#9428)                                     2025-12-01 18:10:45 +08:00
testing.py                                                  [https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881)                               2025-12-13 08:35:31 +08:00