TensorRT-LLM/tensorrt_llm/_torch/custom_ops
Yukun He 9c5b464fe0
[None][feat] Apply AutoTuner to fp8_block_scale_deep_gemm to trigger JIT ahead of time. (#7113)
Because deep_gemm.fp8_gemm_nt triggers many JIT compilation passes during the inference phase, we need to sweep these shapes ahead of time. Applying the AutoTuner framework achieves this and also retains the potential to tune the swap_ab flag. (A sketch of the idea follows below.)

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-25 10:48:31 +08:00
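
What the commit describes is, in effect, a warm-up sweep: run the GEMM once for every problem shape (and every swap_ab tactic) that inference may encounter, so all JIT compilation happens before serving begins. Below is a minimal sketch of that idea in plain PyTorch; the names are hypothetical (fp8_gemm_nt is a float32 stand-in, CANDIDATE_M is an illustrative shape list), not the actual TensorRT-LLM AutoTuner or DeepGEMM API:

```python
import torch

# Hypothetical stand-in for deep_gemm.fp8_gemm_nt; the real call JIT-compiles
# a kernel specialized to each new problem shape on first use, which is what
# the warm-up sweep is meant to trigger ahead of inference.
def fp8_gemm_nt(a: torch.Tensor, b: torch.Tensor, swap_ab: bool = False) -> torch.Tensor:
    # With swap_ab the operands are exchanged and the result transposed back;
    # both branches compute a @ b^T but could map to differently tuned kernels.
    return (b @ a.t()).t() if swap_ab else a @ b.t()

# Illustrative shape candidates; a real tuner would derive these from its
# tuning configuration rather than a hard-coded table.
CANDIDATE_M = (1, 8, 64, 256, 1024)
N, K = 4096, 4096

def sweep_shapes_ahead_of_time() -> dict:
    """Run every (shape, tactic) pair once so all JIT builds happen up front."""
    best_tactic = {}
    b = torch.randn(N, K)
    for m in CANDIDATE_M:
        a = torch.randn(m, K)
        for swap_ab in (False, True):
            fp8_gemm_nt(a, b, swap_ab=swap_ab)  # first call per shape/tactic compiles
        best_tactic[m] = False  # a real tuner would time both and keep the faster
    return best_tactic

if __name__ == "__main__":
    sweep_shapes_ahead_of_time()
```

The design point is that tuning and JIT warm-up share the same sweep: once the tuner has executed each tactic in order to time it, every kernel the winning tactic needs has already been compiled.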
__init__.py [None][ci] move unittests to sub-directories (#6635) 2025-08-20 05:42:22 -04:00
cpp_custom_ops.py [TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973) 2025-08-24 08:15:29 -04:00
flashinfer_custom_ops.py [None][ci] move unittests to sub-directories (#6635) 2025-08-20 05:42:22 -04:00
torch_custom_ops.py [None][feat] Apply AutoTuner to fp8_block_scale_deep_gemm to trigger JIT ahead of time. (#7113) 2025-08-25 10:48:31 +08:00
trtllm_gen_custom_ops.py [None][perf] Make finalize fusion part of the tactic selection logic (#6915) 2025-08-21 14:08:03 -07:00
userbuffers_custom_ops.py feat: Introduce UB allocator for pytorch flow (#3257) 2025-04-08 18:39:49 +08:00