Mirror of https://github.com/NVIDIA/TensorRT-LLM.git
Because deep_gemm.fp8_gemm_nt triggers many JIT compilations during the inference phase, we need to sweep these shapes ahead of time. Apply the AutoTuner framework to achieve this, and retain the potential to tune the swap_ab flag later.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
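A minimal sketch of the idea, under assumptions: instead of letting each new GEMM shape trigger a DeepGEMM JIT compile during serving, candidate shapes are swept once at warm-up time, and the swap_ab flag is treated as a tunable knob. The `run_fp8_gemm_nt` callable and the power-of-two bucketing below are hypothetical placeholders for illustration only; the real TensorRT-LLM AutoTuner has its own runner and tuning-config classes.

```python
# Sketch of an ahead-of-time shape sweep for a JIT-compiled FP8 GEMM.
# `run_fp8_gemm_nt` is a hypothetical wrapper around the real kernel that
# runs one GEMM of the given shape and returns its measured runtime (ms).
import itertools
from typing import Callable, Dict, Iterable, Tuple


def candidate_m_buckets(max_m: int) -> Iterable[int]:
    """Powers of two up to max_m; real code would follow the tuner's own bucketing."""
    m = 1
    while m <= max_m:
        yield m
        m *= 2


def warm_up_fp8_gemm(
    n: int,
    k: int,
    max_m: int,
    run_fp8_gemm_nt: Callable[..., float],
) -> Dict[Tuple[int, int, int], Tuple[bool, float]]:
    """Run every (m, n, k, swap_ab) combination once so all JIT kernels are
    compiled before inference, and record the faster swap_ab choice per shape."""
    best: Dict[Tuple[int, int, int], Tuple[bool, float]] = {}
    for m, swap_ab in itertools.product(candidate_m_buckets(max_m), (False, True)):
        elapsed = run_fp8_gemm_nt(m, n, k, swap_ab=swap_ab)
        key = (m, n, k)
        if key not in best or elapsed < best[key][1]:
            best[key] = (swap_ab, elapsed)
    return best  # shape -> (preferred swap_ab flag, measured time)
```

At serving time the tuner would only look up the cached choice for the incoming shape, so no compilation happens on the critical path; keeping swap_ab in the sweep is what preserves the option of tuning it later.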
| Name |
|---|
| fused_moe |
| mamba |
| __init__.py |
| attention.py |
| decoder_layer.py |
| embedding.py |
| gated_mlp.py |
| layer_norm.py |
| linear.py |
| logits_processor.py |
| mlp.py |
| multi_stream_utils.py |
| rms_norm.py |
| rotary_embedding.py |
| swiglu.py |
| triton_linear.py |