TensorRT-LLM/tensorrt_llm/_torch
Neta Zmora 53491ffdb1
[#9023][feat] reduce AD graph optimization time for non-participating passes (#9024)
Shortens AD graph optimization time by 30% (measured on Nemotron-6):

A bug in the transformation interface marked every pass as "not clean", regardless of what the transformation actually reported.

This change fixes how the optimization passes report the results of their actions: many passes reported the graph as not clean even when they did not participate in the optimization, and each graph-cleaning invocation can take several seconds.
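The idea can be sketched as follows. This is a hypothetical minimal example, not the actual TensorRT-LLM auto_deploy API; the `Pass`, `run_pipeline`, and `clean` names are invented for illustration. Each pass reports whether it modified the graph, and the expensive cleaning step runs only for passes that did:

```python
class Pass:
    """Base class for a graph transform: run() returns (graph, modified)."""

    def run(self, graph):
        raise NotImplementedError


class NoOpPass(Pass):
    def run(self, graph):
        # Did not participate in the optimization: the graph stays clean.
        return graph, False


class RewritePass(Pass):
    def run(self, graph):
        # Modified the graph: a cleaning pass is now required.
        return graph + ["rewritten"], True


def clean(graph):
    # Stand-in for the costly canonicalization / dead-code-elimination step
    # (several seconds per invocation in the real pipeline).
    return [n for n in graph if n != "dead"]


def run_pipeline(graph, passes):
    cleanings = 0
    for p in passes:
        graph, modified = p.run(graph)
        if modified:  # the fix: skip cleaning for non-participating passes
            graph = clean(graph)
            cleanings += 1
    return graph, cleanings


graph, cleanings = run_pipeline(["a", "dead"], [NoOpPass(), RewritePass(), NoOpPass()])
print(cleanings)  # only the one participating pass triggered a cleaning
```

Before the fix, the interface would report `modified=True` for every pass, so `clean()` ran once per pass instead of once per participating pass.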

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-12 09:05:53 -08:00
attention_backend [TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988) 2025-11-11 12:33:30 +08:00
auto_deploy [#9023][feat] reduce AD graph optimization time for non-participating passes (#9024) 2025-11-12 09:05:53 -08:00
compilation [https://nvbugs/5550409][fix] Disable torch compile in piecewise attention part to Avoid host overhead (#8708) 2025-10-29 18:12:58 +08:00
configs [TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405) 2025-10-24 13:40:41 -04:00
custom_ops [None][feat] Add customized topk and related unit tests for DSA (#8882) 2025-11-10 03:35:35 -08:00
cute_dsl_kernels [TRTLLM-6898][feat] Add swapab, tileN64, cga sync support for cute dsl nvfp4 gemm (#7764) 2025-09-18 21:20:04 +08:00
debug Add debug hook to support dump tensor data and add new debug functions easily (#5182) 2025-06-24 17:45:28 +08:00
distributed [None][feat] MNNVLAllreduce Kernel Refactor (#8018) 2025-11-05 08:49:47 +08:00
models [TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840) 2025-11-11 07:48:23 -08:00
modules [TRTLLM-9259][perf] Use torch.compile to fuse copy + layernorm within the LayerNorm module (#9052) 2025-11-11 18:11:00 -08:00
peft [TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7033) 2025-08-25 10:37:40 +03:00
pyexecutor [TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner (#7572) 2025-11-11 10:13:45 -08:00
shared_tensor [1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396) 2025-07-10 05:12:53 +09:00
speculative [https://nvbugs/5508536][fix] Take Over (#8627): Reintroduce: Move stop_criteria to sample_async (#7041) (#8794) 2025-11-07 09:01:15 +01:00
__init__.py [nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266) 2025-07-27 23:29:21 -04:00
autotuner.py [https://nvbugs/5623960][fix] Fix the logger once key issue and further compress log in AutoTuner. (#8873) 2025-11-05 15:25:43 +08:00
cublaslt_utils.py [https://nvbugs/5451205][feat] Add cuBLASLt NVFP4 GEMM backend support (#7943) 2025-10-23 15:55:10 +08:00
cute_dsl_utils.py [None][chore] polish error message in cute_dsl_utils.py (#7852) 2025-09-19 12:05:11 +08:00
device_mesh.py [TRTLLM-8682][chore] Remove auto_parallel module (#8329) 2025-10-22 20:53:08 -04:00
expert_statistic.py Add MTP support for Online EPLB (#5213) 2025-06-25 07:58:13 +08:00
flashinfer_utils.py [None][ci] move unittests to sub-directories (#6635) 2025-08-20 05:42:22 -04:00
hostfunc.py [TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948) 2025-09-03 15:16:11 -07:00
llm.py [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) 2025-06-20 03:01:10 +08:00
memory_buffer_utils.py [TRTLLM-8690][feat] add more tensors to share buffers (#8691) 2025-11-03 21:08:01 -08:00
metadata.py [None][feat] Use Separate QKV Input Layout for Context MLA (#6538) 2025-08-19 22:04:48 +08:00
model_config.py [https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json (#8617) 2025-10-31 04:41:44 -07:00
utils.py [TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988) 2025-11-11 12:33:30 +08:00
virtual_memory.py [TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302) 2025-11-04 10:19:24 -08:00