TensorRT-LLM/tensorrt_llm/_torch
Latest commit: cf0c47ca2d by 2ez4bz — [None][fix] Fix batching bug in Mistral3 model (#6841), 2025-09-01 11:02:31 +08:00

Prior to this commit, if multiple requests with images were in the same batch, the batching logic for the images would fail.

This commit fixes that, and adds unit tests that were verified to fail prior to the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
| Name | Last commit | Date |
| --- | --- | --- |
| attention_backend | [None][feat] Support NVFP4 KV Cache (#6244) | 2025-09-01 09:24:52 +08:00 |
| auto_deploy | [None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder (#7233) | 2025-08-26 10:47:57 -07:00 |
| compilation | [TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750) | 2025-08-26 18:31:33 -04:00 |
| custom_ops | [None][chore] Wrap the swiglu into custom op to avoid redundant device copy. (#7021) | 2025-08-27 13:02:10 +08:00 |
| debug | Add debug hook to support dump tensor data and add new debug functions easily (#5182) | 2025-06-24 17:45:28 +08:00 |
| distributed | [https://nvbugs/5445466][fix] Bypass MLP TP split for MNNVL in DeepSeek V3 to avoid hanging. (#6886) | 2025-08-28 15:17:48 -07:00 |
| models | [None][fix] Fix batching bug in Mistral3 model (#6841) | 2025-09-01 11:02:31 +08:00 |
| modules | [None][feat] Support NVFP4 KV Cache (#6244) | 2025-09-01 09:24:52 +08:00 |
| peft | [TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7033) | 2025-08-25 10:37:40 +03:00 |
| pyexecutor | [None][feat] Support NVFP4 KV Cache (#6244) | 2025-09-01 09:24:52 +08:00 |
| shared_tensor | [1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396) | 2025-07-10 05:12:53 +09:00 |
| speculative | [None][refactor] Move draft token padding out of Drafter (#7134) | 2025-08-27 11:07:50 +02:00 |
| __init__.py | [nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266) | 2025-07-27 23:29:21 -04:00 |
| autotuner.py | [None][perf] Make finalize fusion part of the tactic selection logic (#6915) | 2025-08-21 14:08:03 -07:00 |
| expert_statistic.py | Add MTP support for Online EPLB (#5213) | 2025-06-25 07:58:13 +08:00 |
| flashinfer_utils.py | [None][ci] move unittests to sub-directories (#6635) | 2025-08-20 05:42:22 -04:00 |
| llm.py | [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) | 2025-06-20 03:01:10 +08:00 |
| metadata.py | [None][feat] Use Separate QKV Input Layout for Context MLA (#6538) | 2025-08-19 22:04:48 +08:00 |
| model_config.py | [https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268) | 2025-08-29 12:36:30 +08:00 |
| utils.py | [TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750) | 2025-08-26 18:31:33 -04:00 |
| virtual_memory.py | [TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034) | 2025-08-04 13:51:01 +08:00 |