TensorRT-LLM/tensorrt_llm/_torch
Latest commit 7ebb770dce by 2ez4bz
[None][fix] Fix batching bug in Mistral3 model (#6841)
Prior to this commit, if multiple requests with images were in the same
batch, the batching logic for the images would fail.

This commit fixes the batching logic and adds unit tests that were verified to
fail prior to the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-14 02:15:44 -04:00
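The failure mode described above is a common one: per-request image features are ragged (each request may carry a different number of images), so a naive fixed-shape stack breaks as soon as two image-bearing requests land in one batch. The sketch below illustrates the general pattern of flattening ragged inputs while tracking per-request counts; the function name and shapes are invented for illustration and are not taken from the TensorRT-LLM codebase.

```python
def batch_image_features(per_request_features):
    """Flatten a ragged list of per-request image feature rows into one
    flat list, recording how many rows each request contributed so the
    results can later be scattered back to the right request.

    Hypothetical helper for illustration only -- not the actual fix.
    """
    counts = [len(features) for features in per_request_features]
    flat = [row for features in per_request_features for row in features]
    return flat, counts

# Request 0 carries two images, request 1 carries one; a fixed-shape
# stack across requests would fail here, but flattening does not.
req0 = [[1.0] * 4, [1.0] * 4]
req1 = [[0.0] * 4]
flat, counts = batch_image_features([req0, req1])
print(len(flat), counts)  # 3 [2, 1]
```

Keeping the per-request counts alongside the flat tensor is what lets the model route each image's embedding back to the prompt that supplied it.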
| Name | Latest commit | Date |
|---|---|---|
| attention_backend | [TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379) | 2025-08-05 07:47:41 +00:00 |
| auto_deploy | [None][opt] ADP schedule balance optimization (#6061) | 2025-08-06 09:38:02 +08:00 |
| compilation | [https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554) | 2025-08-05 10:31:29 -04:00 |
| custom_ops | [https://nvbugs/5340941][https://nvbugs/5375785] - fix: Wrap attentio… (#6355) | 2025-08-01 07:38:06 -04:00 |
| debug | Add debug hook to support dump tensor data and add new debug functions easily (#5182) | 2025-06-24 17:45:28 +08:00 |
| distributed | [fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237) | 2025-07-25 08:01:40 +08:00 |
| models | [None][fix] Fix batching bug in Mistral3 model (#6841) | 2025-08-14 02:15:44 -04:00 |
| modules | [https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588) | 2025-08-06 16:44:21 +08:00 |
| peft | feat: support multi lora adapters and TP (#3885) | 2025-05-08 23:45:45 +08:00 |
| pyexecutor | [TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter (#6786) | 2025-08-11 14:31:39 -04:00 |
| shared_tensor | [1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396) | 2025-07-10 05:12:53 +09:00 |
| speculative | [https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554) | 2025-08-05 10:31:29 -04:00 |
| __init__.py | [nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266) | 2025-07-27 23:29:21 -04:00 |
| autotuner.py | chore: Improve the AutoTuner log information. (#6368) | 2025-08-01 09:19:52 +08:00 |
| expert_statistic.py | Add MTP support for Online EPLB (#5213) | 2025-06-25 07:58:13 +08:00 |
| llm.py | [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) | 2025-06-20 03:01:10 +08:00 |
| metadata.py | feat: no-cache attention in PyTorch workflow (#3085) | 2025-04-05 01:54:32 +08:00 |
| model_config.py | Bugfix/fix nemotron nas lora support (#6380) | 2025-07-31 13:39:35 -04:00 |
| utils.py | [fix] Fix perf regression caused by MoE autotuner when using DeepEPLowLatency (#6288) | 2025-07-28 01:37:11 -04:00 |
| virtual_memory.py | [TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034) | 2025-08-04 13:51:01 +08:00 |