TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

2ez4bz	cf0c47ca2d	2025-09-01 11:02:31 +08:00
[None][fix] Fix batching bug in Mistral3 model (#6841)

Prior to this commit, if multiple requests with images were in the same batch, the batching logic for the images would fail. This commit fixes it, and adds unit tests for it that were verified to fail prior to the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

attention	[None][ci] move unittests to sub-directories (#6635)	2025-08-20 05:42:22 -04:00
auto_deploy	[https://nvbugs/5474453][fix] fix path to tested model (#7272)	2025-08-28 08:01:48 -04:00
compilation	[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804)	2025-05-09 11:04:01 +08:00
debugger	Fix: fix nvbug 5356427 (#5464)	2025-06-25 22:24:26 +08:00
executor	[None][opt] Balance the request based on number of tokens in AttentionDP (#7183)	2025-08-27 11:16:12 +08:00
misc	[None][perf] Make finalize fusion part of the tactic selection logic (#6915)	2025-08-21 14:08:03 -07:00
modeling	[None][fix] Fix batching bug in Mistral3 model (#6841)	2025-09-01 11:02:31 +08:00
models/checkpoints/hf	[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013)	2025-08-25 23:56:21 -04:00
modules	[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7033)	2025-08-25 10:37:40 +03:00
multi_gpu	[None][ci] move unittests to sub-directories (#6635)	2025-08-20 05:42:22 -04:00
multi_gpu_modeling	[None][fix] Fix llama4 multimodal by skipping request validation (#6957)	2025-08-20 21:58:53 -04:00
multimodal	[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743)	2025-08-19 21:42:50 -07:00
sampler	[TRTLLM-7155][feat] Unify sampler handle logits implementation. (#6867)	2025-08-22 08:09:30 +02:00
speculative	[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254)	2025-08-27 00:45:58 -04:00
thop	[TRTLLM-7457][ci] Update unittest parallel config (#7297)	2025-08-29 09:28:04 +08:00
helpers.py	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)	2025-08-25 20:52:05 +08:00
pattern_watcher.py	[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804)	2025-05-09 11:04:01 +08:00
test_connector.py	[None][feat] KV Cache Connector API (#7228)	2025-08-28 23:09:27 -04:00