Guoming Zhang
57079cecb3
[None][chore] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-22 10:05:47 -07:00
QI JUN
68b7900a1d
[https://nvbugs/5531963][fix] cherry pick #7725 (#7907)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-22 06:55:05 -07:00
Yan Chunwei
bc4136ffe7
[https://nvbugs/5427043][fix] cherrypick: request length exceeds max_num_tokens (#7718)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-09-22 03:37:48 -07:00
Yan Chunwei
ce6ebf695c
[None][fix] api stability bug in status label (#7861)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-21 23:17:26 -07:00
Yan Chunwei
8fecc0645d
[None][doc] add stable label to all the un-labelled arguments in LLM class (#7863)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-21 22:47:36 -07:00
Guoming Zhang
af3ea37176
[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … (#7850)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-19 22:05:42 +08:00
Yan Chunwei
2f3e3ae465
[https://nvbugs/5516710][fix] fix Llama 3.3 TP PP case (#7717)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-18 03:35:16 +08:00
Tao Li @ NVIDIA
015e149211
[https://nvbugs/1234567][fix] Revert https://github.com/NVIDIA/TensorRT-LLM/pull/7768/files (#7813)
Signed-off-by: Tao Li
2025-09-18 03:34:05 +08:00
Yukun He
88fe78e0af
[https://nvbugs/5517023][fix] Pass allreduce strategy and force NCCL on pre-Blackwell arch (#7768)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-17 13:28:43 +08:00
Guoming Zhang
2d7af4b32c
[https://nvbugs/5468897][fix] fix invalid expression for disabling pa… (#7762)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-17 11:14:54 +08:00
Yan Chunwei
b940ebf4e3
[None][doc] Enhance api reference doc by labeling stable APIs (#7751)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-17 10:20:26 +08:00
Yi Zhang
7df515e335
[https://nvbugs/5355219][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-09-16 10:33:35 +08:00
Yilin Fan
e5ba99c6de
[https://nvbugs/5398180][feat] Improve Llama4 performance for small max_seqlen cases (#7681)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-09-15 09:04:32 +08:00
Guoming Zhang
541fd3ecb8
[https://nvbugs/5474409][fix] Disable concurrent loading by default (#7663)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-11 00:11:17 +08:00
WeiHaocheng
68b7bad447
[https://nvbugs/5477730][fix] Fix the alltoall case when tp_size larg… (#7671)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-10 20:21:09 +08:00
HuiGao-NV
5206f1ce47
[https://nvbugs/5474169][fix] seq_len mismatch between kv cache manager and graph attn metadata (#7606)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-09 08:32:31 +08:00
Yukun He
e07fa9ddc5
[https://nvbugs/5496960][fix] Fix Gemma model forward. (#7509)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-04 19:09:43 +08:00
Guoming Zhang
cabda243f1
[TRTLLM-5930][doc] 1.0 Documentation. (#6696)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-04 05:29:43 -04:00
dongxuy04
9eecdf2ee9
[TRTLLM-7008][fix] cherrypick fix to 1.0: add automatic shared memory delete if it already exists (#7433)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-02 11:23:53 +08:00
Yuxian Qiu
559762f185
[https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-01 16:00:43 +08:00
HuiGao-NV
860589aa0c
[https://nvbugs/5474169][fix] Adjust max seq len for kvcache for memory estimation (#7391)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-01 14:40:58 +08:00
Chang Liu
050db0e46f
[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268) (#7379)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-30 17:44:24 +08:00
Bo Li
ef0f65b353
[https://nvbugs/5467548][fix] DeepSeek illegal memory access. (#7298)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-29 12:19:03 +08:00
amitz-nv
66f0657716
[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7203)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-28 16:06:32 +08:00
Venky
6cc168a5d3
[https://nvbugs/5463720][fix] tp-split the inferred mlp_hidden_size for nemotron-nas (#7231)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-27 15:04:42 +03:00
Lizhi Zhou
0fa49c5e2b
[https://nvbugs/5448767][fix] fix mpi4py deadlocks in pp event-loop (#6976)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-27 02:01:48 -04:00
Jin Li
877e1f44d3
[https://nvbugs/5451426][fix] Avoid torch compile on full eagle3 worker (#7245)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-27 09:59:06 +08:00
William Zhang
34c1e9c341
[None][feat] Skip prefetching consolidated safetensors when appropriate (#7225)
* Why?
Some models (e.g. anything produced by Mistral) can have both sharded
safetensors and a consolidated safetensor in the same checkpoint
directory. In such cases, prefetching both is a waste of both time and memory.
* What?
This commit skips over consolidated safetensors when they are not the
only safetensor file present in the checkpoint directory.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-26 09:40:17 -07:00
Jiagan Cheng
85b4ae26b7
[https://nvbugs/5451342][fix] Use runtime max_batch_size when cuda_graph_config.max_batch_size is not provided in trtllm-bench (#7031)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-08-26 08:10:35 -04:00
Yuxian Qiu
2fb16ad328
[None][fix] fix log_once usage (#7210)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-26 19:13:03 +08:00
Shi Xiaowei
d010b2043a
[TRTLLM-7030][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#7191)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-25 20:21:43 +08:00
Wanli Jiang
b76c987913
[https://nvbugs/5467232][fix] Fix load_torch_hf_lora to override lora_config.trtllm_modules_to_hf_modules with default only when it has no value (#7168)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-25 15:37:57 +08:00
peaceh-nv
030598a497
[https://nvbugs/5448426][fix] Fix illegal memory access in cuda graph (#7127)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-25 10:04:34 +08:00
Wanli Jiang
036c3dd0ea
[TRTLLM-6825][fix] Update lora for phi4-mm (#7149)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-23 20:57:00 +08:00
Dom Brown
3f2eb4d2e8
[https://nvbugs/5461712][fix] Disable deep_gemm for Qwen3 due to accuracy issues (#7170)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-08-23 05:26:12 -04:00
Shi Xiaowei
3ee8523829
[https://nvbugs/5450074][fix] Reduce the device memory requirements for testing (#6990)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-22 17:33:30 +08:00
HuiGao-NV
253af9f9af
[https://nvbugs/5410391][bug] Support to share device buffers in attention meta (#6557)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-08-22 13:19:27 +08:00
Pamela Peng
1e5a6be55d
[https://nvbugs/5448442][fix] Skip trtllm moe backend for sm120 (#7010)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-08-21 13:34:07 -04:00
Venky
9eac744d72
[https://nvbugs/5464088][fix] dequantize fp8 activation input to lora forward; update perf test config (#7014)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-21 08:28:54 -04:00
ChristinaZ
a875e50321
[https://nvbugs/5392414][fix] For release 1.0 cherry pick. Add customized default routing method (#7068)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-21 20:06:50 +08:00
Yan Chunwei
caf73f5bab
[https://nvbugs/5383702][fix] test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_4gpus (#6889)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-21 08:56:42 +08:00
Yan Chunwei
e77ec061db
[https://nvbugs/5451296][fix] zmq nonblock bug with retry (#7019)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-21 08:34:46 +08:00
yifeizhang-c
5959d72d74
[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6975)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-20 16:32:27 +08:00
Jin Li
69846c6586
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6978)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-20 15:06:56 +08:00
Yan Chunwei
fae43e7b46
[None][doc] add status labels to LLM class's api reference (#6899)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-19 21:50:04 -04:00
QI JUN
cd1b809d6e
[https://nvbugs/5374016][fix] improve error message (#6893)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-19 10:29:08 +08:00
Aurelien Chartier
fef2f1f55d
[https://nvbugs/5449155][fix] Fix DeepSeek R1 weight loading for TP16 (#6913)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-19 10:25:43 +08:00
Liao Lanyu
d9b9b5d053
[TRTLLM-6835][fix] Fix potential hang caused by python multiprocessing when prefetching weights (#6927)
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
2025-08-18 10:20:09 +08:00
Venky
550faa9554
[https://nvbugs/5453667][fix] reverting a breaking change: make trtllm-bench enable_chunked_context defaults backend-dependent (#6956)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-16 00:29:02 -04:00
Mike Iovine
9e02f6b9f4
[https://nvbugs/5455836][fix] Fix llama 4 FP4 (#6911)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-15 10:09:09 -04:00