TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Yanchao Lu	2cb5b9f31b	[None][ci] Increase the number of retries in docker image generation (#7557) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-06 18:16:36 +08:00
Yanchao Lu	275a09d0a2	Revert "[https://nvbugs/5461761][fix] Remove the waiver (#7427)" This reverts commit `4612906b67`.	2025-09-06 18:11:34 +08:00
Guoming Zhang	01c4ece911	[None][doc] Rename TensorRT-LLM to TensorRT LLM. (#7554) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-05 16:54:57 +08:00
Guoming Zhang	f9187b2fda	[None][doc] Update kvcache part (#7549) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-05 03:46:13 -04:00
Yukun He	e07fa9ddc5	[https://nvbugs/5496960][fix] Fix Gemma model forward. (#7509) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-09-04 19:09:43 +08:00
Guoming Zhang	cabda243f1	[TRTLLM-5930][doc] 1.0 Documentation. (#6696) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-04 05:29:43 -04:00
Ziyi Xiong	4612906b67	[https://nvbugs/5461761][fix] Remove the waiver (#7427) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-04 11:34:25 +08:00
Yan Chunwei	ad80819ef0	[https://nvbugs/5351244][fix] test_mpi_session (#7501) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-04 10:10:43 +08:00
dongxuy04	9eecdf2ee9	[TRTLLM-7008][fix] cherrypick fix to 1.0 Add automatic shared memory delete if already exist (#7433) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-02 11:23:53 +08:00
Guoming Zhang	95e0318647	[None][doc] add blackwell information into support matrix (#6740) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-01 14:04:45 -04:00
Emma Qiao	991b83af81	[None][infra] Waive failed tests on release branch 0901 (#7448) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-01 23:24:51 +08:00
Yuxian Qiu	559762f185	[https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-01 16:00:43 +08:00
HuiGao-NV	860589aa0c	[https://nvbugs/5474169][fix]Adjust max seq len for kvcache for memory estimation (#7391) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-01 14:40:58 +08:00
Chang Liu	050db0e46f	[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268) (#7379) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-30 17:44:24 +08:00
Lizhi Zhou	7e4dad4dbb	[https://nvbugs/5448767][fix] disable kv cache reuse for disagg pp>1 tests (#7354) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-29 09:33:16 +02:00
Bo Li	ef0f65b353	[https://nvbugs/5467548][fix] DeepSeek illegal memory access. (#7298) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-08-29 12:19:03 +08:00
amitz-nv	66f0657716	[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7203) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-28 16:06:32 +08:00
Wanli Jiang	4ae40cbacf	[https://nvbugs/5480415][fix] Fix phi4mm multi-gpu test (#7275) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-27 22:24:19 -04:00
Iman Tabrizian	91c4af3f01	[https://nvbugs/5434320][bug] Fix disagg pp bug (#7099) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-27 13:38:01 -04:00
brb-nv	4b1898e82e	[https://nvbugs/5480550][fix] Increase timeout for Gemma3 27B test (#7271) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-27 08:45:05 -07:00
Venky	6cc168a5d3	[https://nvbugs/5463720][fix] tp-split the inferred `mlp_hidden_size` for nemotron-nas (#7231) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-27 15:04:42 +03:00
Lizhi Zhou	0fa49c5e2b	[https://nvbugs/5448767][fix] fix mpi4py deadlocks in pp event-loop (#6976) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-27 02:01:48 -04:00
Jin Li	877e1f44d3	[https://nvbugs/5451426][fix] Avoid torch compile on full eagle3 worker (#7245) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-27 09:59:06 +08:00
brb-nv	201fd257cc	[https://nvbugs/5478151][fix] Add missing spec for Llama-3.3 70B (#7267) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-27 09:56:58 +08:00
William Zhang	b6eba85dfc	[https://nvbugs/5430125][ci] Unwaive test case for mistral 3.1 small (#7265) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-26 17:32:02 -04:00
William Zhang	34c1e9c341	[None][feat] Skip prefetching consolidated safetensors when appropriate (#7225) * Why? Some models (e.g. anything produced by Mistral) can have both sharded safetensors and a consolidated safetensor in the same checkpoint directory. In such cases, prefetching both to memory is a waste of time, and memory. * What? This commit skips over consolidated safetensors when they are not the only safetensor file present in the checkpoint directory. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-26 09:40:17 -07:00
Jiagan Cheng	85b4ae26b7	[https://nvbugs/5451342][fix] Use runtime max_batch_size when cuda_graph_config.max_batch_size is not provided in trtllm-bench (#7031) Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-08-26 08:10:35 -04:00
Emma Qiao	7409d56053	[None][infra] Waive failed cases for release/1.0 (#7258) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-26 19:50:28 +08:00
Yuxian Qiu	2fb16ad328	[None][fix] fix log_once usage (#7210) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-08-26 19:13:03 +08:00
HuiGao-NV	df80b1e128	[https://nvbugs/5473789][bug] install cuda-toolkit to fix sanity check (#7159) Signed-off-by: Hui Gao <huig@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-26 18:51:21 +08:00
Linda	b394c51c99	[https://nvbugs/5409416][fix] test_openai_multi_chat_example (#7174) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-08-26 11:33:58 +02:00
Ivy Zhang	1f7a1645d6	[None][fix] update skip case (#7193) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 12:31:48 +08:00
ruodil	ebbbacf81c	[None][test] add kv cache size in bench metric and fix failed cases (#7211) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-26 10:09:22 +08:00
pcastonguay	9d2b181e7d	[https://nvbugs/5470840][fix] Disaggregated unit test MPI Init handling (#7139) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-08-25 19:48:38 -04:00
Shi Xiaowei	d010b2043a	[TRTLLM-7030][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#7191) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-08-25 20:21:43 +08:00
Yan Chunwei	5d165186d5	[None][doc] fix tensorrt legacy quickstart page (#7190) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-25 19:30:50 +08:00
Wanli Jiang	b76c987913	[https://nvbugs/5467232][fix] Fix load_torch_hf_lora to override lora_config.trtllm_modules_to_hf_modules with default only when it has no value (#7168) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-08-25 15:37:57 +08:00
Guoming Zhang	01c5f2f233	[None][fix] Switch llm api quickstart example location per workflow. (#7182) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-24 22:17:20 -04:00
peaceh-nv	030598a497	[https://nvbugs/5448426][fix] Fix illegal memory access in cuda graph (#7127) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-08-25 10:04:34 +08:00
Wanli Jiang	036c3dd0ea	[TRTLLM-6825][fix] Update lora for phi4-mm (#7149) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-08-23 20:57:00 +08:00
Dom Brown	3f2eb4d2e8	[https://nvbugs/5461712] [fix] Disable deep_gemm for Qwen3 due to accuracy issues (#7170) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-08-23 05:26:12 -04:00
Michal Guzek	7ea53ff516	[https://nvbugs/5433545][fix] TestPhi4MiniInstruct::test_auto_dtype - Use max_seq_len=4096 to fallback to the short RoPE factor (#6895) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-08-22 10:28:09 -07:00
Dimitrios Bariamis	4b6cca0662	[https://nvbugs/5474037][fix] Fix building tritonbuild/tritonrelease images (#7157) Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com> Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>	2025-08-22 16:47:38 +02:00
Shi Xiaowei	3ee8523829	[https://nvbugs/5450074][fix] Reduce the device memory requirements for testing (#6990) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-08-22 17:33:30 +08:00
milesial	c1eefa8735	[https://nvbugs/5467062][fix] pass logitsPostProcessorBatched by reference (#7110) Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>	2025-08-22 04:42:06 -04:00
HuiGao-NV	253af9f9af	[https://nvbugs/5410391][bug] Support to share device buffers in attention meta (#6557) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-08-22 13:19:27 +08:00
brb-nv	79f1e6c867	[https://nvbugs/5449032][fix] Add more llm-args to llm_mgmn_trtllm_bench.sh (#7144) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-22 12:24:35 +08:00
Pamela Peng	1e5a6be55d	[https://nvbugs/5448442][fix] Skip trtllm moe backend for sm120 (#7010) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>	2025-08-21 13:34:07 -04:00
Emma Qiao	441edf1eeb	[None][infra] Skip failed tests for release branch (#7130) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-22 00:09:51 +08:00
Venky	9eac744d72	[https://nvbugs/5464088] [fix] dequantize fp8 activation input to lora forward; update perf test config (#7014) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-21 08:28:54 -04:00

2334 Commits