TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-16 15:55:08 +08:00

Author	SHA1	Message	Date
shuyixiong	c3cdc93211	[TRTLLM-9771][feat] Make update_weights compatible with CUDA Graph (#11267) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2026-02-10 01:12:49 -05:00
Jonas Li	8b2dc57823	[None][chore] Mass merge commits from release/1.2.0rc6.post1 branch (#11384) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2026-02-10 14:00:42 +08:00
Venky	0c8b5221b4	[TRTC-264][doc] Add CLAUDE.md and AGENTS.md (#11358) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-02-09 20:29:58 -08:00
Lucas Liebenwein	a2fb5afecf	[#11032][feat] MLA revisited and GLM 4.7 Flash support (#11324)	2026-02-09 23:26:51 -05:00
Venky	d50f010fa9	[TRTC-265][chore] Add CODEOWNERS coverage for serve/ and commands/ directories (#11359) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-02-09 22:52:09 -05:00
Emma Qiao	85919d9517	[None][infra] Disable spark stages due to migration of spark cloud (#11401) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-09 22:31:09 -05:00
Yuan Tong	4fc3644705	[None][fix] Avoid reserved filename on Windows (#11382) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2026-02-10 11:22:59 +08:00
JennyLiu	b5508ed75b	[None][test] Add DGX-Spark multinode perf cases including eagle3 (#11184) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-02-10 10:44:41 +08:00
Mike Iovine	f33086914f	[https://nvbugs/5843112][chore] Unwaive ngram test (#11320) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-02-09 21:31:29 -05:00
Yuxian Qiu	af68c29d3d	[None][chore] Reduce attention module repeated warnings. (#11335) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-02-10 08:58:21 +08:00
Lucas Liebenwein	fe4c690b6c	[https://nvbugs/5855540][fix] AutoDeploy: thread cleanup of eagle test (#11289) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-02-09 18:01:12 -05:00
Ziyi Xiong	e76b634251	[TRTLLM-10321][feat] Support different KV cache layout for one-model spec dec (#10502) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2026-02-10 05:16:02 +08:00
Mike Iovine	092f4ce774	[https://nvbugs/5853997][chore] Unwaive gpt-oss test (#11287) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2026-02-09 16:04:41 -05:00
Patrice Castonguay	c68d916b6f	[None][chore] Unit test for disagg gen cancellation (#11108) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2026-02-09 14:39:02 -05:00
tcherckez-nvidia	ea81a03dd1	[None][chore] update model list (#11364) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2026-02-09 21:27:39 +02:00
Bala Marimuthu	4a743338c3	[None][infra] AutoDeploy: Dump graph IR after every transform (#11045) Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>	2026-02-09 10:43:44 -08:00
Lizhi Zhou	e719721a60	[TRTLLM-10866][feat] implement disaggregated harmony chat (#11336) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-09 12:09:03 -05:00
Harris Nover	100bfdc516	[None][fix] Respect CUDA_LAUNCH_BLOCKING by fixing doCheckError (#11261) Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>	2026-02-09 11:49:56 -05:00
Guiju Zhang	c37531c3f7	[TRTLLM-10669][fix] Fix Eagle3 draft model weight loading for throughput checkpoint (#11010) Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Ivy Zhang	9384cf8458	[https://nvbugs/5839569][test] update test constraint (#11054) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Emma Qiao	03b635bb08	[None][infra] Waive failed case for release on 1/28 (#11055) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Lizhi Zhou	1524c172a4	[https://nvbugs/5821433][fix] WAR for popen in QA env (#10989) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Balaram Buddharaju	5f8b1b8cbb	[https://nvbugs/5811087][chore] Unwaive Gemma3 27B multimodal test (#11049) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Enwei Zhu	1ba039f044	[https://nvbugs/5819452][ci] Unwaive LLaMA2 7B FP8 case (#10997) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
William Zhang	abb8106c01	[https://nvbugs/5835925][fix] Add EPD disagg support for Qwen3 VL MoE (#10962) * Why? Trying to instantiate a `MultimodalEncoder` for a Qwen3 VL MoE model would fail during weight loading. * What? This commit fixes the bug, alongside: - explicit, intentional support for EPD for Qwen3 VL MoE. - extends EPD unit tests for Qwen3 VL MoE, albeit with dummy weights. - unit tests for the weight mapper fixes. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Jin Li	0ead17bb85	[https://nvbugs/5800646][fix] Fix hang issue by avoid exposing UB buf… (#10842) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
yingguo-trt	d348dd95a7	[None][feat] support Lyris GB200 and increase disagg test timeout (#11019) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
yufeiwu-nv	fd4e6132e5	[None][test] Fix missing test cases (#10881) Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Stefan Niebler	d50010cd1f	[https://nvbugs/5769815][fix] Fix offset calculation in _are_stop_words when using speculative decoding (#10854) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Lizhi Zhou	6c4e0c3dbe	[https://nvbugs/5826689][fix] replace etcd3 with etcd-sdk-python (#10886) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Emma Qiao	c659280445	[None][infra] Waive failed cases for release branch on 01/26 (#10999) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Pengbo Wang	59f59efb83	[https://nvbugs/5779536][fix] Unwaive DeepSeekR1 nvfp4 pp4 mtp test case (#10902) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
JunyiXu-nv	90ea6c1e09	[https://nvbugs/5804146][fix] Enable responses tests and remove ds to… (#10925) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
mpikulski	196d94a419	[TRTLLM-10030][perf] avoid syncs in beam search + other improvements (#11349) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-09 16:13:58 +01:00
Gal Hubara-Agam	2b60cc181c	[#10780][feat] AutoDeploy: Support per-expert scales in FP8 and NVFP4 MoE (#11322) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-02-09 10:07:37 -05:00
Lizhi Zhou	540fb0f29e	[https://nvbugs/5834212][chore] unwaive test_disaggregated_mixed (#11372) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-09 09:16:25 -05:00
Robin Kobus	b3e4ddc953	[None][test] Enhance multi-GPU tests for IFB stats (#11239) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2026-02-09 17:25:32 +08:00
Robin Kobus	31db399042	[https://nvbugs/5829097][fix] Disaggregated serving: Only send finished context requests to the KV cache transceiver (#11354) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2026-02-09 17:11:45 +08:00
Bo Li	ab73f6ebc6	[None][chore] Add microbench for MoE Comm methods. (#10317) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-02-09 02:57:01 -05:00
Yihan Wang	635d65f9fe	[None][chore] Move test_trtllm_flashinfer_symbol_collision.py to tests/unittest/_torch (#11168) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-02-09 13:57:57 +08:00
Emma Qiao	ad8f6748a3	[None][infra] Waive failed case for main branch on 02/09 (#11369) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-08 23:05:33 -05:00
TensorRT LLM	fe9192f120	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-09 03:16:42 +00:00
Yanchao Lu	b464c75056	[None][ci] Waive test failures on main 02/08 (#11365) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-02-08 22:50:37 +08:00
TensorRT LLM	f7cf25748b	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-08 03:10:28 +00:00
mpikulski	03b38e9fbf	[TRTLLM-10030][perf] avoid sync in PyTorchModelEngine when using beam search (#11341) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-07 12:31:11 +08:00
William Zhang	ffc0f54959	[https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217) * Why? Previously, the mrope tensors' IPC handles would just be forwarded from encode -> prefill -> decode workers. While this is fine for the prefill worker, it is not for the decode worker, since by the time it tries to rebuild those tensors, they could have been garbage collected due to their refcounts reaching zero in the producer (encode) worker. This could lead to nasty runtime errors when running E/P/D disaggregated serving. * What? This commit fixes this by having the prefill worker take ownership of those reconstructed tensors, and stand up new copies for the decode worker. Closes: NvBug 5848756 Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-02-06 22:37:42 -05:00
TensorRT LLM	408d610877	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-07 03:13:33 +00:00
Iman Tabrizian	18e611da77	[https://nvbugs/5863392][fix] fix partial reuse disabled for disagg (#11247) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-02-06 14:23:51 -05:00
Gal Hubara-Agam	f9eed3ecc2	[None][chore] AutoDeploy update SuperV3 checkpoints and accuracy thresholds (#11107) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-02-06 14:55:18 +02:00
Shi Xiaowei	b1268e1b37	[TRTLLM-9527][feat] Modularization of the transceiver for KV manager v2 (step 4) (#11225) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-02-06 07:15:18 -05:00

5055 Commits