TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-17 00:04:57 +08:00

Author	SHA1	Message	Date
William Zhang	ca9537e17c	[TRTLLM-10858][feat] Multi-image support for EPD disagg (#11264 ) * Why? Prior to this commit, we only supported a single multimodal input for E/P/D disaggregated serving. * What? This commit does a minor refactor of the multimodal embedding handles that cross process boundaries to enable this. Existing unit tests are updated accordingly to test this. The `RequestOutput` has its `mm_embedding_handle` replaced in favor of `disaggregated_params`, addressing a previous TODO. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-02-11 20:50:00 -08:00
xinhe-nv	42648734b8	[None][chore] Add failed cases into waives.txt (#11392 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-11 21:52:29 -05:00
Liao Lanyu	58165d5394	[None][chore] Introduce an abstract WaitingQueue interface to decouple the request scheduling logic from specific queue implementations (#11330 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-02-12 09:18:24 +08:00
Emma Qiao	8ebd6056fa	[None][infra] Waive failed cases for main on 2/11 (#11441 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-11 15:25:52 +08:00
Bo Li	5ea6888dda	[https://nvbugs/5810940 ][fix] Update lm_eval to 4.9.10 and re-enable Skip Softmax Attention tests on CI. (#11176 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-02-11 00:54:40 -05:00
peihengh	a982554190	[https://nvbugs/5868038 ][fix] Gracefully terminate disagg serving servers to prevent leftover subprocess warnings (#11395 ) Signed-off-by: peihu-nv <259410613+peihu-nv@users.noreply.github.com>	2026-02-10 22:41:37 -05:00
Iman Tabrizian	7d992972b2	[TRTLLM-10273][feat] Move MambaCacheManager from Python to C++ (#10540 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-02-10 07:20:56 -08:00
Yiqing Yan	cf02456613	[TRTLLM-9711][infra] Fix the testcase name in timeout xml (#9781 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-02-10 18:50:42 +08:00
xinhe-nv	c7689df152	[None][chore] Add failed cases into waives.txt (#11396 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-10 05:50:16 -05:00
xinhe-nv	6e0659dc4d	[None][chore] Add failed cases into waives.txt (#11363 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-10 05:48:33 -05:00
dominicshanshan	2a4e70b4a9	[None][chore] Unwaive tests after last MI (#11400 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-10 17:12:39 +08:00
Emma Qiao	8a74ccc57e	[None][infra] Waive failed cases for main branch on 02/10 (#11413 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-10 03:21:59 -05:00
Yuxian Qiu	5f4df89109	[None][feat] Fully non-blocking pipeline parallelism executor loop. (#10349 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-02-10 15:43:28 +08:00
shuyixiong	c3cdc93211	[TRTLLM-9771][feat] Make update_weights compatible with CUDA Graph (#11267 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2026-02-10 01:12:49 -05:00
Lucas Liebenwein	a2fb5afecf	[#11032 ][feat] MLA revisited and GLM 4.7 Flash support (#11324 )	2026-02-09 23:26:51 -05:00
JennyLiu	b5508ed75b	[None][test] Add DGX-Spark multinode perf cases including eagle3 (#11184 ) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-02-10 10:44:41 +08:00
Mike Iovine	f33086914f	[https://nvbugs/5843112 ][chore] Unwaive ngram test (#11320 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-02-09 21:31:29 -05:00
Lucas Liebenwein	fe4c690b6c	[https://nvbugs/5855540 ][fix] AutoDeploy: thread cleanup of eagle test (#11289 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-02-09 18:01:12 -05:00
Ziyi Xiong	e76b634251	[TRTLLM-10321][feat] Support different KV cache layout for one-model spec dec (#10502 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2026-02-10 05:16:02 +08:00
Mike Iovine	092f4ce774	[https://nvbugs/5853997 ][chore] Unwaive gpt-oss test (#11287 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2026-02-09 16:04:41 -05:00
Patrice Castonguay	c68d916b6f	[None][chore] Unit test for disagg gen cancellation (#11108 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2026-02-09 14:39:02 -05:00
Lizhi Zhou	e719721a60	[TRTLLM-10866][feat] implement disaggregated harmony chat (#11336 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-09 12:09:03 -05:00
Ivy Zhang	9384cf8458	[https://nvbugs/5839569 ][test] update test constraint (#11054 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Emma Qiao	03b635bb08	[None][infra] Waive failed case for release on 1/28 (#11055 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Lizhi Zhou	1524c172a4	[https://nvbugs/5821433 ][fix] WAR for popen in QA env (#10989 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Balaram Buddharaju	5f8b1b8cbb	[https://nvbugs/5811087 ][chore] Unwaive Gemma3 27B multimodal test (#11049 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Enwei Zhu	1ba039f044	[https://nvbugs/5819452 ][ci] Unwaive LLaMA2 7B FP8 case (#10997 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
William Zhang	abb8106c01	[https://nvbugs/5835925 ][fix] Add EPD disagg support for Qwen3 VL MoE (#10962 ) * Why? Trying to instantiate a `MultimodalEncoder` for a Qwen3 VL MoE model would fail during weight loading. * What? This commit fixes the bug, alongside: - explicit, intentional support for EPD for Qwen3 VL MoE. - extends EPD unit tests for Qwen3 VL MoE, albeit with dummy weights. - unit tests for the weight mapper fixes. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Jin Li	0ead17bb85	[https://nvbugs/5800646 ][fix] Fix hang issue by avoid exposing UB buf… (#10842 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
yingguo-trt	d348dd95a7	[None][feat] support Lyris GB200 and increase disagg test timeout (#11019 ) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
yufeiwu-nv	fd4e6132e5	[None][test] Fix missing test cases (#10881 ) Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Stefan Niebler	d50010cd1f	[https://nvbugs/5769815 ][fix] Fix offset calculation in _are_stop_words when using speculative decoding (#10854 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Lizhi Zhou	6c4e0c3dbe	[https://nvbugs/5826689 ][fix] replace etcd3 with etcd-sdk-python (#10886 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Emma Qiao	c659280445	[None][infra] Waive failed cases for release branch on 01/26 (#10999 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
Pengbo Wang	59f59efb83	[https://nvbugs/5779536 ][fix] Unwaive DeepSeekR1 nvfp4 pp4 mtp test case (#10902 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
JunyiXu-nv	90ea6c1e09	[https://nvbugs/5804146 ][fix] Enable responses tests and remove ds to… (#10925 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-09 23:53:40 +08:00
mpikulski	196d94a419	[TRTLLM-10030][perf] avoid syncs in beam search + other improvements (#11349 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-09 16:13:58 +01:00
Gal Hubara-Agam	2b60cc181c	[#10780 ][feat] AutoDeploy: Support per-expert scales in FP8 and NVFP4 MoE (#11322 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-02-09 10:07:37 -05:00
Lizhi Zhou	540fb0f29e	[https://nvbugs/5834212 ][chore] unwaive test_disaggregated_mixed (#11372 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-09 09:16:25 -05:00
Robin Kobus	b3e4ddc953	[None][test] Enhance multi-GPU tests for IFB stats (#11239 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2026-02-09 17:25:32 +08:00
Robin Kobus	31db399042	[https://nvbugs/5829097 ][fix] Disaggregated serving: Only send finished context requests to the KV cache transceiver (#11354 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2026-02-09 17:11:45 +08:00
Bo Li	ab73f6ebc6	[None][chore] Add microbench for MoE Comm methods. (#10317 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-02-09 02:57:01 -05:00
Yihan Wang	635d65f9fe	[None][chore] Move test_trtllm_flashinfer_symbol_collision.py to tests/unittest/_torch (#11168 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-02-09 13:57:57 +08:00
Emma Qiao	ad8f6748a3	[None][infra] Waive failed case for main branch on 02/09 (#11369 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-08 23:05:33 -05:00
Yanchao Lu	b464c75056	[None][ci] Waive test failures on main 02/08 (#11365 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-02-08 22:50:37 +08:00
William Zhang	ffc0f54959	[https://nvbugs/5848756 ][fix] Re-take ownership of mrope tensors in prefill worker (#11217 ) * Why? Previously, the mrope tensors' IPC handles would just be forwarded from encode -> prefill -> decode workers. While this is fine for the prefill worker, it is not for the decode worker, since by the time it tries to rebuild those tensors, they could have been garbage collected due to their refcounts reaching zero in the producer (encode) worker. This could lead to nasty runtime errors when running E/P/D disaggregated serving. * What? This commit fixes this by having the prefill worker take ownership of those reconstructed tensors, and stand up new copies for the decode worker. Closes: NvBug 5848756 Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-02-06 22:37:42 -05:00
Iman Tabrizian	18e611da77	[https://nvbugs/5863392 ][fix] fix partial reuse disabled for disagg (#11247 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-02-06 14:23:51 -05:00
Gal Hubara-Agam	f9eed3ecc2	[None][chore] AutoDeploy update SuperV3 checkpoints and accuracy thresholds (#11107 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-02-06 14:55:18 +02:00
Shi Xiaowei	b1268e1b37	[TRTLLM-9527][feat] Modularization of the transceiver for KV manager v2 (step 4) (#11225 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-02-06 07:15:18 -05:00
Yueh-Ting (eop) Chen	383c5921c2	[https://nvbugs/5756028 ][fix] Fix VSWA initialization with spec-dec and boundary condition in context input preparation (#10798 ) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2026-02-06 14:28:47 +08:00

2833 Commits