Commit Graph

5061 Commits

Author SHA1 Message Date
Yiqing Yan
21cdc39e83
[TRTLLM-10331][infra] Upload unittest sub results in slurm (#10834)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-10 17:53:35 +08:00
dominicshanshan
2a4e70b4a9
[None][chore] Unwaive tests after last MI (#11400)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-10 17:12:39 +08:00
Emma Qiao
8a74ccc57e
[None][infra] Waive failed cases for main branch on 02/10 (#11413)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-10 03:21:59 -05:00
Yuxian Qiu
5f4df89109
[None][feat] Fully non-blocking pipeline parallelism executor loop. (#10349)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 15:43:28 +08:00
Lizhi Zhou
c233692485
[None][doc] add multiple-instances section in disaggregated serving doc (#11412)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-10 02:31:45 -05:00
Emma Qiao
17cc1c13d6
[None][infra] Enable sparck ci since spark cloud migration is done (#11407)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-10 01:47:22 -05:00
shuyixiong
c3cdc93211
[TRTLLM-9771][feat] Make update_weights compatible with CUDA Graph (#11267)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2026-02-10 01:12:49 -05:00
Jonas Li
8b2dc57823
[None][chore] Mass merge commits from release/1.2.0rc6.post1 branch (#11384)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2026-02-10 14:00:42 +08:00
Venky
0c8b5221b4
[TRTC-264][doc] Add CLAUDE.md and AGENTS.md (#11358)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-02-09 20:29:58 -08:00
Lucas Liebenwein
a2fb5afecf
[#11032][feat] MLA revisited and GLM 4.7 Flash support (#11324) 2026-02-09 23:26:51 -05:00
Venky
d50f010fa9
[TRTC-265][chore] Add CODEOWNERS coverage for serve/ and commands/ directories (#11359)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-02-09 22:52:09 -05:00
Emma Qiao
85919d9517
[None][infra] Disable spark stages due to migration of spark cloud (#11401)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-09 22:31:09 -05:00
Yuan Tong
4fc3644705
[None][fix] Avoid reserved filename on Windows (#11382)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2026-02-10 11:22:59 +08:00
JennyLiu
b5508ed75b
[None][test] Add DGX-Spark multinode perf cases including eagle3 (#11184)
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-02-10 10:44:41 +08:00
Mike Iovine
f33086914f
[https://nvbugs/5843112][chore] Unwaive ngram test (#11320)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-02-09 21:31:29 -05:00
Yuxian Qiu
af68c29d3d
[None][chore] Reduce attention module repeated warnings. (#11335)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 08:58:21 +08:00
Lucas Liebenwein
fe4c690b6c
[https://nvbugs/5855540][fix] AutoDeploy: thread cleanup of eagle test (#11289)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-02-09 18:01:12 -05:00
Ziyi Xiong
e76b634251
[TRTLLM-10321][feat] Support different KV cache layout for one-model spec dec (#10502)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-02-10 05:16:02 +08:00
Mike Iovine
092f4ce774
[https://nvbugs/5853997][chore] Unwaive gpt-oss test (#11287)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-02-09 16:04:41 -05:00
Patrice Castonguay
c68d916b6f
[None][chore] Unit test for disagg gen cancellation (#11108)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2026-02-09 14:39:02 -05:00
tcherckez-nvidia
ea81a03dd1
[None][chore] update model list (#11364)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-02-09 21:27:39 +02:00
Bala Marimuthu
4a743338c3
[None][infra] AutoDeploy: Dump graph IR after every transform (#11045)
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-02-09 10:43:44 -08:00
Lizhi Zhou
e719721a60
[TRTLLM-10866][feat] implement disaggregated harmony chat (#11336)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-09 12:09:03 -05:00
Harris Nover
100bfdc516
[None][fix] Respect CUDA_LAUNCH_BLOCKING by fixing doCheckError (#11261)
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-02-09 11:49:56 -05:00
Guiju Zhang
c37531c3f7 [TRTLLM-10669][fix] Fix Eagle3 draft model weight loading for throughput checkpoint (#11010)
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Ivy Zhang
9384cf8458 [https://nvbugs/5839569][test] update test constraint (#11054)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Emma Qiao
03b635bb08 [None][infra] Waive failed case for release on 1/28 (#11055)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Lizhi Zhou
1524c172a4 [https://nvbugs/5821433][fix] WAR for popen in QA env (#10989)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Balaram Buddharaju
5f8b1b8cbb [https://nvbugs/5811087][chore] Unwaive Gemma3 27B multimodal test (#11049)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Enwei Zhu
1ba039f044 [https://nvbugs/5819452][ci] Unwaive LLaMA2 7B FP8 case (#10997)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
William Zhang
abb8106c01 [https://nvbugs/5835925][fix] Add EPD disagg support for Qwen3 VL MoE (#10962)
* Why?

Trying to instantiate a `MultimodalEncoder` for a Qwen3 VL MoE model
would fail during weight loading.

* What?

This commit fixes the bug, alongside:
- explicit, intentional support for EPD for Qwen3 VL MoE.
- extends EPD unit tests for Qwen3 VL MoE, albeit with dummy weights.
- unit tests for the weight mapper fixes.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Jin Li
0ead17bb85 [https://nvbugs/5800646][fix] Fix hang issue by avoid exposing UB buf… (#10842)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
yingguo-trt
d348dd95a7 [None][feat] support Lyris GB200 and increase disagg test timeout (#11019)
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
yufeiwu-nv
fd4e6132e5 [None][test] Fix missing test cases (#10881)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Stefan Niebler
d50010cd1f [https://nvbugs/5769815][fix] Fix offset calculation in _are_stop_words when using speculative decoding (#10854)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Lizhi Zhou
6c4e0c3dbe [https://nvbugs/5826689][fix] replace etcd3 with etcd-sdk-python (#10886)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Emma Qiao
c659280445 [None][infra] Waive failed cases for release branch on 01/26 (#10999)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Pengbo Wang
59f59efb83 [https://nvbugs/5779536][fix] Unwaive DeepSeekR1 nvfp4 pp4 mtp test case (#10902)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
JunyiXu-nv
90ea6c1e09 [https://nvbugs/5804146][fix] Enable responses tests and remove ds to… (#10925)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
mpikulski
196d94a419
[TRTLLM-10030][perf] avoid syncs in beam search + other improvements (#11349)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-09 16:13:58 +01:00
Gal Hubara-Agam
2b60cc181c
[#10780][feat] AutoDeploy: Support per-expert scales in FP8 and NVFP4 MoE (#11322)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-02-09 10:07:37 -05:00
Lizhi Zhou
540fb0f29e
[https://nvbugs/5834212][chore] unwaive test_disaggregated_mixed (#11372)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-09 09:16:25 -05:00
Robin Kobus
b3e4ddc953
[None][test] Enhance multi-GPU tests for IFB stats (#11239)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-09 17:25:32 +08:00
Robin Kobus
31db399042
[https://nvbugs/5829097][fix] Disaggregated serving: Only send finished context requests to the KV cache transceiver (#11354)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-09 17:11:45 +08:00
Bo Li
ab73f6ebc6
[None][chore] Add microbench for MoE Comm methods. (#10317)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-09 02:57:01 -05:00
Yihan Wang
635d65f9fe
[None][chore] Move test_trtllm_flashinfer_symbol_collision.py to tests/unittest/_torch (#11168)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-02-09 13:57:57 +08:00
Emma Qiao
ad8f6748a3
[None][infra] Waive failed case for main branch on 02/09 (#11369)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-08 23:05:33 -05:00
TensorRT LLM
fe9192f120 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-09 03:16:42 +00:00
Yanchao Lu
b464c75056
[None][ci] Waive test failures on main 02/08 (#11365)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-08 22:50:37 +08:00
TensorRT LLM
f7cf25748b [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-08 03:10:28 +00:00