4812 Commits

Author SHA1 Message Date
Emma Qiao
af49fbdf65 [None][infra] Waive failed case for release branch on 01/19 (#10795)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yukun He
25bdc30162 [https://nvbugs/5782112][fix] Cherry-pick #10633: Fix hanging issue for MNNVL Allreduce under PP (#10750)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yuxian Qiu
2b3bb2e9b0 [https://nvbugs/5811697][fix] Fix buffer reuse. (#10716)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Emma Qiao
4b833492fb [None][infra] Waive failed cases for release on 10/18 (#10781)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Faraz
aa410c57bc [TRTLLM-5366][chore] Add dgx-spark beta notes (#10766)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Mike Iovine
f02948d956 [https://nvbugs/5803813][fix] Fix llama 4 min latency (#10724)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Patrice Castonguay
93e7ae73ea [None][doc] 1.2 Release Notes Headers (#10722)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
TensorRT LLM
0c393ebc69 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-25 03:04:42 +00:00
Patrice Castonguay
d548b29a41
[None][fix] Bugfix/mtp with async scheduler (#10941)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: rongwei <scutizhang@tencent.com>
2026-01-24 07:19:54 -05:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.
2026-01-24 04:48:39 -05:00
Yuxian Qiu
9fcc93ea7b
[https://nvbugs/5829097][fix] Re-init TRTLLM sampler to use sample stream in multi-stream cases. (#10918)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-24 14:04:10 +08:00
Emma Qiao
9d65b8bf24
[None][infra] Fix TRT-LLM data scratch mount point for gb10x (#10880)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-24 14:00:17 +08:00
Yanchao Lu
78a008d61a
[None][ci] Remove long-running sanity check tests on GH200 (#10924) (#10969)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-24 13:06:28 +08:00
Kaiyu Xie
da967d0bd7
[TRTLLM-10334] [feat] Support overlap scheduler for disagg ctx instances (#10755)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-23 22:29:37 -05:00
TensorRT LLM
58dc4bea9c [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-24 03:03:08 +00:00
jthomson04
cf88da7eca
[None][feat] KV Connector Support for MTP (#10932)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2026-01-23 18:58:26 -05:00
Taylor Yeonbok Lee
1fbbb1f3cd
[None][feat] AutoDeploy: Enhance memory consumption for MoE fusion transform (#10772)
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-01-23 15:22:54 -08:00
Jin Li
b560598c79
[https://nvbugs/5707359][fix] Unwaive the test that due to flashinfer… (#10570)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-01-23 13:09:04 -05:00
yuanjingx87
f4b52d3b78
[None][infra] Regenerate out dated lock file (#10940)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-23 09:21:03 -08:00
Yihan Wang
1d68fab49c
[https://nvbugs/5814215][fix] Unwaive test_trtllm_flashinfer_symbol_collision.py::test_flashinfer_fused_moe_matches_torch_moe (#10930)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-01-24 01:09:18 +08:00
Yan Chunwei
54768f3f2c
[None][chore] refine placement group in ray executor (#10235)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2026-01-23 19:31:20 +08:00
Yihan Wang
43f2b51e94
[https://nvbugs/5833795][chore] Waive test test_e2e.py::test_ptp_quickstart_advanced[GPT-OSS-120B-gpt_oss/gpt-oss-120b] (#10953)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-01-23 06:04:57 -05:00
Emma Qiao
ae114ec7cf
[None][infra] Waive a failed case in pre-merge stage (#10948)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-23 04:40:17 -05:00
zackyoray
51c7a06da6
[None][feat] Upgrade NIXL to v0.9.0 (#10896)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2026-01-23 15:58:53 +08:00
Stanley Sun
0f7192c7fe
[None][test] Remove unused test list (#10916)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2026-01-23 10:24:06 +08:00
Leslie Fang
31d04dfa12
[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-23 10:16:58 +08:00
yuanjingx87
ea928f62af
[None][infra] Update CI allowlist (#10936)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-22 14:22:27 -08:00
Lucas Liebenwein
d793bd973d
[https://nvbugs/5688721][fix] unwaive NemotronH accuracy test (#10852)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-22 16:23:28 -05:00
William Zhang
2146c23786
[#9306][refactor] Refactor AutoDeployConfig into LlmArgs (#10613)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-22 16:02:49 -05:00
Grzegorz Kwasniewski
d8e6e22060
[https://nvbugs/5819002][fix] fix sharding tests (#10775)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-22 20:02:48 +01:00
Yi Zhang
d43be7b65e
[None][fix] Avoid Double update for previous batch (#9888)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2026-01-22 13:15:06 -05:00
Shi Xiaowei
944c304bbb
[TRTLLM-9527][feat] Python transceiver components (step 2) (#10494)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-01-22 10:14:50 -08:00
Shi Xiaowei
9adef4eb28
[TRTLLM-9527][doc] Add NIXL as a Python attribution (step 4) (#10910)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-01-22 10:09:55 -08:00
Venky
b3146d095d
[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-22 07:24:11 -08:00
Yan Chunwei
30ffa58b54
[https://nvbugs/5783876][fix] fix hmac launch (#10434)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-22 23:20:53 +08:00
Bo Deng
a218cf02fd
[https://nvbugs/5768068][chore] improve disagg acc tests (#10833)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-01-22 09:45:35 -05:00
Pengyun Lin
5e34112b27
[TRTLLM-10388][feat] Support logprobs for Completions API (#10809)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-22 21:25:24 +08:00
彭晋韬(jtao peng)
9beb971827
[None][fix] Update RMSNorm custom op plumbing (#10843)
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-22 21:03:22 +08:00
Jiayu Chang
1dc49b266e
[https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279)
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-01-22 14:01:18 +01:00
Yihan Wang
cdb9ffd0ab
[https://nvbugs/5741304][chore] Update flashinfer-python to 0.6.1 (#10872)
Signed-off-by: Yihan Wang
2026-01-22 19:29:16 +08:00
tcherckez-nvidia
128d4ac5be
[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… (#10803)
Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster>
Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster>
Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster>
Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster>
Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>
2026-01-22 13:08:05 +02:00
Yiqing Yan
0243abee22
[None][chore] Bump version to 1.3.0rc1 (#10923)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-22 18:45:40 +08:00
Enwei Zhu
0b3092e144
[None][ci] Fix test list llm_spark_func.txt (#10921)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-22 04:23:03 -05:00
tcherckez-nvidia
6e72aff866
[#10838][fix] Add missing dist strategy param. fix typo for ad_logger… (#10892)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-22 10:38:31 +02:00
Bo Li
9ce0511d86
[https://nvbugs/5811159][fix] Unwaive bug 5811159. (#10903)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-22 16:28:11 +08:00
Pengbo Wang
9462d90ec7
[None][feat] Add KV cache cleanup (#7439)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-22 15:14:17 +08:00
shuyixiong
fd2af8d58a
[TRTLLM-9771][feat] Support partial update weight for fp8 (#10456)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
2026-01-22 14:46:05 +08:00
Wanli Jiang
ff0775408d
[None][fix] Fix waived tests for Nemotron-h models (#10758)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-22 14:17:50 +08:00
Enwei Zhu
be4a431ffd
[TRTLLM-10154][feat] Enable guided decoding with reasoning parsers (#10890)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-22 14:14:28 +08:00
Taylor Yeonbok Lee
895bb94b3d
[#8241][feat] Support model_kwargs for pytorch backend (#10351)
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-01-21 20:51:38 -08:00