Lizhi Zhou
6837e73219
[https://nvbugs/5847284][fix] fix cuda oom error (#11219)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-13 19:04:33 +08:00
JennyLiu
11d79aa875
[https://nvbugs/5832481][test] Add gpt-oss-120b-Eagle3-throughput case on DGX-Spark (#11419)
...
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-02-12 05:33:39 -05:00
peihengh
a982554190
[https://nvbugs/5868038][fix] Gracefully terminate disagg serving servers to prevent leftover subprocess warnings (#11395)
...
Signed-off-by: peihu-nv <259410613+peihu-nv@users.noreply.github.com>
2026-02-10 22:41:37 -05:00
Iman Tabrizian
7d992972b2
[TRTLLM-10273][feat] Move MambaCacheManager from Python to C++ (#10540)
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-10 07:20:56 -08:00
Yiqing Yan
cf02456613
[TRTLLM-9711][infra] Fix the testcase name in timeout xml (#9781)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-02-10 18:50:42 +08:00
Lucas Liebenwein
a2fb5afecf
[#11032][feat] MLA revisited and GLM 4.7 Flash support (#11324)
2026-02-09 23:26:51 -05:00
JennyLiu
b5508ed75b
[None][test] Add DGX-Spark multinode perf cases including eagle3 (#11184)
...
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-02-10 10:44:41 +08:00
Lucas Liebenwein
fe4c690b6c
[https://nvbugs/5855540][fix] AutoDeploy: thread cleanup of eagle test (#11289)
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-02-09 18:01:12 -05:00
Lizhi Zhou
e719721a60
[TRTLLM-10866][feat] implement disaggregated harmony chat (#11336)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-09 12:09:03 -05:00
Ivy Zhang
9384cf8458
[https://nvbugs/5839569][test] update test constraint (#11054)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Lizhi Zhou
1524c172a4
[https://nvbugs/5821433][fix] WAR for popen in QA env (#10989)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
yingguo-trt
d348dd95a7
[None][feat] support Lyris GB200 and increase disagg test timeout (#11019)
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Robin Kobus
31db399042
[https://nvbugs/5829097][fix] Disaggregated serving: Only send finished context requests to the KV cache transceiver (#11354)
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-09 17:11:45 +08:00
Yihan Wang
635d65f9fe
[None][chore] Move test_trtllm_flashinfer_symbol_collision.py to tests/unittest/_torch (#11168)
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-02-09 13:57:57 +08:00
Iman Tabrizian
18e611da77
[https://nvbugs/5863392][fix] fix partial reuse disabled for disagg (#11247)
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-06 14:23:51 -05:00
Gal Hubara-Agam
f9eed3ecc2
[None][chore] AutoDeploy update SuperV3 checkpoints and accuracy thresholds (#11107)
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-02-06 14:55:18 +02:00
yifeizhang-c
5521c7b7e7
[TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130)
...
Added FP8 cute dsl gemm and batch gemm.
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2026-02-06 09:49:30 +08:00
nvyocox
e52eb82780
[#11234][test] Move test_ad_export_onnx to integration examples (#11260)
...
Signed-off-by: yocox <yocox@nvidia.com>
2026-02-05 11:32:57 -05:00
chenfeiz0326
eae480b713
[https://nvbugs/5820874][fix] Adjust deepgemm tuning buckets to cover larger num_tokens's scope (#11259)
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-05 23:12:38 +08:00
Yuewei Na
0d18b2d7a4
[None][feat] Add priority-based KV cache offload filtering support (#10751)
...
Signed-off-by: Yuewei Na <yna@nvidia.com>
Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
2026-02-05 05:22:56 -05:00
Simeng Liu
d9fd8cc951
[https://nvbugs/5674665][fix] Fix accuracy drop in VSWA with KV cache block reuse (#10875)
...
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-04 12:46:31 -05:00
Lucas Liebenwein
925d911fc0
[#10966][feat] AutoDeploy: kv cache manager integration [2/2] (#11149)
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-02-04 09:44:27 -05:00
Gal Hubara-Agam
de6931bbfd
[None][fix] Fix selective_state_update perf regression for T=1 decode path (#11194)
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-04 09:01:34 +02:00
chenfeiz0326
04b7db3ab5
[TRTLLM-8263][feat] Add Disagg Perf Tests (#10912)
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-04 10:16:11 +08:00
Chenjie Luo
2532eb5adc
[None][fix] Align kv_scales with modelopt HF checkpoint (#10745)
...
Signed-off-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
2026-02-03 08:03:42 -05:00
Taylor Yeonbok Lee
304dc6f3c0
[None][chore] Print memory usage before/after accuracy test in CI (#11155)
...
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-02-03 00:23:14 -05:00
gramnarayan
585fbb2734
[#10826][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation (#11073)
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-02-02 09:51:10 -08:00
Lizhi Zhou
4d282bd7c1
[https://nvbugs/5821433][fix] fix test_auto_scaling for 2 GPUs (#10866)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
JunyiXu-nv
2a5b8800e1
[https://nvbugs/5754977][fix] Use free port for serve test (#10878)
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Yi Zhang
0306c0f12c
[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659)
...
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-02-02 14:29:02 +08:00
Lizhi Zhou
b00e8338ec
[https://nvbugs/5834212][fix] prevent routing ctx and gen requests to the same worker; update doc for unique disagg ID (#11095)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-02 09:54:33 +08:00
Guoming Zhang
6bace84167
[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-31 13:48:25 +08:00
Enwei Zhu
5ff244ce54
[https://nvbugs/5837281][fix] Fix trtllm-serve guided decoding test (#11101)
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-30 16:59:55 +08:00
JennyLiu
6506d63466
[None][test] Add DGX-Spark VLM gemm3-12b bfp16/fp4/fp8 accuracy and perf cases (#11096)
...
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-30 00:38:19 -05:00
Chenghao Zhang
e033929221
[None][feat] AutoDeploy: Flashinfer kernels bringup (#10867)
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-29 14:59:29 -08:00
Mike Iovine
0ad87895f5
[https://nvbugs/5836592][fix] Fix qwen3 eagle test (#11030)
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-29 14:49:08 -08:00
Balaram Buddharaju
c7a86f89de
[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-29 02:57:13 -05:00
Anish Shanbhag
24ac86c485
[https://nvbugs/5761391][fix] Include triton-kernels as a packaged dependency (#10471)
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-28 19:56:32 -08:00
gramnarayan
744a955cbb
[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint (#10674)
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-01-28 12:10:49 -08:00
yingguo-trt
e70a55bd94
[None][feat] support multi_acc and Lyris GB200 test (#11024)
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-28 06:01:48 -05:00
Grzegorz Kwasniewski
38bcee189c
[TRTLLM-10362][feat] Added Mamba and MLA layers to the sharding tests (#10364)
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-28 10:34:10 +01:00
Lucas Liebenwein
ff3a494f5c
[#10013][feat] AutoDeploy: native cache manager integration (#10635)
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-27 11:23:22 -05:00
Lizhi Zhou
93ae8a14ab
[#10889][fix] fix pydantic deepcopy bug (#11004)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-27 02:40:13 -05:00
zhhuang-nv
ca9f70f78c
[https://nvbugs/5612438][fix] Add timeout for SeedOSS test (#8683)
...
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2026-01-27 15:22:21 +08:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754)
...
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
Lucas Liebenwein
00f341be49
[#8982][feat] AutoDeploy attention dp support (#10728)
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-26 09:43:33 -05:00
Tian Zheng
5efee01da1
[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813)
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-26 16:46:33 +08:00
yingguo-trt
c8f1745a6e
[https://nvbugs/5661741][feat] Add 250K-token NVFP4 MoE + PDL regression tests (#10911)
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-26 01:48:29 -05:00
dominicshanshan
c98c286c0f
[https://nvbugs/5814203][fix] Fix port 8000 being used issue in stress test. (#10756)
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Ivy Zhang
bcd2dc490c
[None][test] Update case for release (#10811)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00