Lizhi Zhou
93ae8a14ab
[ #10889 ][fix] fix pydantic deepcopy bug ( #11004 )
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-27 02:40:13 -05:00
zhhuang-nv
ca9f70f78c
[ https://nvbugs/5612438 ][fix] Add timeout for SeedOSS test ( #8683 )
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2026-01-27 15:22:21 +08:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super ( #10754 )
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
Lucas Liebenwein
00f341be49
[ #8982 ][feat] AutoDeploy attention dp support ( #10728 )
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-26 09:43:33 -05:00
Tian Zheng
5efee01da1
[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV ( #10813 )
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-26 16:46:33 +08:00
yingguo-trt
c8f1745a6e
[ https://nvbugs/5661741 ][feat] Add 250K-token NVFP4 MoE + PDL regression tests ( #10911 )
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-26 01:48:29 -05:00
dominicshanshan
c98c286c0f
[ https://nvbugs/5814203 ][fix] Fix port 8000 being used issue in stress test. ( #10756 )
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Ivy Zhang
bcd2dc490c
[None][test] Update case for release ( #10811 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Ivy Zhang
4ebc1b1596
[None][test] Update test case for release ( #10763 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
ruodil
4df0ca8bd1
[None][test] modify ctx config in 128k8k disagg cases ( #10779 )
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 ( #10736 )
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
KVCacheManagerV2 is a new Python-based implementation of the KV cache manager, featuring a cleaner API, better abstraction, and better code quality without the accumulated legacy.
2026-01-24 04:48:39 -05:00
Kaiyu Xie
da967d0bd7
[TRTLLM-10334] [feat] Support overlap scheduler for disagg ctx instances ( #10755 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-23 22:29:37 -05:00
Taylor Yeonbok Lee
1fbbb1f3cd
[None][feat] AutoDeploy: Enhance memory consumption for MoE fusion transform ( #10772 )
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-01-23 15:22:54 -08:00
yuanjingx87
f4b52d3b78
[None][infra] Regenerate outdated lock file ( #10940 )
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-23 09:21:03 -08:00
Venky
b3146d095d
[TRTC-122][feat] Eagle3 Specdec UX improvements ( #10124 )
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-22 07:24:11 -08:00
Bo Deng
a218cf02fd
[ https://nvbugs/5768068 ][chore] improve disagg acc tests ( #10833 )
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-01-22 09:45:35 -05:00
tcherckez-nvidia
128d4ac5be
[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… ( #10803 )
Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster>
Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster>
Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster>
Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster>
Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>
2026-01-22 13:08:05 +02:00
Wanli Jiang
ff0775408d
[None][fix] Fix waived tests for Nemotron-h models ( #10758 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-22 14:17:50 +08:00
Enwei Zhu
be4a431ffd
[TRTLLM-10154][feat] Enable guided decoding with reasoning parsers ( #10890 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-22 14:14:28 +08:00
JennyLiu
415739711f
[None][chore] Add DGX-Spark VLM accuracy and perf spec dec cases ( #10804 )
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Signed-off-by: JennyLiu <141791095+JennyLiu-nv@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-22 12:38:17 +08:00
Daniil
0434db5bf7
[None][feat] GLM-4.5-Air support ( #10653 )
Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>
2026-01-22 11:42:09 +08:00
kris1025
f91ea37a13
[None][chore] unwaive qwen3 235B accuracy test ( #10493 )
Signed-off-by: linquanh <linquanh@nvidia.com>
2026-01-21 17:52:04 +08:00
Simeng Liu
3c8ed19440
[ https://nvbugs/5670108 ][fix] Fix overlap scheduler race condition in… ( #10610 )
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-01-20 10:56:56 -08:00
Gal Hubara-Agam
e61c942d1f
[ #10707 ][fix] AutoDeploy: Super accuracy test fixes ( #10717 )
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-01-20 18:16:13 +02:00
benzh-2025
4c8468c5d3
[None][fix] default disable gemm+allreduce fusion ( #10656 )
2026-01-20 12:31:17 +08:00
Shi Xiaowei
442d2e8a15
[None][test] adjust the dis-agg test timeout threshold ( #10800 )
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-01-19 17:02:00 +08:00
Eran Geva
32ab809f36
[ #10607 ][chore] Add Nemotron Nano v3 FP8 autodeploy perf test ( #10603 )
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Eran Geva <egeva@cw-dfw-cs-001-vscode-01.cm.cluster>
Co-authored-by: Eran Geva <egeva@cw-dfw-cs-001-vscode-01.cm.cluster>
2026-01-19 08:48:07 +02:00
Zhanrui Sun
df845a028b
[TRTLLM-9581][infra] Use /home/scratch.trt_llm_data_ci in computelab ( #10616 )
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2026-01-19 00:40:40 -05:00
chenfeiz0326
e97af45556
[TRTLLM-10300][feat] Upload regression info to artifactory ( #10599 )
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-19 10:16:31 +08:00
Chuang Zhu
4f04532ce7
[ https://nvbugs/5769890 ][fix] enable system memory to transfer active message in NIXL ucx ( #10602 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-19 09:20:12 +08:00
Lucas Liebenwein
b64052539d
[ https://nvbugs/5769712 ][fix] fix timeout in AutoDeploy llama accuracy test ( #10461 )
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-18 13:20:55 -05:00
chenfeiz0326
56073f501a
[TRTLLM-8263][feat] Add Aggregated Perf Tests ( #10598 )
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-17 13:16:36 +08:00
Chenghao Zhang
0b748d5bba
[None][chore] update flashinfer to 0.6.0 ( #10522 )
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-16 16:22:06 -05:00
Chenghao Zhang
b6acd96616
[None][fix] AutoDeploy: Fix the nvfp4 fused_moe ( #10727 )
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-16 12:04:40 -08:00
Tian Zheng
cfebfbb505
[ https://nvbugs/5783509 ][fix] Fix a hang issue when enabling skip softmax on Blackwell ( #10490 )
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-16 18:59:54 +08:00
Kaiyu Xie
4f86c5f5ce
[None] [feat] Support multiple accuracy tasks for slurm scripts ( #10500 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
Co-authored-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-16 15:50:32 +08:00
Chuang Zhu
7e2cbc0756
[ https://nvbugs/5598674 ][fix] enable partial reuse in gemma and gpt oss test ( #10559 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-16 10:26:15 +08:00
heyuhhh
e3f27e06c7
[None][chore] Waive star attention unittests ( #10439 )
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2026-01-16 10:12:32 +08:00
ruodil
22240e43eb
[None][test] store per user output and per gpu output metric in csv file ( #10658 )
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2026-01-15 00:51:08 -05:00
Anish Shanbhag
faa80e73fd
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias ( #10099 )
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-14 21:06:07 -08:00
Dom Brown
94c7b69048
[ https://nvbugs/5630196 ] [fix] Prevent flaky failures in C++ test_e2e.py by using local cached datasets for benchmarking ( #10638 )
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2026-01-14 21:39:55 -05:00
Wanli Jiang
73d1840c12
[TRTLLM-10245][feat] Add accuracy tests for super v3 fp8 model ( #10482 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-15 10:07:02 +08:00
dominicshanshan
0f2d61b8c6
[ https://nvbugs/5766952 ][fix] Fix AIPerf issue. ( #10666 )
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-15 09:54:34 +08:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel ( #9905 )
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
Bo Li
582dec5bb5
[ https://nvbugs/5774869 ][infra] Use 2 GPUs to test skip softmax attention on H100. ( #10420 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-14 07:03:01 -05:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support ( #10532 )
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
mpikulski
052c36ddd2
[TRTLLM-9522][feat] support image_embeds in OpenAI API ( #9715 )
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-14 10:31:03 +01:00
xinhe-nv
07d9390e9b
[None][test] add test into qa test list ( #10627 )
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-13 22:43:00 -05:00
Balaram Buddharaju
ccdfa43a6e
[ https://nvbugs/5791900 ][fix] Fix HelixCpMnnvlMemory init with PP ( #10533 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-13 15:48:42 -05:00
Guoming Zhang
bdaee87895
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. ( #10347 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 17:13:55 +08:00