Commit Graph

2511 Commits

Author SHA1 Message Date
JadoTu
4c498bfe58
[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873)
Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>
2026-01-09 14:50:16 +08:00
Jie Li
6fcd4e7099
[None][chore] Add failed cases into waives.txt (#10541)
Signed-off-by: Jie Li <lijie@nvidia.com>
2026-01-09 01:03:47 -05:00
ruodil
d707286ca8
[None][test] restrict max_num_tokens in disagg mtp config (#10442)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2026-01-08 21:53:24 -05:00
Balaram Buddharaju
56e779d09f
[None][chore] Waive tests blocking premerge 01/08 (#10555)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-08 20:22:28 -05:00
Mike Iovine
4092a87b6f
[https://nvbugs/5740075][fix] Fix sm120 speculation (#10049)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-08 19:55:43 -05:00
William Zhang
c0ae6bbdbe
[None][feat] EPD for Qwen3 VL (#10470)
* Why?

We would like to support EPD disaggregated serving for Qwen3 VL.

* What?

This commit adds such support, and extends existing unit tests for
correctness checks.

Some minor (protected) interface changes had to be made to the
weight mapper as a side-effect.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-08 06:45:54 -05:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine (#10405)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Emma Qiao
43839c7d9b
[TRTLLM-9642][infra] Increase pytest verbosity for failed tests (#9657)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2026-01-08 02:33:48 -05:00
HuiGao-NV
22c81cb5fa
[None][chore] Enable seg fault cases since one race condition is fixed (#10398)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-08 02:15:30 -05:00
Barry Kang
f57aab5255
[https://nvbugs/5775402][fix] Fix concurrency list in Wide-EP perf tests (#10529)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2026-01-08 01:58:55 -05:00
Lucas Liebenwein
30f8455d29
[https://nvbugs/5747878][fix] unwaive llama4 scout tests (#10468)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-07 23:33:45 -05:00
yingguo-trt
f8b2a8fd30
[None][chore] Support multiple job submission at the same time (#10492)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Co-authored-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2026-01-07 21:51:36 -05:00
Yuxian Qiu
b85c447ceb
[https://nvbugs/5784543][fix] Setup dist before using autotuner. (#10491)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-08 10:32:50 +08:00
xxi
81f878c279
[https://nvbugs/5707392][fix] unwaive test_fused_moe_fp8_blockwise_wide_ep[NotEnabled] (#10428)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-08 09:17:59 +08:00
Lucas Liebenwein
d736c7f290
[https://nvbugs/5761665][fix] AutoDeploy: handle bugs for 25.12 dlfw upgrade (#10511)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-07 20:16:53 -05:00
yufeiwu-nv
b130d58c88
[None][test] Remove most TRT-backend test cases in llm_perf_nim.yml (#10487)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 17:18:43 +08:00
xinhe-nv
872210468b
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10474)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-07 03:23:43 -05:00
yingguo-trt
cbf8357e5f
[https://nvbugs/5726086][fix] update kimi-k2-1k1k dataset (#10473)
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-07 01:24:08 -05:00
xinhe-nv
be5579633e
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10457)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-07 00:57:03 -05:00
Fanrong Li
a34aa63685
[https://nvbugs/5767223][feat] add pp support for DeepSeek-v3.2 (#10449)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 12:29:51 +08:00
xinhe-nv
1fbadd2dde
[None][chore] Add failed cases into waives.txt (#10365)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-01-06 22:08:06 -05:00
Ivy Zhang
4a1b2e23b3
[https://nvbugs/5698434][test] add qwen3-4b accuracy test case (#10382)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-06 21:56:34 -05:00
Lucas Liebenwein
6095c80e56
[https://nvbugs/5721907][fix] AutoDeploy: improve numerical stability of flashinfer attention test (#10467)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 21:11:06 -05:00
Zongfei Jing
bb2f883296
[None][feat] Add test script and raster M for gather fc1 kernel (#10429)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-07 09:31:49 +08:00
Lucas Liebenwein
bb6a3973aa
[https://nvbugs/5732942][fix] AutoDeploy: handle transformers 4.57.1 upgrade fixes (#10466)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 19:55:49 -05:00
Mike Iovine
77be1b7572
[https://nvbugs/5749988][fix] Remove redundant qwen3 spec dec test (#10387)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-06 11:46:34 -05:00
Enwei Zhu
037753f65b
[https://nvbugs/5748600][ci] Unwaive disagg guided decoding test (#10409)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-06 11:38:12 -05:00
JunyiXu-nv
7d62773c6c
[https://nvbugs/5760726][fix] Use random port in container port section (#10432)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2026-01-06 23:25:46 +08:00
xinhe-nv
704f58dfbe
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10427)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-06 04:47:54 -05:00
Emma Qiao
6507087c3f
[None][infra] Waive failed cases on 1/6 (#10440)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-06 16:54:54 +08:00
Bo Li
df0b976b99
[https://nvbugs/5785206][infra] Waive TestQwen3_30B_A3B::test_fp8[latency-torch_compile=False]. (#10441)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-06 03:32:19 -05:00
William Zhang
ab58d7cac1
[https://nvbugs/5772361][ci] Unwaive tests that have been fixed (#10424)
These tests were all failing due to the same issue, and were fixed
in #10394.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-05 23:49:54 -08:00
Ivy Zhang
1e828587e5
[TRTLLM-9896][test] add vswa test cases coverage (#10146)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-06 02:02:29 -05:00
Yiqing Yan
5108a69fc0
[TRTLLM-9622][infra] Enable DGX_B300 multi-gpu testing in pre-merge pipeline (#9699)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-06 14:39:55 +08:00
xinhe-nv
998527724c
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10367)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-06 01:09:21 -05:00
Ivy Zhang
22a1d31a27
[None][test] update test case constraint (#10381)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-06 12:28:59 +08:00
xinhe-nv
1b1058279c
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10384)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-05 23:02:27 -05:00
kris1025
3e98265682
[None][chore] unwaive qwen3 30b test (#10115)
Signed-off-by: linquanh <linquanh@nvidia.com>
2026-01-06 11:17:08 +08:00
alel
6b8ae6fa81
[None][feat] CuteDSL MOE FC1 Enhancement (#10088)
Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>
2026-01-06 09:30:43 +08:00
chenfeiz0326
8a04c05079
[None][fix] Only Use Throughput Metrics to Check Regression (#10404)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-06 09:21:15 +08:00
Chuang Zhu
536a8f6a9c
[TRTLLM-9527][feat] Add transferAgent binding (step 1) (#10113)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-06 08:40:38 +08:00
Simeng Liu
3b56548fcf
[https://nvbugs/5777044][chore] Remove solved bugs from waives.txt (#10422)
Signed-off-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
2026-01-05 16:56:58 -05:00
Mike Iovine
91ff46d418
[https://nvbugs/5745152][fix] Unwaive gpt oss spec decode test (#10370)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 16:06:58 -05:00
Mike Iovine
7a2dab8e85
[https://nvbugs/5695984][fix] Unwaive llama3 eagle test (#10092)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 16:03:35 -05:00
Yan Chunwei
6b71b03947
[TRTLLM-9551][infra] Partition test_llm_pytorch.py for parallel execution (#10400)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-05 13:58:03 -05:00
Mike Iovine
db2614ef10
[https://nvbugs/5772414][fix] Fix draft token tree depth=1 corner case (#10385)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:20:14 +01:00
Gal Hubara-Agam
e98c27ee4f
[TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime (#10397)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-05 18:17:27 +02:00
Anthony Chang
225d3a9001
[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 (#9998)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-05 17:16:12 +01:00
Balaram Buddharaju
a792c23dcf
[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-05 20:08:03 +08:00
xinhe-nv
b1733d56f6
[TRTLLM-9381][test] add disag-serving kimi k2 thinking tests (#10357)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-05 05:15:52 -05:00