Commit Graph

1715 Commits

Author SHA1 Message Date
Tailing Yuan
f9c7786dc8
[None][feat] Add layer wise benchmarks (#8777)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 20:29:34 +08:00
Emma Qiao
a5cc9fe0aa
[TRTLLM-5453][infra] Check all steps for test name and also check the test in waives.txt also exists in l0 or qa test list. (#6256)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-30 01:56:04 -07:00
xinhe-nv
a4f75399b9
[https://nvbugs/5481206][fix] update waives (#8774)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-30 00:43:38 -07:00
Emma Qiao
7d3cebf34e
[None][infra] Unwaive the tests passed in latest CI and disable a perf stage (#8775)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 12:48:23 +08:00
Emma Qiao
db99a936b0
[TRTLLM-8971][infra] Update gpu key for B300/GB300 (#8724)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-29 20:36:44 -07:00
Yuxian Qiu
3176bd3815
[None][fix] Fix UnboundLocalError. (#8756)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-29 19:41:37 -07:00
HuiGao-NV
ae57738bae
[https://nvbugs/5547414][fix] Use cached models (#8755)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-29 19:10:10 -07:00
Iman Tabrizian
ae6875fe10
[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager (#8699)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-10-29 08:04:26 -07:00
Emma Qiao
579e1067bf
[None][infra] Waive failed tests on main (#8759)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-29 21:32:23 +08:00
Yan Chunwei
fc3b6f5331
[None][ci] waive test_rpc.py (#8745)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-29 05:17:40 -07:00
Chang Liu
81eb861df0
[None][chore] Enable GPQA in CI for DeepSeek V3.2 (#8712)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-29 04:22:22 -07:00
Zheng Duan
d626d13d37
[https://nvbugs/5607238][test] fix working dir in disagg worker test (#8648)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-29 16:13:52 +08:00
Pengyun Lin
2aade46d18
[TRTLLM-8214][feat] Support Qwen3 tool parser (#8216)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-29 15:48:29 +08:00
xinhe-nv
7ba98a6b20
[None][chore] Add failed cases into waives.txt (#8684)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-28 20:30:01 -07:00
Yan Chunwei
f2faf2809f
[None][ci] waive test_rpc.py temporarily (#8743)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-28 19:20:27 -07:00
Zheng Duan
fea5bfbda7
[None][feat] add detailed KV cache transfer time breakdown (#8521)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-29 10:11:09 +08:00
ruodil
f444fe2deb
[None][test] fix a typo in perf test sampler config (#8726)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-10-29 09:53:53 +08:00
Lizhi Zhou
24167d00eb
[TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-28 17:04:53 -07:00
dongfengy
083f3637f1
[https://nvbugs/5596343][test] Update test waive to get back some coverage (#8702)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-28 14:05:48 -07:00
Anish Shanbhag
a09b38a862
[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-28 09:17:26 -07:00
dongfengy
5a01f382c1
[https://nvbugs/5575913][fix] Use separate thresholds for 120b/20b gptoss (#8664)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-28 10:35:07 -04:00
Robin Kobus
e8e2b0697a
[None][chore] Revert "[TRTLLM-7835][test] add default sample config for perf test (#8523) (#8725)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-28 14:23:38 +01:00
ruodil
6b9b73ee27
[https://nvbugs/5564465][test] ensure deepseek_v3_lite isl + osl < max_seq_len (#8565)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-10-28 15:25:52 +08:00
ruodil
bf72eb045e
[TRTLLM-7835][test] add default sample config for perf test (#8523)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-10-28 02:22:47 -04:00
yufeiwu-nv
0e36484fba
[None][test] Add gpt_oss_20b Model to Sanity Perf Test (#8265) 2025-10-28 13:36:28 +08:00
Aurelien Chartier
0a02f5f25d
[None][chore] Use a cached model path for Ray integration test (#8660)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-27 19:16:06 -07:00
HuiGao-NV
49974eed75
[None][chore] ISOLATE some cases (#8690)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-27 22:10:44 -04:00
chenfeiz0326
f5265a087b
[None][infra] Minor Update on Perf Sanity Testdb Files (#8607)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-10-28 09:54:48 +08:00
gramnarayan
88b0fbc8ff
[#8245][feat] Autodeploy: Guided Decoding Support (#8551)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-28 09:29:57 +08:00
Yechan Kim
a6017f6266
[https://nvbugs/5608723][fix] Use local data on multimodal tests and unwaive tests (#8673)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-28 09:20:02 +09:00
Emma Qiao
73a5479b26
[None][infra] Skip failed tests for main 10/27 (#8686)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-28 08:04:30 +08:00
Bo Li
9c4432f8a4
[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. (#7499)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-27 13:23:06 -04:00
mpikulski
7c8ba71b49
[TRTLLM-8832][feat] fully async _select_generated_logits with tests (#8628)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-27 16:15:32 +01:00
Kaiyu Xie
c9b08790c2
[None] [test] Add MNNVL AlltoAll tests to pre-merge (#8601) 2025-10-27 21:39:44 +08:00
Jie Li
ce0d76135d
[https://nvbugs/5546507][fix] skip TRT-Flow test case due to CMake Error in building (#8677)
Signed-off-by: Jie Li <lijie@nvidia.com>
2025-10-27 05:11:47 -04:00
xinhe-nv
8090c9641c
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8672)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-27 03:20:46 -04:00
xinhe-nv
0ac5cbcac4
[None][chore] Add failed cases into waives.txt (#8669)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-27 02:36:28 -04:00
QI JUN
cc5b8b6d28
[None][ci] move some time-consuming benchmark test cases to post merge (#8641)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-26 22:47:17 -04:00
Emma Qiao
e0728ba8a7
[None][infra] Waive failed case on main 10/26 (#8668)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-26 22:02:32 +08:00
Chenghao Zhang
a6d20f6f9b
[None][feat] AutoDeploy: Add FP8 MOE for Nemotron (#8599)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-25 15:26:45 -04:00
Simeng Liu
2b27810198
[https://nvbugs/5494718][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark (#8514)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-10-24 19:09:07 -07:00
jthomson04
02081e2390
[None][feat] Support KV Connector with Disagg Prefill Worker (#8246)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-24 11:09:06 -07:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
Chuang Zhu
2420918e5b
[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-24 08:58:16 -04:00
Emma Qiao
35e35db422
[None][infra] Waive tests on main and remove lines which missed in MI (#8639)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-24 02:49:23 -04:00
xinhe-nv
2aaedd08cd
[TRTLLM-8638][fix] fix test issues (#8557)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 02:16:55 -04:00
xinhe-nv
9a9d647292
[None][chore] Add failed cases into waives.txt (#8630)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 02:11:03 -04:00
ruodil
07a957e5cb
[None][test] remove redunctant runtime backend in perf test (#8358)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-10-24 02:01:34 -04:00
Stanley Sun
6b793d5c3d
[TRTLLM-8738][test] Add end-to-end trtllm-serve negative tests (#8580)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-10-24 13:23:47 +08:00
xinhe-nv
59375e8bed
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8590)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 00:02:42 -04:00
xinhe-nv
95d39e6e76
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8588)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-23 23:08:52 -04:00
xinhe-nv
04e2b2752a
[None][feat] add Nemotron-Ultra multi nodes eval tests (#8577)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-23 02:44:26 -04:00
Anthony Chang
8a3b870e09
[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN (#8156)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-23 09:14:18 +08:00
Anish Shanbhag
15de45d782
[TRTLLM-8682][chore] Remove auto_parallel module (#8329)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-22 20:53:08 -04:00
xinhe-nv
b8b2c9efb4
[None][chore] add precommit hook to remove redundant tab and white space (#8534)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-22 09:21:54 -04:00
Eran Geva
d4b3bae5af
[#8391][fix] check perf by device subtype (#8428)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-22 12:38:05 +03:00
Ivy Zhang
912cf4f603
[TRTLLM-8785][fix] fix conflicts between periodic-junit and store-durations (#8518)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-22 04:36:47 -04:00
Emma Qiao
92e99b6545
[None][infra] Waive failed cases for main branch 10/22 (#8573)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-22 04:21:56 -04:00
Shi Xiaowei
77940635bb
[https://nvbugs/5451272][fix] unwaive the test (#8537)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-22 14:28:42 +08:00
xinhe-nv
187cf12d8f
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8554)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-22 01:26:15 -04:00
Emma Qiao
2b4e812aea
[None][infra] Let CI continue running other isolation tests when an isolation test get hanging (#8471)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-22 00:07:35 -04:00
chenfeiz0326
6cf1c3fba4
[TRTLLM-8260][feat] Add Server-Client Perf Test in pytest for B200 and B300 (#7985)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-10-22 10:17:22 +08:00
sunnyqgg
90080e0e09
[https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch (#8517)
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-22 09:58:22 +08:00
Chenghao Zhang
bac9e8c2ad
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469) 2025-10-21 15:32:01 -07:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
Emma Qiao
653aa6b6dc
[None][infra] Waive failed tests for main 10/21 (#8524)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-21 06:24:15 -04:00
xinhe-nv
c566890624
[TRTLLM-8638][fix] Remove closed bugs (#8478)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 03:48:58 -04:00
xinhe-nv
3264d605fb
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8486)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 01:20:29 -04:00
ruodil
ab4b9966b2
[TRTLLM-7287][test] add multimodal chunked_prefill cases (#8011)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-10-20 22:43:47 -04:00
Suyog Gupta
7050b1ea49
[#8272][feat] Enable chunked prefill for SSMs in AutoDeploy (#8477)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-20 15:31:52 -07:00
dongfengy
9b289d5230
[https://nvbugs/5568676][fix] Remove test waive (#8437)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-20 12:03:50 -07:00
HuiGao-NV
d0663e16e0
[https://nvbugs/5492250][fix] Remove isolated cases and unwaive cases (#8492)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-20 07:40:07 -04:00
Pamela Peng
b818a912d7
[https://nvbugs/5540752][fix] Support quantized Phi4 MM models (#8190)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-10-20 06:36:09 -04:00
QI JUN
d05079ba4b
[None][ci] move some test cases from H100 to A10 (#8449)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-20 01:58:34 -04:00
xiweny
f7722e2b65
[TRTLLM-4866] [test] Support waiving unit tests by waives.txt (#8359)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-20 09:52:51 +08:00
xinhe-nv
9aa086d3bb
[None][chore] update test duration (#8377)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-19 20:45:51 -04:00
Bo Deng
dd25595ae8
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend (#7926)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-19 19:24:43 +08:00
Emma Qiao
e185173240
[None][infra] Waive test for main branch on 10/18 (#8472)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-19 04:36:42 -04:00
brb-nv
7cc65a6296
[None][chore] Waive failing transceiver test (#8473)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-18 17:22:10 -04:00
Lucas Liebenwein
41169fb20c
[None][feat] AutoDeploy: chunked prefill support (#8158)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-18 00:47:35 -07:00
h-guo18
55fed1873c
[None][chore] AutoDeploy: cleanup old inference optimizer configs (#8039)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-17 15:55:57 -04:00
xinhe-nv
bc833d3de3
[TRTLLM-8638][fix] add waives tests (#8445)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-17 03:37:53 -07:00
zhhuang-nv
7a2bab93f0
[None][test] Add post merge test for Seed-OSS-36B-Instruct (#8321)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-10-17 02:30:33 -07:00
yufeiwu-nv
1e1f430163
[None][test] Filter out all fp8 test case for A100. (#8420)
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-10-16 20:42:50 -07:00
Ivy Zhang
70a0f5beb6
[TRTLLM-8580][test] save runtime report periodically (#8312)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-17 10:47:26 +08:00
John Calderon
46ee7acb33
[TRTLLM-6780][fix] Add multimodal data to dummy requests during memory profiling (#7539)
Signed-off-by: John Calderon <johncalesp@gmail.com>
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Signed-off-by: john calderon <jcalderon@nvidia.com>
Signed-off-by: John Calderon <jcalderon@nvidia>
2025-10-16 17:49:22 +02:00
Yiqing Yan
05dd437084 [https://nvbugs/5565541][fix] Add timeout threshold for H100 FHMA test (#8354)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
bhsueh_NV
69325e1aa3 [https://nvbugs/5574556][fix] fix bug of Qwen3_235B_A22B::test_fp8 CI (#8351)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Lizhi Zhou
982d4b65e8 [https://nvbugs/5550671][fix] fix disagg-serving multinodes test failure (#8307)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Chuang Zhu
18a534d2b4 [https://nvbugs/5465642][fix] Increase server timeout to wait weight loading (#8297)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Enwei Zhu
526cad37d7 [https://nvbugs/5568951][fix] Fix guided decoding disagg tests (#8311)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Ivy Zhang
1b559ba91d [None][chore] Update test configs for release (#8224)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Ivy Zhang
4789c1e588 [TRTLLM-8246][test] add multimodal kvcache+chunked_prefil cases in to QA test list (#8212)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Ivy Zhang
be2ab98233 [None][chore] Update constaintfor release (#8211)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yukun He
179c7dc501 [https://nvbugs/5536131][fix] Fix illegal access issue when scale is not provided in Llama3/4. (#7960)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
xinhe-nv
f70eff30b3
[TRTLLM-8638][fix] waive llam4 tests on H20 (#8416)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-16 03:14:56 -07:00
HuiGao-NV
4e6a492aa3
[None][chore] Isolate several intermittent cases (#8408)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-15 23:48:31 -07:00
xiweny
4143887370
[https://nvbugs/5541494] [fix] Remove waivers (#8353)
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-15 19:10:35 -07:00
Chuang Zhu
40d129a415
[None][fix] Fix cache buffer size for window (#8320)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-16 09:01:11 +08:00
dongfengy
7a0aa64973
[None][fix] Refactor triton paddings (#6980)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-10-15 12:59:01 -07:00
mpikulski
0510b34588
[TRTLLM-8551][feat] add cache_salt in LLM.generate and refactor test_return_logits.py (#8317)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 02:53:57 -07:00
QI JUN
1a1c9a29ab
[None][ci] move all llama4 test cases to post merge (#8387)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-15 16:36:37 +08:00
mpikulski
93a4b7f1b6
[None][chore] update torch_dtype -> dtype in 'transformers' (#8263)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 17:09:30 +09:00
Jin Li
206a9930df
[https://nvbugs/5547435][fix] Fix a merge conflict (#8365)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-15 10:43:10 +08:00
Emma Qiao
493da020c1
[TRTLLM-7351][infra] Add isolate marker for L0 (#7497)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-14 16:58:14 -07:00
dongfengy
9d855f47ad
[None][fix] Remove outdated test waives for GPTOSS (#8183)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-14 16:20:38 -07:00
Michal Guzek
1cdb0b62c3
[https://nvbugs/5563469][fix] Temporarily disable test_nemotron_nano_8b_lora_torch in L0 due to Torch non-determinism (#8206)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-10-14 17:55:28 +02:00
William Zhang
72d65d079a
[https://nvbugs/5542878][fix] Unwaive test (#8027)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-10-14 07:58:07 +02:00
xinhe-nv
371fcb0338
[TRTLLM-8366][feat] add kimi multi nodes case (#8025)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-13 21:36:03 -07:00
Yuxian Qiu
3450fe9944
[None][fix] Fix dummy load format for key models. (#7993)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-14 11:18:39 +08:00
Robin Kobus
db8c63b9b1
[TRTLLM-4517] [feat] Additional model outputs (#7206)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 15:33:18 +02:00
xinhe-nv
9fe63dd8db
[None][chore] Add failed cases into waives.txt (#8290)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-13 00:07:00 -07:00
xinhe-nv
72fcff1044
[None][fix] add timeout for llama4 (#8254)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-12 21:04:20 -07:00
Guoming Zhang
989c25fcba
[None][doc] Add qwen3-next doc into deployment guid and test case into L0. (#8288)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Faradawn Yang <faradawny@gmail.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 10:25:45 +08:00
Emma Qiao
fdbeea51d3
[None][infra] Skip failed cases for main branch (#8293)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-12 08:04:09 -07:00
brb-nv
56a539cd37
[None][chore] Waive failing pre-merge test on main (#8282)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-10 23:52:05 -07:00
Yilin Fan
2695d70d42
[None][feat] Add request timing breakdown option in benchmark_serving (#8128)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-10 09:24:54 -07:00
xinhe-nv
2655995a09
[None][fix] add gc for test fixture (#8220)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-10 02:50:25 -07:00
bhsueh_NV
d3059dbd8a
[https://nvbugs/5547416][fix] unwaive no_cache test (#8213)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-10-10 01:50:13 -07:00
xinhe-nv
b555f1ff98
[None][chore] Add failed cases into waives.txt (#8229)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-09 23:45:28 -07:00
xinhe-nv
e8c9bae37e
[None][chore] Remove closed bugs (#8151)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-10 16:39:40 +11:00
Emma Qiao
ccd949ea5b
[None][infra] Waive failed tests on main 10/09 (#8230)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-09 22:46:07 +08:00
bhsueh_NV
27677a36f5
[https://nvbugs/5516666][fix] unwaive some Qwen3 CI tests (#8130)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-10-09 09:44:58 +08:00
Lizhi Zhou
fdf29ab8fa
[TRTLLM-7846][feat] Http disagg-cluster management implemention (#7869)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-09 09:44:01 +08:00
QI JUN
6884d06aed
[None][ci] move some llama4 test cases to pre merge (#8189)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-08 18:34:08 -07:00
Liao Lanyu
ed8e00ad4a
[https://nvbugs/5522746][fix] unwaive tests caused by node issues after rebooting (#8193)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-10-09 08:45:56 +08:00
Mike Iovine
c88913dc03
[https://nvbugs/5541545][fix] Remove test_llama4 (#8031)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-08 15:20:15 -07:00
brb-nv
80517b7812
[None][chore] Waive some tests failing on main post merge (#8186)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-08 06:52:30 -07:00
mpikulski
8298e93bd8
[TRTLLM-8414][chore] BREAKING CHANGE: refine sampling strategy selection (#8132)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-08 15:46:50 +02:00
Liao Lanyu
d57b8f0951
[https://nvbugs/5455140][fix] unwaive tests related to GB200 OOM (#8159)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-10-08 13:14:12 +08:00
ruodil
971610e3ff
[None][test] add test-model-suites option in integration conftest.py (#8016)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-10-08 10:38:31 +08:00
Mike Iovine
7facac077b
[None][fix] Fix MTP illegal memory access (#8161)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-07 14:02:55 -04:00
Emma Qiao
ca9da1f1c2
[None][infra] Skip failed cases for main (#8176)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-07 06:37:51 -07:00
xiweny
9298f1bdcc
[None] [test] Add B300 cases to CI (#8056)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-06 19:23:31 -07:00
Faraz
27a5091fcb
[None][feat] GPT-OSS Sm120/Sm121 Support (#7937)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Vincent Huang <vincenth@nvidia.com>
2025-10-06 16:59:06 -04:00
Lucas Liebenwein
3492391feb
[None][chore] AutoDeploy: clean up accuracy test configs (#8134)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-06 12:51:01 -07:00
Yan Chunwei
fb51de6c2e
[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) (#5543)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <chunweiy@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-10-05 17:28:20 +08:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
Lucas Liebenwein
2c454e8003
[None][feat] AutoDeploy: Nemotron-H accuracy test (#8133)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 15:39:03 -07:00
Michal Guzek
38da871db3
[TRTLLM-6496][feat] Add LoRa Torch tests for the latest NIM model list (#6806)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-10-03 12:10:48 -07:00
Mike Iovine
ca8291133a
[None][fix] Fix MTP 2-model (#8115)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-10-03 10:13:50 -07:00
Lucas Liebenwein
5faa5e9dd8
[None][feat] AutoDeploy: dive deeper into token generation bugs + enable_block_reuse (#8108)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 04:57:26 -07:00
Yilin Fan
01423ac183
[None][feat] perf_metrics endpoint functionality improvement (#8005)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-02 17:43:25 -07:00
Eran Geva
4136942436
[#7588][fix] fixed the kv cache size parsing in test_perf.py AD backend (#8092)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-02 15:55:31 -04:00
Erin
293637e0a1
[https://nvbugs/5556020][chore] waive test_eagle3 (#8119)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-02 05:33:21 -04:00
mpikulski
fc7f78c400
[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling (#8110)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-02 10:20:32 +02:00
Eran Geva
32c7f8c36f
[#7588][feat] lock gpu clocks in test_perf.py to reliably detect perf regressions (#8099)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-02 11:18:10 +03:00
Patrice Castonguay
b77f19f4ff
[https://nvbugs/5434320][fix] fix: Unwaiving disagg pp tests (#8069)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-01 00:33:59 -04:00
Emma Qiao
b1e3fef8aa
[None][infra] Skip failed tests in post-merge for main (#8102)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-01 10:12:10 +08:00
brb-nv
84aa3c981e
[None][chore] Waive failing MNNVL alltoall multi-gpu test (#8106)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-09-30 20:05:42 -04:00
Iman Tabrizian
c510b67fa0
[https://nvbugs/5547414][fix] avoid downloading Tiny llama from HF (#8071)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-30 13:47:59 -04:00
xinhe-nv
1dba9fa89e
[TRTLLM-6239][feat] add test cases into QA test list (#8081)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-30 00:23:45 -04:00
Kaiyu Xie
b0cb9ca50e
[None] [test] Add MNNVL AlltoAll tests to pre-merge (#7466)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-09-29 23:12:24 -04:00
Cheng Hang
cdce68c3e0
[TRTLLM-6741][fix] Add heuristics for lm head tp size when enable_lm_head_tp_in_adp=True (#7891)
Signed-off-by: Cheng Hang <chang@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-30 09:24:35 +08:00
xiweny
48e779ae8c
[https://nvbugs/5541494] [fix] add back missing sm100f bmm kernels (#8051)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-29 05:35:44 -04:00
yufeiwu-nv
3ba6727a68
[None][test] Update get_sysinfo.py to avoid UnboundLocalError (#7982)
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-09-29 05:14:38 -04:00
xinhe-nv
20e6cd39f1
[None][chore] Add failed cases into waives.txt (#8043)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-29 03:37:39 -04:00
Emma Qiao
ce381d6813
[None][infra] Waive failed cases for main on 0929 (#8053)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-29 02:46:02 -04:00
HuiGao-NV
7ac932d45e
[https://nvbugs/5532087][CI] Enable test case (#8029)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-29 01:46:28 -04:00
Ivy Zhang
1e2e851db8
[None][chore] update test case constraint (#8020)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-29 13:25:09 +08:00
Eran Geva
9cea6bfb30
[#7288][feat] Added AutoDeploy backend support to test_perf.py (#7588)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-28 21:21:27 -07:00
Ivy Zhang
0ecafd84da
[None][chore] Update chunked prefill test case configs (#7868)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-29 10:37:34 +08:00
Emma Qiao
2be05cbd6e
[None][infra] Skip failed test for main branch on 9/28 (#8040)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-28 07:00:55 -04:00
ChristinaZ
95eac2cda7
[https://nvbugs/5537738][fix] Add fp8 post-quant allgather support (#8008)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-09-28 15:32:45 +08:00
Iman Tabrizian
33282351a2
[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path (#6348)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-27 19:29:30 -04:00
Jhao-Ting Chen
c33f43e13a
[https://nvbugs/5518713][fix] Trtllm-gen moe backend for blockwise fp8 ckpt (Qwen3-235B-A22B-FP8) (#7856)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-09-26 14:29:32 -07:00
Emma Qiao
c8bef27ebb
[None][infra] Waive failed cases in post-merge 2305 (#8019)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-26 10:20:12 -07:00
xinhe-nv
ba6ab62bd1
[None][chore] Add failed cases into waives.txt (#8004)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-26 00:41:02 -07:00
xinhe-nv
f32f5730b2
[None][chore] Add failed cases into waives.txt (#7986)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 23:50:09 -07:00
Lucas Liebenwein
3a96d75a3c
[https://nvbugs/5527956][fix] AutoDeploy: fix IMA due to outdated metadata (#8002)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-25 22:05:55 -07:00
Enwei Zhu
d650320de4
[None][infra] Improve the failure message for accuracy test suite (#7994)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-26 10:04:47 +08:00
Yiqing Yan
108248ece1
[TRTLLM-7999][infra] Add B300/GB300 single gpu test (#7951)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-26 09:59:11 +08:00
Emma Qiao
2dc93c6371
[None][infra] Waive failed tests on main (#8001)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 08:13:39 -07:00
Guoming Zhang
202bed4574 [None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
9f0f52249e [None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … (#7850)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
5342c607cd [https://nvbugs/5516710][fix] fix Llama 3.3 TP PP case (#7717)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
xinhe-nv
e30d9aced9
[https://nvbugs/4955671][fix] update test list (#7980)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 02:58:09 -07:00
Chuang Zhu
791e73edf6
[https://nvbugs/5536141][fix] fix_disagg_single_gpu_test (#7990)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 02:07:22 -07:00
Emma Qiao
cb53261aaf
[None][infra] Unwaive some tests since dev already have a PR to collect more info (#7984)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 01:03:13 -07:00
fredricz-20070104
0945403174
[TRTLLM-6541][test] Add NIM perf test cases (#7924)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-25 13:15:26 +08:00
Iman Tabrizian
be7e51727e
[https://nvbugs/5456485][bug] unwaive triton test (#7966)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 17:02:55 -07:00
Mike Iovine
42c2ec3239
[https://nvbugs/5473781][fix] Fix llama 4 FP8 for PP>1 (#7220)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-24 12:16:27 -04:00
Pamela Peng
b1dc84b4a3
[TRTLLM-7399][test] Add DS-R1/Qwen3 test cases for RTX 6000 (#7662)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-24 11:40:26 -04:00
HuiGao-NV
c8bda4b3a9
[None][ci] Waive some intermittent failures (#7955)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-24 19:00:38 +08:00
Enwei Zhu
a1a57e83b8
[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve (#7925)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-24 18:30:23 +08:00
xinhe-nv
b8bfa63197
[None][chore] add test_w4_1gpu[True-True-cutlass-fp8] & TestKimiK2::test_fp8_blocks… (#7944)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-24 03:25:17 -07:00
QI JUN
18ff1e31b8
[None][ci] remove duplicate test cases (#7956)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-24 17:47:22 +08:00
yufeiwu-nv
f323b74d42
[None][test] Update llm_models_root to improve path handling on BareMetal environment (#7876)
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-09-24 17:35:57 +08:00
HuiGao-NV
29e63d3bc2
[https://nvbugs/5532248][fix] Fix fused_moe OOM (#7931)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-24 02:22:38 -07:00
QI JUN
946ffcd2eb
[None][ci] optimize test cases of dgx b200 (#7948)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-24 00:39:45 -07:00
xinhe-nv
62563760fb
[None][chore] update chunked prefill cases (#7921)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-24 15:14:49 +08:00
Pengbo Wang
b890d7fea4
[None][infra] Skip failed test for nvbugs 5537738 (#7946)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 23:48:50 -07:00
Yueh-Ting (eop) Chen
cf100933cc
[TRTLLM-6341][feature] Support SWA KV cache reuse (#6768)
This merge request attempts to support more SWA KV cache functionality
inside the KV cache manager. Before this merge request, the KV cache for
sliding window attention (SWA) only holds "window size" number of blocks
and reuse them in a cyclic manner. We will not be able to utilize more
GPU memory with this design, leading to a limited max batch size
throughput. Additionally, we will not be able to support KV cache reuse
with this design.

In this MR, we change such behavior to let the manager write blocks in
a linear manner. With a linear block writing behavior, as the attention
window moves on, the out-of-window (OOW) blocks will be detached. Right
now for the sake of a correct feature first, we directly offload the
OOW block from the primary block pool (GPU memory) to the secondary
block pool (host memory). We will improve this in the future by
delegating the block movement to the eviction policy.

KV cache reuse for SWA is not developed in this merge request and will
be amended in a follow-up merge request.

Writing the blocks linearly, the maximum number of blocks allocated for
a sequence(`GenerationRequest`) is the "max sequence length" specified.
The `GenerationRequest` that stores the cache block bookkeeping
structure will now keep "max sequence length" tokens of blocks.

Given the above, main changes are (more context in the MR):
- Remove "cyclic" concept under the kv cache manager, such concept
  originally guards the block reuse under kv cache manager.
- Add detach mechanism and have it under `KVCacheManager::addToken`.
  Please note that detach is still guarded off for SWA when reuse
  is enabled. A follow-up merge request will proceed to improve this.
- Enforce "max sequence length" to be a non-optional parameter to
  the `KVCacheManager`/`BlockManager`
- Let all window size resource pool get identical proportion of memory
- Fix free memory calculation under `resource_manager.py`

Signed-off-by: eopXD <yuehtingc@nvidia.com>
Co-authored-by: Tomer Asida <tasida@nvidia.com>
2025-09-24 14:28:24 +08:00
Lizhi Zhou
e4f1f90202
[https://nvbugs/5477404][chore] unwaive test_disaggregated_single_gpu.py::test_disaggregated_llama_context_capacity (#7857)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-24 10:31:35 +08:00
Lizhi Zhou
7550251988
[TRTLLM-7182][test] add multi-nodes test for disagg-serving (#7470)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-24 08:31:56 +08:00
Zheng Duan
e3c1a9409f
[TRTLLM-6549][fix] add kv cache time output back (#7798)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-09-23 14:12:42 -04:00
Yanchao Lu
6a36349964
[None][test] Waive another intermittent OOM test (#7930)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-23 22:34:09 +08:00
ruodil
05bec3bf0f
[None][test] rename llm_perf_full to llm_perf_core and add missing cases (#7899)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-09-22 23:04:34 -07:00
Pengbo Wang
a4b4ed4535
[None][fix] Fix and add test for TRTLLM MoE backend (#7755)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 11:26:25 +08:00
Pengbo Wang
08cc7a041f
[https://nvbugs/5355128][fix] Add missing wgmma intrinsic for starcoder (#7643)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 10:38:58 +08:00
yunruis
126cd707e3
[None][opt] Add batch waiting when scheduling (#7416)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-23 10:27:37 +08:00
Enwei Zhu
59f57598a7
[https://nvbugs/5504086][fix] Fix MTP vanilla (#7904)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 08:38:28 +08:00
Jin Li
b5391b4ac6
[https://nvbugs/5516665][fix] Fix CUTLASS moe fake impl errors (#7714)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-22 11:08:39 -07:00
Emma Qiao
324301ccba
[None][infra] Skip failed test for nvbugs 5532023 (#7905)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-22 03:49:44 -07:00
Yechan Kim
f77aca9f2c
[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-09-22 03:40:02 -07:00
Bo Deng
8cf95681e6
[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package (#7766)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-22 16:43:35 +08:00
Emma Qiao
d330d0005c
[None][infra] Waive a failed case on main (#7901)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-22 00:37:01 -07:00
xinhe-nv
9c1b75e978
[TRTLLM-7070][feat] add gpt-oss chunked prefill tests (#7779)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-22 00:12:43 -07:00
Wanli Jiang
f5bfd68a50 [https://nvbugs/5509024][fix] Print full parsed outputs and update keywords for multimodal model (#7670)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yi Zhang
f9c9c3f50a [https://nvbugs/5355219][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Ivy Zhang
022bc96fb6 [https://nvbugs/5512734][fix] Update kv cache config for maverick (#7710)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
bhsueh_NV
ef557f880b [https://nvbugs/5437405][fix] cherry-pick PR 7000 (qwen3 235b eagle3 ci) (#7702)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yanchao Lu
5c8b022d1e [None][ci] Test waives for the release/1.0 branch 09/15 (#7700)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
peaceh-nv
541b7fda89 [https://nvbugs/5503423][waive] Waive Llama3.1-70B-FP8 test on RTX PRO 6000 (#7603)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yuxian Qiu
2d46dda6a7 [https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Lizhi Zhou
293d9fb612 [https://nvbugs/5448767][fix] disable kv cache reuse for disagg pp>1 tests (#7354)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
peaceh-nv
9dc7316b7f
[https://nvbugs/5512556][unwaive] Unwaive DeepSeek PP tests (#7828)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-22 10:26:30 +08:00
dongxuy04
9eb8084ca9
[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist (#7727)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-21 11:01:51 -07:00
Mike Iovine
8030b540ac
[https://nvbugs/5522462][fix] Fix FP8 scout illegal memory access (#7845)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-19 10:30:37 -04:00
pcastonguay
fbe325ce57
[https://nvbugs/5471108][chore] Unwaiving disagg acc test (#7686)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-09-19 08:56:09 -04:00
Yuxian Qiu
7d28acdbf0
[https://nvbugs/5522332][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783) (#7797)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 18:50:40 +08:00
xinhe-nv
efb763402f
[None][chore] Add failed cases into waives.txt (#7841)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-19 17:59:47 +08:00
Ivy Zhang
0ac51487f4
[None][chore] remove cli cases for rtx6k (#7833)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-19 16:33:59 +08:00
Ivy Zhang
6b33bcced2
[None][test] Add accuracy benchmark in stress test (#7561)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-19 16:09:46 +08:00
dominicshanshan
451475e0dc
[None][ci] Waive llama3 auto dtype test bug in https://nvbugs/5527956. (#7853)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-19 14:54:59 +08:00
Emma Qiao
ea079fa530
[None][infra] Waive failed tests in post-merge (#7859)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-19 14:16:12 +08:00
ruodil
c5453103d6
[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-09-19 11:12:53 +08:00
fredricz-20070104
fc4e6d3702
[TRTLLM-7183][test] Feature fix model issue for disagg serving (#7785)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-19 10:12:55 +08:00
Yuxian Qiu
d6ebcf7c4a
[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 09:40:49 +08:00
QI JUN
7646da2d85
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS correctly (#7800)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-19 07:19:50 +08:00
sunnyqgg
80dd8fe197
[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001)
Signed-off-by: qgai <qgai@nvidia.com>
2025-09-18 12:05:36 -04:00
xinhe-nv
d3a907131a
[https://nvbugs/5519462][fix] Add failed cases into waives.txt (#7817)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 20:01:06 +08:00
Wanli Jiang
fe104dc20d
[TRTLLM-7918][feat] Support kvcache reuse and chunk prefill for phi4mm (#7723)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 17:37:16 +08:00
xinhe-nv
d909f80379
[TRTLLM-7250][fix] Add failed cases into waives.txt (#7807)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 17:13:07 +08:00
Wanli Jiang
a7ca0fff54
[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 16:26:20 +08:00
dongfengy
2ae08bd1b8
[https://nvbugs/5519530][fix] Fix gptoss 2-gpu test (#7819)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-18 16:01:53 +08:00
xinhe-nv
236f71ea05
[None][chore] Add failed cases into waives.txt (#7801)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 14:48:16 +08:00
Ivy Zhang
26d50eb539
[TRTLLM-8070][test] add generation logits case for llama3 (#7759)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-18 13:33:16 +08:00
Yan Chunwei
327e5e5eed
[None][ci] restore unwaive list (#7802)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-18 10:50:34 +08:00
Netanel Haber
a5cfc8368f
[https://nvbugs/5508536][fix] Revert #7041: Move stop_criteria to sample_async (#7041) (#7796)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
2025-09-17 21:27:01 -04:00
yunruis
7c03eb9ea2
[https://nvbugs/5516661][fix] Drop waive case 5516661 (#7791)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-09-18 08:55:32 +08:00
Emma Qiao
c4abca323e
[None][infra] Waive failed tests on main (#7812)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-17 23:44:36 +08:00
William Zhang
2614d71994
[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 (#7628)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-17 08:11:16 -07:00
xinhe-nv
f918302b3a
[TRTLLM-7250][fix] waive block tests (#7782)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-17 15:31:03 +08:00
ruodil
e6073b3911
[None][test] add gpt oss model for trtllm perf test (#7328)
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-09-17 15:23:21 +08:00
xinhe-nv
7801d0992b
[None][chore] Remove closed bugs (#7697)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-17 15:14:09 +08:00
Fanrong Li
523a17d990
[https://nvbugs/5485325][fix] Cherry-pick #7373: fix the CUDA graph warmup issue when using speculative decoding (#7734)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-17 13:57:39 +08:00
QI JUN
bd7aad4988
[None][ci] waive test_llm_gemma_1gpu_summary_vswa (#7781)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 10:48:31 +08:00
HuiGao-NV
a49cfb3e68
[https://nvbugs/5516666][fix] cherrypick fix to the CUDA graph warmup issue when using speculative decoding (#7737)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
Co-authored-by: Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-17 06:24:20 +08:00
xinhe-nv
e7c1569456
[None][chore] Add failed cases into waives.txt (#7746)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 18:43:40 +08:00
Ziyi Xiong
905bb26bbd
[https://nvbugs/5471106][fix] Remove the waivers (#7711)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-16 17:43:39 +08:00
xinhe-nv
c6ab2072b5
[None][fix] waive hang tests on main (#7720)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 17:05:15 +08:00
xinhe-nv
1fbea497ff
[TRTLLM-7070][feat] add gpt-oss serve benchmark tests (#7638)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 16:39:31 +08:00
amitz-nv
750d15bfaa
[https://nvbugs/5503529][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF (#7740)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-09-16 16:35:13 +08:00
xinhe-nv
cf55927064
[None][chore] Add failed cases into waives.txt (#7735)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 10:58:06 +08:00
xiweny
c076a02b38
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-09-16 09:56:18 +08:00
QI JUN
44d5ccfdd9
[None][ci] move qwen3 tests from GB200 to B200 (#7733)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-16 08:12:28 +08:00
Yanchao Lu
0c9430e5a5
[None][ci] Test waives for the main branch 09/15 (#7709)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-15 22:13:56 +08:00
jmydurant
7deefb3d2b
[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-09-15 21:43:49 +08:00
HuiGao-NV
335c007df8
[None][chore] move some cases from post-merge to pre-merge to detect errors in early stage (#7699)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-15 15:37:58 +08:00
Ivy Zhang
ddfe0320b3
[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill (#7365)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-15 13:38:52 +08:00
xinhe-nv
b69e3e9f99
[None][chore] Add failed cases into waives.txt (#7682)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-15 11:44:52 +08:00
Chang Liu
47e37755a3
[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-14 20:10:10 -07:00
Perkz Zheng
1b29c2e731
[None][feat] support gpt-oss with fp8 kv cache (#7612)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-15 02:17:37 +08:00
Yanchao Lu
70aa4e28c1
[None][ci] Test waives for the main branch 09/14 (#7698)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-14 23:48:04 +08:00
Pengyun Lin
c2bc39af63
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-12 15:32:34 +08:00
QI JUN
656f229b58
[None][ci] move some test cases from l40s to a30 (#7684)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-11 07:22:34 +08:00
Emma Qiao
9986070044
[None][infra] Waive failed cases on main 0910 (#7676)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-11 01:43:29 +08:00
Dom Brown
fc9d426589
[https://nvbugs/5505402] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-10 18:30:48 +01:00
nvamyt
222e01662c
[https://nvbugs/5488212][waive] Waive failed tests for L20 (#7664)
Signed-off-by: nvamyt <amyt@nvidia.com>
2025-09-10 22:32:15 +08:00
xinhe-nv
207c5258c4
[https://nvbugs/5494698][fix] skip gemma3 27b on blackwell (#7505)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-10 21:09:27 +08:00
Bo Deng
bf57829acf
[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-10 17:35:51 +08:00
fredricz-20070104
ef620f3579
[https://nvbugs/5410687][test] Add deepseek r1-w4afp8 quickstart (#7645)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-10 10:21:01 +08:00
Chang Liu
faa2f46554
[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-09 14:51:36 -04:00
QI JUN
a0e1604898
[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline (#7629)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-09 11:06:32 -04:00
Liao Lanyu
af403848d7
[https://nvbugs/5445466][fix] unwaive DS R1 test cases with bug already fixed (#7429)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-09-09 17:25:49 +08:00
Perkz Zheng
da6cb541a2
[None][feat] Optimize MLA kernels with separate reduction kernels (#7597)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-09 16:58:44 +08:00
xinhe-nv
8a52015f50
[None][chore] Remove closed bugs (#7591)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-09 04:08:42 -04:00
Yiqing Yan
5c616da2fd
[TRTLLM-5877][infra] Add fmha tests and auto trigger rules (#6050)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-09 11:33:09 +08:00
Wanli Jiang
1e0669d27a
[https://nvbugs/5453709][fix] Remove transformers version limit in Qwen2VL (#7152)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-09 10:38:20 +08:00
Iman Tabrizian
d96c54d8ae
[None][test] Skip eagle3 test (#7627)
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-09-08 17:23:53 -04:00
dongfengy
fdd5bd49fc
[https://nvbugs/5481080][fix] Fix GPTOSS W4A16 reference (#7323)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-08 13:59:28 -07:00
Chuang Zhu
77657a1c12
[TRTLLM-7361][feat] KV cache transfer for uneven pp (#7117)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-08 13:37:46 -04:00
Eran Geva
5f2a42b3df
[TRTLLM-6142][feat] AutoDeploy: set torch recompile_limit based on cuda_graph_batch_sizes and refactored (#7219)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-08 08:45:58 -04:00
bhsueh_NV
219e95569a
[https://nvbugs/5506683][fix] adjust the CI (#7604)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-09-08 15:41:41 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd (#7519)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
Raayan Dhar
8f3121ac81
[None][fix] chore: fixing the math on asymmetric tp+pp tests (#7098)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-09-07 14:27:46 -04:00
Raayan Dhar
bae9560e62
[https://nvbugs/5448767][fix] sync termination of requests across PP ranks (#7455)
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-07 08:45:49 -04:00
Emma Qiao
aea8ac1649
[TRTLLM-5950][infra] Removing remaining turtle keywords from the code base (#7086)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-07 14:26:18 +08:00
Chang Liu
23500b55c3
[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse (#7106)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-06 17:58:32 -04:00
dominicshanshan
9a97f0a3b7
[None][ci] Waive qwen3 test for accuracy bug in https://nvbugs/5505402 (#7585)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-06 21:29:16 +08:00
QI JUN
525bb806a9
[None][ci] move some test cases of DGX H100 to post merge (#7569)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-06 01:03:38 -04:00
Emma Qiao
d8ec546b73
[None][infra] Waive failed tests on main branch 0905 (#7564)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-05 22:46:46 +08:00
xinhe-nv
8e3962d278
[TRTLLM-6642][feat] add gptoss 20g tests (#7361)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:20:28 -04:00
xinhe-nv
b3ba3d98d2
[None][chore] Remove closed bugs (#7408)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:11:16 -04:00
QI JUN
ff3704897b
[None][ci] remove unnecessary test_modeling_deepseek.py (#7542)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 20:05:27 -07:00
Jin Li
2189a2f3ff
[https://nvbugs/5483615][fix] Remove unnecessary assertion to let mai… (#7441)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-05 10:56:21 +08:00
Yuxian Qiu
48a5270868
[https://nvbugs/5492485][fix] Use offline dataset from llm-models instead. (#7435)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-04 09:58:16 -07:00
Ivy Zhang
b46e0ae5d4
[None][test] update nim and full test list (#7468)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-04 09:06:01 -04:00
QI JUN
d38b8e3dd9
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS for thop/parallel tests (#7489)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 06:04:51 -07:00
kris1025
cce9556858
[https://nvbugs/5485886][fix] Fix resource free of Eagle3ResourceManager (#7437)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-09-04 17:38:13 +08:00
Jin Li
2a2dfe273b
[https://nvbugs/5485102][fix] Correctly set stride for piecewise outp… (#7442)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 10:48:15 +08:00
Stanley Sun
db8eb0a447
[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-04 10:34:38 +08:00
Lizhi Zhou
d97c1e6bd9
[https://nvbugs/5470769][fix] fix disagg-serving accuracy test case (#7338)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-04 09:11:01 +08:00
Enwei Zhu
5ff3a65b23
[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-03 15:16:11 -07:00
Lizhi Zhou
7c73c2ff4b
[https://nvbugs/5485593][fix] improve accuracy/test_disaggregated_serving.py (#7366)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-03 09:38:53 -04:00
Stanley Sun
cebbf48b74
[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 (#7083)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-03 08:36:52 -04:00
Mike Iovine
79d93f9419
[https://nvbugs/5488141][fix] Unwaive llama3 test_eagle3 (#7486)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-03 14:10:40 +08:00
Wanli Jiang
4223a9aada
[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-03 10:27:42 +08:00
Simeng Liu
bcc55bcdf3
[https://nvbugs/5470782][fix] Add specific test names for test_deepseek.py (#7318)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-09-02 10:31:40 -07:00
Emma Qiao
aae5d22bfe
[None][infra] Waive failed tests on main branch 0902 (#7482)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-02 10:16:49 -04:00
peaceh-nv
90479c50fb
[https://nvbugs/5453992][unwaive] Unwaive llama quickstart test (#7242)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-02 20:28:32 +08:00
JunyiXu-nv
eefe5f2093
[TRTLLM-7208][feat] Implement basic functionalities for Responses API (#7341)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-09-02 07:08:22 -04:00
HuiGao-NV
7279297717
[None][infra] waive test case failed on post-merge (#7471)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-02 06:20:08 -04:00
aalanwyr
c3c95736a1
[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test (#7413)
Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>
2025-09-02 17:21:27 +08:00
Ivy Zhang
3799e5d460
[None][test] auto reuse torch empty cache on qa test (#7421)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-02 04:44:47 -04:00
Yan Chunwei
f90375f37c
[https://nvbugs/5476580][fix] unwaive test_nvfp4_4gpus (#7454)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-09-02 04:17:14 -04:00
Emma Qiao
01dfd3af1b
[None][infra] Waive failed case on main 0901 (#7447)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-01 23:27:24 +08:00
bhsueh_NV
16e9d1121c
[https://nvbugs/5481087][fix] fix bug of ci when we use mocker (#7332)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-09-01 16:22:45 +08:00
nvamyt
efaefca2c8
[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302)
Signed-off-by: nvamyt <amyt@nvidia.com>
2025-09-01 12:59:21 +08:00
Yiqing Yan
21291f3d8e [None][chore] Remove duplicate test waives (#6999)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Emma Qiao
09bca7ca82 [None][infra] Waive failed tests for release branch 0818 (#6993)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
peaceh-nv
f4dc1ed39c [https://nvbugs/5449218][fix] Fix KvCacheConfig error in test_perf (#6937)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
29cdcdb56a [None][fix] update skip config (#6891)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Guoming Zhang
d5bc5cd4f2 [https://nvbugs/5375646][fix] update waives.txt for nvbug 5375646 (#6847)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
William Zhang
d15dcdc4ae [https://nvbugs/5448525][fix] Mistral Small 3.1 accuracy tests (#6909)
This commit lowers the GPU memory allocated for KV cache in accuracy
tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yan Chunwei
ac07418968 [None][ci] unwaive test_ptp_star_attention_example (#6943)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
xinhe-nv
b4d41d6604 [TRTLLM-7048][feat] add benchmark TRT flow test for MIG (#6884)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yan Chunwei
612c26be22 [None][doc] add legacy section for tensorrt engine (#6724)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
2ez4bz
2480aedb73 [TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)
This commit adds some level of FP8 support to Mistral Small 3.1 by:

* disabling quantization for the vision sub-model since `modelopt` does
  support quantizing it (yet).
* extending existing accuracy tests to use a modelopt produced FP8
  checkpoint.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Guoming Zhang
3e99744201 [https://nvbugs/5375594][fix] fix oom issue on structural_tag test case (#6838)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
deba2885c1 [None][fix] fix Llama3 eagle3 test case OOM (#6832)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
xinhe-nv
7841ea6255 [None][chore] waive GB300 known issues (#6812)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
c7147d25dc [TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Tian Zheng
e257cb3533
[None][feat] Support NVFP4 KV Cache (#6244)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 09:24:52 +08:00
xinhe-nv
5f939b9121
[None][chore] Add failed cases into waives.txt (#7342)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-30 00:49:14 -04:00
Emma Qiao
15ec2b855d
[None][infra] Waive failed tests on main branch 08/29 (#7370)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA
62459d533d
[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:03:46 +08:00
fredricz-20070104
091b67ad2f
[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-29 02:16:22 -04:00
Chang Liu
31b0f0fb0c
[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-29 12:36:30 +08:00
Richard Huo
ce580ce4f5
[None][feat] KV Cache Connector API (#7228)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-28 23:09:27 -04:00
aalanwyr
085dc19bfa
[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284)
Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>
2025-08-28 23:09:11 -04:00
Yuan Tong
ccb800f909
[TRTLLM-7457][ci] Update unittest parallel config (#7297)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-29 09:28:04 +08:00
Emma Qiao
1e644fa28a
[None][infra] Waive failed tests on main branch 08/26 (#7346)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 00:24:08 +08:00
QI JUN
ae89163368
[None][ci] skip TestGPTOSS (#7333)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-28 05:01:49 -04:00
William Zhang
4541655e5f
[https://nvbugs/5430124][ci] Unwaive Mistral 3.1 Small tests (#7274)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-28 00:03:32 -04:00
QI JUN
39c9ffda5a
[None][ci] fix test list name (#7321)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:33:22 -04:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
bhsueh_NV
9d345b31c0
[https://nvbugs/5453727][fix] unwaive qwen3 CI tests (#7293)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 22:58:59 +08:00
QI JUN
d09add5ede
[None][ci] parallelize unit tests of auto deploy in B200 (#7291)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:32:11 +08:00
Emma Qiao
8dc62ffac4
[None][infra] Waive failed tests on main (#7300)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-27 09:53:33 -04:00
xinhe-nv
f082e4857c
[TRTLLM-7250][fix] waive failed cases (#7292)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-27 18:04:46 +08:00
nvamyt
dbd4f21687
[None][fix] Update maxnt of llama_v3.2_1b bench (#7279)
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-27 16:56:28 +08:00
bhsueh_NV
f167b1fd99
[https://nvbugs/5453727][fix] Fix bug of how GPT-OSS setup the parameters in CI (#7151)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 15:26:10 +08:00
QI JUN
e08c7cf17b
[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 03:12:30 -04:00
dongxuy04
abdb2735be
[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-08-27 01:39:24 -04:00
Yuan Tong
6c7813e821
[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-08-27 00:45:58 -04:00
Zhou Yuxin
ccb6aadea8
[https://nvbugs/5412456][fix] Remove from waives.txt (#7248)
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-27 10:05:53 +08:00
Jin Li
028235404b
[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-26 18:31:33 -04:00
QI JUN
baef70e67e
[None][ci] move qwen3 tests from b200 to gb200 (#7257)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-26 11:50:53 -04:00
xinhe-nv
80043affb5
[None][chore] Add failed cases into waives.txt (#7251)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 17:13:44 +08:00
Zheng Duan
cf50ba2980
[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 15:34:44 +08:00
Zheng Duan
1a929a1490
[https://nvbugs/5457504][fix] fix kv cache event test in disaggregated worker tests (#7028)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 14:25:10 +08:00
nvamyt
d8bd8843fc
[None][test] Update qwen3 timeout to 60 minutes (#7200)
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 14:18:42 +08:00
qixiang-99
b165f8bc97
fix/improve kvcache allocation in PyTorch runtime (#5933)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-08-26 12:40:22 +08:00
William Zhang
92576488d3
[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013)
* Why?

Some models (e.g. anything produced by Mistral) can have both sharded
safetensors and a consolidated safetensor in the same checkpoint
directory. In such cases, prefetching both to memory is a waste of time,
and memory.

* What?

This commit skips over consolidated safetensors when they are not the
only safetensor file present in the checkpoint directory

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-25 23:56:21 -04:00
ruodil
b845eb7a3a
[None][test] add kv cache size in bench metric and fix failed cases (#7160)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 10:10:02 +08:00
chenfeiz0326
6a44e5b9d1
[https://nvbugs/5440241][fix] Fix 70B GSM8K Accuracy drop (#6967)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-25 22:09:30 +08:00
Emma Qiao
200db3b809
[None][infra] Waive failed tests on main branch (#7201)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-25 09:04:37 -04:00
Ivy Zhang
f61b74f796
[None][test] add l20 specific qa test list (#7067)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-25 12:44:08 +08:00
Bo Deng
c038fb3ef4
[None][chore] cherry-pick 6940 (#7097)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-25 10:28:45 +08:00
xinhe-nv
3ba9afcc7b
[None][feat] add gpt-osss tests to sanity list (#7158)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-25 10:22:07 +08:00
Yiqing Yan
486bc763c3
[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge (#7074)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:09:04 -04:00
Robin Kobus
31979aefac
[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-24 20:53:17 +02:00
ajrasane
068056677f
[None][chore] Enable auto deploy accuracy test in CI (#7179)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-24 08:42:30 -07:00
Yanchao Lu
ec35481b0a
[None][infra] Prepare for single GPU GB200 test pipeline (#7073)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:46:39 +08:00
dongxuy04
19a0ea363b
[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Dongxu Yang <dongxuy@nvidia.com>
Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-08-24 08:15:29 -04:00
Iman Tabrizian
96ff82e77a
[None][fix] Waive test (#7185)
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-24 10:45:11 +08:00
Suyog Gupta
e3de5758a3
[#7136][feat] trtllm-serve + autodeploy integration (#7141)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
QI JUN
1388e84793
[None][ci] move all B200 TensorRT test cases to post merge (#7165)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-22 06:47:23 -04:00
xinhe-nv
b8b2bd4a0a
[TRTLLM-7245][feat] add test_multi_nodes_eval tests (#7108)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 17:17:27 +08:00
Linda
898f37faa0
[None][feat] Enable nanobind as the default binding library (#6608)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-22 09:48:41 +02:00
Daniel Cámpora
099f081e03
[TRTLLM-7155][feat] Unify sampler handle logits implementation. (#6867)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-22 08:09:30 +02:00
xinhe-nv
4017f7cd6b
[None][chore] Add failed cases into waives.txt (#7109)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 10:39:25 +08:00
Wanli Jiang
07c711eb1f
[TRTLLM-6825][fix] Update lora for phi4-mm (#6817)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-21 22:00:04 -04:00
dominicshanshan
6f245ec78b
[None][chore] Mass integration of release/1.0 (#6864)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-22 09:25:15 +08:00
Emma Qiao
344bc4575d
[None][infra] Waive failed case for main branch (#7129)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-22 00:08:55 +08:00
Dimitrios Bariamis
f49dafe0da
[https://nvbugs/5394409][feat] Support Mistral Small 3.1 multimodal in Triton Backend (#6714)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-21 18:08:38 +02:00
bhsueh_NV
ba0a86e0bb
[https://nvbugs/5437405][fix] qwen3 235b eagle3 ci (#7000)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-21 01:17:32 -04:00
xinhe-nv
21f4434404
[None][chore] waive failed cases on H100 (#7084)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-21 11:15:23 +08:00
Yechan Kim
0893afae3d
[TRTLLM-6771][feat] Support MMMU for multimodal models (#6828)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-21 08:54:12 +08:00
bhsueh_NV
73d2daa386
[https://nvbugs/5457489][fix] unwaive some tests (#6991)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-21 08:49:57 +08:00
QI JUN
a918de710a
[None][ci] move some tests of b200 to post merge (#7093)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-20 19:43:40 -04:00
Emma Qiao
f84dd64250
[None][infra] Waive failed tests on main branch 8/20 (#7092)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-20 06:33:44 -04:00
Robin Kobus
b95cab2a7c
[None][ci] move unittests to sub-directories (#6635)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-20 05:42:22 -04:00
xinhe-nv
9e71b4fda4
[TRTLLM-7205][feat] add llama4 tp4 tests (#6989)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-20 13:22:05 +08:00
Leslie Fang
3f6a9267f1
[None][infra] update feature_combination_matrix of disaggregated and chunked prefill (#6661)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-20 13:14:34 +08:00
Chang Liu
ce53832610
[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-19 21:42:50 -07:00
Ivy Zhang
fc85e3db1c
[None][fix] fix llmapi import error (#7030)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-19 22:58:13 -04:00
Bo Deng
30da5d3cc4
[None][chore] unwaive test_disaggregated_genbs1 (#6944)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-20 09:57:35 +08:00
Emma Qiao
8f95f35503
[None][infra] Waive failed tests on main (#7037)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-19 09:31:07 -04:00
Yiqing Yan
07506bccbe
[None][chore] Remove duplicate test waives (#7044)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-19 21:04:31 +08:00
Fanrong Li
655d0f48d0
[https://nvbugs/5455140][fix] unwaive DSR1-fp4 throughput_tp8 (#7022)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-08-19 20:48:05 +08:00
xinhe-nv
2c86cee38c
[None][chore] Remove closed bugs (#6969)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-19 16:01:33 +08:00
Ivy Zhang
bff5fdf6df
[TRTLLM-6541][test] Add NIM Related Cases Part 1 (#6684)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-19 13:59:14 +08:00
William Zhang
daa2a65d37
[https://nvbugs/5454875][ci] Unwaive Mistral Small 3.1 test (#7011)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-19 00:32:14 -04:00
fredricz-20070104
e90280a84d
[TRTLLM-6541][test] Add NIM Related Cases [StarCoder2_7B] and [Codestral_22B_V01] (#6939)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-19 00:13:04 -04:00
Fanrong Li
816a120af6
[TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell (#6710)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-08-19 00:03:03 -04:00
Lizhi Zhou
71e28eab36
[TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models (#6741)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-19 09:58:22 +08:00
Leslie Fang
e76e5c640f
[None][infra] Enable accuracy test for mtp and chunked prefill (#6314)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-19 07:42:52 +08:00
Yiqing Yan
1ce23545fc
[None][chore] Remove duplicate test waives (#6998)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-18 21:15:49 +08:00
Emma Qiao
69ff32f9b1
[None][infra] Waive failed tests on main 0818 (#6992)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-18 20:34:52 +08:00
Shi Xiaowei
5ec15b98f0
[TRTLLM-7030][fix] uppercase def value in pd-config (#6981)
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-18 02:33:23 -04:00
Leslie Fang
ce0b13ea02
[None][infra] update feature_combination_matrix of disaggregated and Eagle3 (#6945)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-18 09:18:17 +08:00
Naveassaf
d6322f70b7
[https://nvbugs/5451028][fix] Constrain NemotronSuper test parameters to prevent OOMs (#6970)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-08-17 13:38:36 -04:00
Emma Qiao
cc6d763824
[None][infra]Waive failed cases in main branch (#6951)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-17 14:27:59 +03:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
brb-nv
9505727d31
[https://nvbugs/5401114][fix] Unwaive Gemma3 tests (#6952)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-15 16:35:02 -07:00
dongfengy
0ad0b967bb
[None][fix] Make TP working for Triton MOE (in additional to EP we are using) (#6722)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-15 16:58:42 -04:00
ajrasane
4162d2d746
[None][test] Add accuracy evaluation for AutoDeploy (#6764)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-15 13:46:09 -04:00
yifeizhang-c
4127d77678
[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-15 09:52:06 -07:00
liji-nv
18ccd053d3
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6858)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-15 11:14:20 -04:00
xinhe-nv
b23fdfc62f
[None][chore] Add failed cases into waives.txt (#6914)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-08-15 14:00:16 +08:00
Yanchao Lu
3a987891d8
[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures (#6836)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-15 11:16:07 +08:00
Bo Deng
e54ba75dac
[None][fix] Update tests to use standardized uppercase backend identifiers (#6921)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-15 11:14:15 +08:00
Frank
2cc59aacb3
[None][fix] Correct reporting of torch_dtype for ModelConfig class. (#6800)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-14 22:46:20 -04:00
Aurelien Chartier
b13a5a99b2
[None][chore] Add tests for non-existent and completed request cancellation (#6840)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-14 15:57:01 -07:00
Raayan Dhar
8b237b943b
[https://nvbugs/5441714][chore] remove skip on disagg n-gram test (#6872)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-14 15:45:00 -07:00
Bo Li
26f413ad90
[https://nvbugs/5450262][fix] Fix unsupported alltoall use case (#6882)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 17:46:54 -04:00
Emma Qiao
96339c69a9
[None][infra] Waive failed cases on main (#6902)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-14 23:59:44 +08:00
Pengbo Wang @ NVIDIA
ffc976ceaf
[https://nvbugs/5445466][fix] fix deepseek r1 hang by not enabling mnnvl by default (#6860)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-14 22:36:56 +08:00
Shi Xiaowei
1095dfd03c
[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#6323) 2025-08-14 03:48:57 -04:00
NVJiangShao
a700646132
[None][fix] Add FP4 all2all unitest and fix a bug for module WideEPMoE (#6784)
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-08-14 13:35:37 +08:00
Bo Deng
d8acca495b
[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 (#6735)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-14 04:36:38 +00:00
jmydurant
4200fa46d1
[None][feat] Add support for Hopper MLA chunked prefill (#6655)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-14 10:39:26 +08:00
Mike Iovine
7cba883932
[https://nvbugs/5410399][chore] Unwaive mtp llmapi test (#6833)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 17:38:45 -04:00
Emma Qiao
c7e6145409
[None][infra] Waive failed cases on main (#6863)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-13 09:50:14 -04:00
Anthony Chang
2198587b35
[https://nvbugs/5378031] [feat] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend (#6200)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-08-13 21:24:40 +08:00
Yechan Kim
12102e2d48
[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-12 19:34:02 -07:00
xinhe-nv
e35fca4272
[TRTQA-2920][chore] improve hang tests (#6781)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-12 18:26:51 +08:00
Enwei Zhu
7c686ba8de
[TRTLLM-2285][feat] Enable guided decoding with CUDA graph padding and draft model chunked prefill (#6774)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-12 09:30:06 +08:00
Ziyi Xiong
b4fcd5f592
[https://nvbugs/5441438][fix] Set correct draft length for the cuda graph dummy request (#6701)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-08-12 09:28:47 +08:00
Chang Liu
be9dd4713c
[https://nvbugs/5385987][fix] Fix Qwen2 quantization issue by pinning transformers version (#6673)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-11 17:16:49 -07:00
Aurelien Chartier
56bfc3a6d2
[None][chore] Find LLM_ROOT and LLM_BACKEND_ROOT dynamically (#6763)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-11 15:18:19 -07:00
Emma Qiao
5145e9d40e
[None][infra] Unwaive an updated case to test (#6791)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 06:47:33 -04:00
Emma Qiao
d6ad4a9d5b
[None][infra] Waive failed tests on main 0811 (#6778)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 03:16:25 -04:00
xinhe-nv
9c358c26e4
[None][chore] remove closed bugs (#6772)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-11 14:39:58 +08:00
Eran Geva
b3e8fa2960
[None][test] Test trtllm-bench AD vs, PT BEs on H100 single gpu (#6487)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-08-11 08:33:13 +03:00
Tracin
49bcaa4e95
Add gpt-oss GSM8K test. (#6732)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-08-10 22:45:43 -04:00
Chuang Zhu
c566a8d2a2
[None][fix] fix same pp disagg (#6730)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-10 22:45:15 -04:00
Bo Deng
767879ef85
[https://nvbugs/5431127][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6736)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-11 10:05:10 +08:00
Emma Qiao
ee19ca5e58
[None][infra] Waive test main 0808 (#6751)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-09 23:54:07 -04:00
Ye Zhang
bcf5ec0c9a
[None][feat] Core Metrics Implementation (#5785)
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-09 02:48:53 -04:00
Leslie Fang
294e0d3dab
[https://nvbugs/5436461][infra] Adjust free_gpu_memory_fraction of test_eagle3 to prevent OOM on CI (#6631)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-08 15:30:47 +08:00
Li Min
d913955952
[TRTLLM-6898][feat] make fused_moe_cute_dsl work on blackwell (#6616)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-08-08 15:03:48 +08:00
ruodil
b15d6fb145
[None][test] fix yml condition error under qa folder (#6734)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-08 15:59:01 +10:00
2ez4bz
064eb7a70f
[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611)
This commit propagates the mapping to intermediate layers to enable
tensor parallelism (amongst other things) in them.

It also fixes issues with a unit test for TP for pixtral, and adds it to a
test list.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-08 01:50:36 -04:00
Enwei Zhu
aee828d98a
[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-08 12:10:36 +08:00
ruodil
22f45a0e19
[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test (#6685)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-07 22:57:04 -04:00
xinhe-nv
88ced50ca7
[TRTQA-2920][fix] Add failed cases into waives.txt (#6719)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-08-08 12:54:13 +10:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Yuan Tong
db8dc97b7b
[None][fix] Migrate to new cuda binding package name (#6700)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-07 16:29:55 -04:00
Raayan Dhar
4055b764db
[None][fix] disagg ctx pp4 + gen pp4 integ test (#6489)
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
2025-08-07 11:18:02 -04:00
pcastonguay
453a06e6ab
[TRTLLM-6881][feat] Include attention dp rank info with KV cache events (#6563)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-07 14:17:07 +02:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
xinhe-nv
0a467b00cc
[https://nvbugs/5409414][fix] fix Not registered specs (#6660)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 17:55:53 +10:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
ruodil
6c1f7d8b91
[None][test] correct test-db context for perf yaml file (#6686)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-07 02:47:10 -04:00
YueWeng
157ea77549
[https://nvbugs/5375966][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one (#6658)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-08-07 10:25:17 +08:00
ruodil
780d7507f9
[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml (#6662)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:02:13 +10:00
ruodil
f30398470d
[None][chore] update readme for perf release test (#6664)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:00:45 +10:00
Yan Chunwei
5eae3184fa
[None][chore] add missing tests to test list (#6590)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-06 22:12:27 +08:00
Yechan Kim
1aed7511fe
[https://nvbugs/5430124][fix] Mistral mixture_text_image test case fix (#6648)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-06 06:58:58 -07:00
Iman Tabrizian
13ecb4aced
[https://nvbugs/5328160][fix] Unwaive disaggregated serving tests (#6644)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-06 09:08:29 -04:00
ruodil
907c180eb2
[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 02:25:57 -04:00
ruodil
0bd99b5d6d
[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 01:45:13 -04:00
yunruis
3ff4f503ad
[None][opt] ADP schedule balance optimization (#6061)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-06 09:38:02 +08:00
Yechan Kim
c17f4984e2
[None][feat] Refactor Llava-Next (#6478)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-05 17:53:53 -07:00
ixlmar
1ebceb790d
[TRTLLM-5508][feat] check input tokens + improve error handling (#5170)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-08-05 18:27:43 +01:00
liji-nv
dcbfa7e509
[https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-05 10:31:29 -04:00
Venky
61da2daeb4
[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-05 07:14:24 -07:00
Emma Qiao
78a75c2990
[None][Infra] - Split gb200 stages for each test (#6594)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-05 07:10:00 -04:00
xinhe-nv
c32584125e
[TRTQA-2920][fix] Add failed cases into waives.txt (#6600)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-05 20:12:55 +10:00
Pengbo Wang @ NVIDIA
c289880afb
[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-08-05 18:05:33 +08:00
Ivy Zhang
08ed9d7305
[None][doc] add introduction doc on qa test (#6535)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-05 17:02:17 +08:00
Ivy Zhang
d101a6cebc
[https://nvbugs/5410279][test] resubmit timeout refactor (#6337)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-05 16:39:25 +08:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379)
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
Leslie Fang
164acfa31e
[None][infra] Skip test_eagle3 test with device memory check (#6617)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-05 02:36:03 -04:00
ruodil
7625845365
test: add README_release_test.md for perf test (#6443)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-05 02:07:42 -04:00
xinhe-nv
a178cea324
[TRTLLM-6856][feat] add disaggregated serving tests to QA list (#6536)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-05 12:47:53 +10:00
xinhe-nv
fe3d607c4b
[TRTQA-2920][fix] Add failed cases into waives.txt (#6581)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-05 12:41:23 +10:00
Ivy Zhang
f3651adea8
[None][test] update invalid test name (#6596)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-04 08:01:05 -04:00
Emma Qiao
5d8a5a0cb8
[None][Infra]Waive failed case in post-merge on main (#6602)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-04 19:39:44 +08:00
brb-nv
87e4e9f468
[None][chore] Add unit test for Gemma3 lora (#6560)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-04 04:56:57 -04:00
Pengyun Lin
a15e33351d
[None][fix] Revert commit 48ddc3d & add test for disagg server with different max_num_tokens (#6259)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-04 15:09:51 +08:00
xinhe-nv
a54972e463
[None][fix] remove closed bugs (#6576)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 15:52:11 +10:00
Leslie Fang
a60190836c
[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:45:24 -04:00
ruodil
6459725bf9
test: move ministral_8b_fp8 to fp8_specific gpu list(exclude Ampere) (#6533)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 15:22:39 +10:00
Ivy Zhang
5eefdf2c75 tests: Add llama4 functional cases (#6392)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
ruodil
8d82ccca63
test: modify max_lora_rank of phi4_multimodal to 320 (#6474)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 12:20:22 +10:00
Yechan Kim
ee6ab5be96
chore: add EXAONE4 accuracy test (#6397)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-04 10:14:16 +08:00
Ivy Zhang
7547a7d0a2
[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list (#6436)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-03 22:11:26 -04:00
Yiqing Yan
3f7abf87bc
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-03 11:18:59 +08:00
Jhao-Ting Chen
4da5cfc511
[None][infra] add eagle3 one model accuracy tests (#6264)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-08-02 16:07:46 -07:00
Lizhi Zhou
6f34f3489b
[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-01 13:33:34 -04:00
xinhe-nv
263c6c0ad0
test: skip post blackwell (#6357)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 13:10:14 -04:00
Emma Qiao
16febefee0
[None][Infra] - Skip failed tests in post-merge (#6558)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-01 22:21:23 +08:00
brb-nv
7447d6ed85
[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-01 09:19:54 -04:00
liji-nv
1daa8c3232
[https://nvbugs/5340941][https://nvbugs/5375785] - fix: Wrap attentio… (#6355)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-01 07:38:06 -04:00
xinhe-nv
fca0d37798
[None][fix] update nemotron nas tests free_gpu_memory_fraction=0.8 (#6552)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 20:27:22 +10:00
chenfeiz0326
ba5bdbb138
[None][chore] Disable add special tokens for Llama3.3 70B (#6482)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-01 17:03:27 +08:00
Yukun He
90856bf97d
[https://nvbugs/5419069][fix] Fix the mismatched layer name components. (#6417)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-01 16:32:39 +08:00
Ivy Zhang
71524a1a48
[https://nvbugs/5419066][fix] Use trt flow LLM (#6467)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-01 03:33:07 -04:00
Venky
ad5742b105
[fix] Update get_trtllm_bench_build_command to handle batch size and tokens (#6313)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-01 00:08:09 -04:00
brb-nv
2eca0d5925
fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-31 17:18:23 -07:00
Simeng Liu
8cf3faa26a
[feat] Auto-enable ngram with concurrency <= 32. (#6232)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>
2025-07-31 18:45:51 -04:00
Ziyi Xiong
8062e0fe7c
[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically (#6363)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-31 15:31:39 -04:00
Faraz
8e84df74b5
Fix e2e test failure for RTX6000 Pro (#6420)
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Faraz <58580514+farazkh80@users.noreply.github.com>
2025-07-30 23:32:44 -04:00
xinhe-nv
ca534e4798
test: add accuracy reference (#6479)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-31 12:27:29 +10:00
bhsueh_NV
ae3a5fc918
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-31 09:37:23 +08:00
brb-nv
0e16d1f070
test: Add time logging for lora tests (#6466)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-30 14:02:43 -07:00
Anurag Mukkara
fac186e3b5
[nvbug/5409417] Unwaive llava test case (#6460)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-07-30 14:38:47 -04:00
brb-nv
f6287e4498
Unwaive Gemma2 LoRA test on H100 (#6461)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-30 12:56:12 -04:00
Bo Deng
24e7f4eece
[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests (#6439)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-07-31 00:41:37 +08:00
Wanli Jiang
9632dba02e
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-30 09:20:16 -07:00
pcastonguay
0f083b9daf
fix: Unwaive triton cpp test [nvbug 5401088] (#6412)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-30 11:25:18 -04:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
pcastonguay
e7ae5e2824
feat: Add support for disaggregation with pp with pytorch backend (#6369)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-07-30 09:42:13 -04:00
tomeras91
a2514d93fc
[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-30 07:22:32 -04:00
Yechan Kim
22b29df38c
[nvbugs/5414909] fix: Qwen2-VL keyword on L20 (#6427)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 17:29:55 +08:00
xinhe-nv
d9ab3fd35e
tests: add TestNemotronH cuda graph tests (#6390)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 18:45:58 +10:00
2ez4bz
d6eed1b624
[fix] Switch placement of image placeholder for mistral 3.1 (#6435)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-30 14:10:36 +08:00
xinhe-nv
c00d6763b2
test: [CI] Add failed cases into waives.txt (#6457)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-30 12:36:58 +10:00
Venky
ab40369053
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests (#6463)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 10:53:43 +10:00
Yechan Kim
d6eb8e2366
fix: support mixture of text & multimodal prompts (#6345)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 08:52:31 +08:00
xinhe-nv
f1086e7d4f
test: [CI] remove closed bugs (#6381)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-29 19:01:23 +10:00
xinhe-nv
4fbb344caf
test: [CI] Add failed cases into waives.txt (#6423)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-29 19:00:30 +10:00
Yukun He
0eee2e2850
[5385981] fix: Update the usage of VisionAttention init API. (#6413)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-07-29 16:41:48 +08:00
ruodil
e11255e9d0
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-29 15:52:45 +10:00
Michal Guzek
2573bb729d
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303)
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-28 14:02:14 -07:00
2ez4bz
cdca541148
[test] Unwaive mistral3.1 small E2E test (#6352)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-28 14:37:42 -04:00
2ez4bz
60e4d3a9d4
[test] Add accuracy regression test for Mistral3.1 (#6322)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-28 09:41:44 -07:00
ruodil
03632a679f
test: organize perf cases and add missing perflab cases in qa test list (#6283)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-28 20:33:32 +10:00
xinhe-nv
971be1fe86
test: waive failed cases (#6394)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-28 20:31:43 +10:00
Ivy Zhang
2945817cae
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-28 15:33:30 +08:00
Emma Qiao
b3ca159787
[Infa] - waive failed cases and fix a typo (#6384)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-28 02:06:57 -04:00
Chang Liu
dc757799e1
[nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266) 2025-07-27 23:29:21 -04:00
Yan Chunwei
908f49a4ad
[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-28 09:01:10 +08:00
Iman Tabrizian
c35c78ff58
[fix][nvbugs/5390810] Improve the check for disaggregated serving test (#6301)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-25 12:47:01 -07:00
nv-guomingz
b8d4cb8beb
feat: Support JSON Schema in OpenAI-Compatible API (#6321)
Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>
2025-07-25 12:55:56 -04:00