
980 Commits

Author SHA1 Message Date
Ivy Zhang
64b7f04fdc
[test] split nemotron test cases from examples_test_list (#5238)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-16 16:36:33 +08:00
xinhe-nv
802f22cd12
test: [CI] Add failed cases into waives.txt (#5221)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-16 16:11:53 +08:00
Yiqing Yan
8445416c39
Waive L0 tests (#5233)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-16 15:19:03 +08:00
ruodil
2848e012ae
test: add llama4 models for perf test (#5187)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-16 11:24:35 +08:00
ruodil
3d22f27063
test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attention_dp to false to non-deepseek models (#5155)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-16 11:23:20 +08:00
Enwei Zhu
babdd9ce06
test: Add json_mode_eval for guided decoding evaluation (#5179)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-16 10:03:55 +08:00
amitz-nv
109c426077
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130)
2025-06-15 18:54:04 +03:00
Tailing Yuan
0b60da2c45
feat: large-scale EP(part 7: DeepEP integration) (#4792)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-14 19:12:38 +08:00
Enwei Zhu
5f2785fb90
fix: Fix waive list (#5205)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-13 23:33:23 +08:00
QI JUN
952f33dcad
CI: move all test cases of TensorRT backend into post merge (#5186)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-13 20:48:48 +08:00
xinhe-nv
30d9d0fa71
test: [CI] Add failed cases into waives.txt (#5178)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-13 16:38:51 +08:00
Ivy Zhang
28cd536bd6
[test] Update timeout params in QA test list (#5124)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-13 13:40:03 +08:00
Iman Tabrizian
01bd4c00b4
Add two MTP disaggregated test (#4546)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-13 12:17:45 +08:00
xinhe-nv
d9be419f45
tests: update tests for b200 (#5180)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-13 11:25:33 +08:00
ruodil
fa582cbe9a
test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test (#5083)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-13 11:09:15 +08:00
nv-guomingz
cf35a079f9
fix:https://nvbugs/5298661 (#5022)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-12 20:41:44 +08:00
Shi Xiaowei
88cba5f354
test: waive the NIXL related tests (#5153)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-12 17:02:27 +08:00
Fanrong Li
4d070d3862
chore: fix typo in tests (#5092)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-12 15:11:26 +08:00
Michal Guzek
53983ad273
[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933)
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-06-12 15:06:28 +08:00
ruodil
d021cc5126
test: set enable_attention_dp to False for non-deepseek models and add more cases for llama_v3.1/3.3 70b fp8 models (#5149)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-12 14:59:16 +08:00
Venky
c3b2eb6dab
test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras (#5066)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-06-12 14:19:15 +08:00
xinhe-nv
11b94feff8
test: skip disaggregated tests on arm (#5070)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-11 17:00:10 +08:00
ruodil
56abae0835
test: add more llama_v3.3_70b cases in perf test (#4979)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-11 15:44:22 +08:00
Yiqing Yan
0a9f105931
Waive L0 tests (#5111)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-11 11:53:15 +08:00
Zheng Duan
580a92521e
test: conditional disagg and cache aware balancing for deepseek v3 (#4522)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-06-11 09:44:29 +08:00
liji-nv
f6a49a9343
[CI] waive failing L0 test (#5089)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-10 20:40:44 +08:00
Yiqing Yan
8ec8e4559d
Waive L0 test (#5077)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-10 16:23:49 +08:00
Yiqing Yan
fdfc711261
Waive L0 test (#5067)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-10 15:40:57 +08:00
Stanley Sun
74b0e71ef4
test: add more disaggregated serving tests into QA testlist (#5036)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-06-10 09:24:53 +08:00
pcastonguay
5b84fd9201
[nvbug 5283506] fix: Fix spec decode triton test (#4845)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-09 08:40:17 -04:00
Yukun He
137fe35539
fix: Fix warmup phase batch size out of range. (#4986)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-09 19:19:16 +08:00
Yuxian Qiu
88480197da
ci: [nvbugs/5280806] Unwaive unittests/_torch. (#4951)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-06-09 19:04:11 +08:00
liji-nv
1d4f748773
[fix] Fix illegal mem access and possible accuracy loss. Cherry-pick … (#5017)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-09 17:50:57 +08:00
Yiqing Yan
6b17dff2f1
Waive L0 test (#5024)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-09 16:03:15 +08:00
Yan Chunwei
f4bfb8e49d
ci: unwaive llmapi launch test (#4991)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-09 13:25:43 +08:00
Omer Ullman Argov
8731f5f14f
chore: Mass integration of release/0.20 (#4898)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-08 23:26:26 +08:00
Mike Iovine
ec0d984656
[nvbug/5280806][fix] Fix 2 model spec decode flow (#4807)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-08 07:40:02 -04:00
Yanchao Lu
9e05613679
[Infra] - Update JNLP container config (#5008)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-08 16:44:09 +08:00
QI JUN
5ee0de7f2a
Resubmit #4894 (#4969)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-08 04:42:15 +08:00
Ivy Zhang
7dce328ad6
[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow (#4940)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Co-authored-by: Ruodi Lu <ruodil@nvidia.com>
2025-06-07 11:18:32 +08:00
Fanrong Li
75d020cf07
fix: fix cuda graph padding for spec decoding (#4853)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-06 22:21:42 +08:00
Anthony Chang
eeb555e37b
chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM (#4826)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-06-06 16:13:54 +08:00
xinhe-nv
564472168e
test: [CI] Add failed cases into waives.txt (#4966)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-06 10:30:15 +08:00
QI JUN
ec50684d80
Revert "fix a bug of global cuda graph dummy request" (#4970)
2025-06-06 08:54:45 +08:00
QI JUN
154f7cc40a
fix a bug of global cuda graph dummy request (#4894)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-05 19:47:40 +08:00
Yiqing Yan
7e921c78b5
Waive L0 tests (#4953)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-05 19:36:48 +08:00
Shunkangz
3eae58ca36
Add disaggregated unittest (#4899)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-06-05 19:14:31 +08:00
QI JUN
d5a8079eb6
Revert "[infra] Unwaive unittests/_torch" (#4950)
2025-06-05 17:21:07 +08:00
xinhe-nv
1c3091c63b
tests: [TRTQA-2906] add benchmark serving tests (#4901)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-05 14:33:03 +08:00
Yiqing Yan
9ceef983c0
Waive L0 tests (#4927)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-05 11:09:01 +08:00
xinhe-nv
50a74a1daa
tests: fix 5273697 (#4685)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-05 10:39:21 +08:00
Mike Iovine
8433091630
[infra] Unwaive unittests/_torch (#4919)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-05 08:49:37 +08:00
Lucas Liebenwein
f9d45e03a4
[AutoDeploy] deprecate CI post-merge tests and keep them for local testing (#4892)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-05 08:27:17 +08:00
Yi Zhang
1fca654bfd
tests: Update gb200 test case (#4754)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-04 18:49:20 +08:00
Shi Xiaowei
b13f8c9cba
Fix: NVBug 5302895 (#4835)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-06-04 09:31:39 +08:00
Simeng Liu
2384655c3a
chore: Waive examples/test_mistral.py::test_llm_mistral_v1_1gpu. (#4873)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-06-03 14:45:14 -04:00
Iman Tabrizian
141467d4b6
Add pre-merge Triton backend tests (#4842)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-03 00:47:58 -04:00
ruodil
fa93eeee84
shorten reqs in con:1 cases and add streaming cases, and add l2 perf … (#4849)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-03 12:28:13 +08:00
Ivy Zhang
8686868531
tests: [TRTQA-2905] improve timeout report for qa test cases (#4753)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-03 12:27:27 +08:00
Robin Kobus
e34a1beb72
[nvbugs/5303555] ci: unwaive test_fp8_block_scales_cuda_graph_padding (#4735)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-03 10:40:43 +08:00
Fanrong Li
380a5d1690
[https://nvbugs/5271281][fix] fix a pd+mtp accuracy issue (#4536)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-03 10:03:34 +08:00
Fanrong Li
13f68338d2
fix: [https://nvbugspro.nvidia.com/bug/5273945] Unwaive tests for bug-5273945 (#4832)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-02 22:01:57 +08:00
Yanchao Lu
8166649d03
[Infra] - Minor clean-up and test Ubuntu mirrors (#4829)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-02 20:18:20 +08:00
Fanrong Li
7d356efc7d
fix: fix accuracy and illegal memory access issues when using mtp + attention dp (#4379)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-02 00:35:52 +08:00
amirkl94
8039ef45d3
CI: Performance regression tests update (#3531)
2025-06-01 09:47:55 +03:00
Emma Qiao
202813f054
Check test names in waive list (#4292)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-01 14:39:30 +08:00
Dom Brown
338d6e9f95
[nvbug 5305210] fix: Resolve nvbug 5305210 (#4759)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-31 19:21:06 +08:00
Emma Qiao
c945e92fdb
[Infra]Remove some old keyword (#4552)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-31 13:50:45 +08:00
Jhao-Ting Chen
fcadce9f8d
[fix] Eagle-2 LLMAPI pybind argument fix. (#3967)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-29 12:23:25 -07:00
yuanjingx87
2c48ff5898
[feat] add b200 support via slurm (#4709)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-29 14:49:46 +08:00
Yan Chunwei
33a9ba55f5
fix: test trtllm-bench mgmn (#4613)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-29 14:43:47 +08:00
ruodil
500aca4f44
test: remove perf test l40s/l20 oom test cases and unwaive tests (#4755)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-05-29 13:58:47 +08:00
QI JUN
058f83e47b
CI: move post-merge multi GPU test of PyTorch backend to H200 (#4733)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-29 11:15:56 +08:00
xinhe-nv
93283484c2
test: [CI] Add failed cases into waives.txt (#4688)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-28 22:04:35 +08:00
amirkl94
fbec0c3552
Release 0.20 to main (#4577)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-28 16:25:33 +08:00
xinhe-nv
bb3d998eb1
test: [CI] remove closed bugs (#4638)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 18:07:59 +08:00
Yiqing Yan
92a7984945
Waive L0 tests (#4686)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 15:07:02 +08:00
xinhe-nv
59f7622281
test: rcca https://nvbugs/5223130 (#4510)
* add rcca tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* skip tests on blackwell

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 09:59:47 +08:00
yuanjingx87
732d92ff62
[Infra] - Multi-GPU testing support with Slurm (#4454)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-26 19:44:19 +08:00
Enwei Zhu
88190faa34
feat: large-scale EP(part 4: Static EP load balancer integration) (#4615)
* MoeLoadBalancerConfig

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* MoeLoadBalancer integration

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* config file

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* test

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* test

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-26 18:25:11 +08:00
Yiqing Yan
2fee408536
Waive L0 tests (#4645)
* Waive L0 tests

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* Apply suggestions from code review

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

---------

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-26 11:05:01 +08:00
Yanchao Lu
20c15fc04f
Fix invalid testcase name (#4626)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-24 00:40:00 +08:00
Anthony Chang
bbea2647b1
Qwen3 supports TRTLLM FP4 MoE backend (#4530)
* MoE TRTLLM backend for Qwen3

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* add extra moe_backend to test

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* address comments

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* conditionally compile kernels on newer archs

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* missing positional arg

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* Update the routing kernels

Signed-off-by: Christina Zhang <christinaz@nvidia.com>

* Revise usage of TLLM_LOG_ERROR

Signed-off-by: Christina Zhang <christinaz@nvidia.com>

* Add unit test for Qwen3 moe (trtllm_gen backend)

Signed-off-by: Christina Zhang <christinaz@nvidia.com>

* improve weight processing speed of moe_backend=TRTLLM; roughly 2x

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* tidy and minor fix

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* temporarily disable accuracy test that has known issue

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

---------

Signed-off-by: Anthony Chang <anchengc@nvidia.com>
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
Co-authored-by: Christina Zhang <christinaz@nvidia.com>
2025-05-23 18:31:08 +08:00
Enwei Zhu
d7443b6068
[https://nvbugspro.nvidia.com/bug/5181262] [test] Unwaive Mistral Nemo test (#4515)
unwaive

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-23 10:14:00 +08:00
pcastonguay
d7d455e7ea
[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243)
* feat: Enabling dis serving with TRT backend with Python runtime

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing formatting

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing disagg mtp test

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-22 22:01:06 -04:00
Mike Iovine
14fc48ada7
[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler (#4402)
[fix] Fix chunked prefill + overlap scheduler

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-23 04:38:22 +08:00
Venky
c713eb5799
test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) (#4446)
ultra

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-22 13:07:33 -07:00
xinhe-nv
22c01d5b21
test: [CI] Add failed cases into waives.txt (#4549)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* fix test issues

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-22 17:18:53 +08:00
ruodil
1a45890dae
test: waive hanging cases for perf test (#4562)
waive hanging cases

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-22 15:50:05 +08:00
HuiGao-NV
bc9f1dbede
fix[nvbug-5228840]: Remove test cases of feature not supported anymore (#3972)
* Remove waived cases
* Remove test cases of not supported feature

Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-22 11:18:58 +08:00
Michal Guzek
9033dd987d
[TRTLLM-4932] Add CLI accuracy tests for Phi-4-mini-instruct (#4415)
Add phi-4-mini CLI acc test

Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-22 09:56:48 +08:00
Chuang Zhu
44cfd757b2
Agent interface impl for NIXL (#4125)
* agentConnection

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

recv

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

agentState

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

NIXL interfaces

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

update cmakelists

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

nixl improve

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

remove cppzmq

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

fix

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

transferAgent remove register

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

work for cache Test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

reduce sleep time

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

fix test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

integrate

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

nixl env

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

fix rebase error

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

cpp test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

stash for send metaData

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

loadRemoteMD after fetchRemoteMD

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

workaround for mixed gen and context

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

test_env

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

avoid port conflict in test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* format

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* use std::string

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* typo

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* fix transferAgentTest

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

---------

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-22 09:09:41 +08:00
Dom Brown
1cffa99792
test: Split test_simple into mpi_utils and cache transceiver tests for DGX (#4451)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-22 04:26:21 +08:00
Venky
0a8461d54c
test(perf): Pt.2 Add Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp) (#4499)
add low concurrency perf tests

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-21 10:46:48 -07:00
xinhe-nv
407ef08662
tests: add qwene fp4 tests into QA test list & update sanity test list (#4478)
* update sanity test list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update test list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-21 16:52:02 +08:00
ruodil
83f1933f0c
test: add failed case in waive list and fix some test script issue for perf test (#4527)
add failed case in waive list and fix some test script issue

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-21 16:37:25 +08:00
QI JUN
15317ece5a
CI: waive test_fp8_block_scales_4gpus of deepseek v3 lite (#4520)
waive test_fp8_block_scales_4gpus of deepseek v3 lite

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-21 13:19:43 +08:00
xinhe-nv
750f412b8f
tests: add llama 3.3 70b 2 nodes tests (#4391)
* add llama 3.3 70b 2 nodes tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* remove enable_overlap_scheduler parameter

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-21 12:42:45 +08:00
Chuang Zhu
ab5bea957d
unwaive some disagg test (#4476)
* unwaive some disagg test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* pytest.mark.skip_less_device(4)

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

---------

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-21 11:45:11 +08:00
Yan Chunwei
9199793848
fix: llmapi-launch add add trtllm-bench test with engine building (#4091)
* add trtllm-bench mgmn test

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-21 10:18:01 +08:00
Zheng Duan
77a0189554
feat: conditional disaggregation in disagg server (#3974)
2025-05-21 09:57:46 +08:00
Venky
9a8c3ece22
test(perf): Add remaining Phi-4-mini-instruct perf tests (#4443)
add remaining 2 phi cpp perf tests

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-21 09:26:12 +08:00
xinhe-nv
19c6e68bec
test: [CI] remove closed bugs (#4417)
* waives closed bugs

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waives

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-21 09:13:25 +08:00
bhsueh_NV
ec4190fb71
infra: Add qwen3 235B tests into QA (#4483)
* add qwen3 qa test

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* add qwen3 test into qa list

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-20 17:37:09 +08:00
ruodil
b5edf13b33
test: update test filter in perf test yml file to select cases by gpu name and add cases for RTX 6000 pro (#4282)
* add cases for rtx_pro_6000 and update test filter

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* amend a typo in model llama_v3.1_405b_instruct fp4 and add more cases for rtx pro 6000 and waive_list

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-20 10:58:05 +08:00
Michal Guzek
0a342a42f7
[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant (#4362)
* Add CLI TestLlama3_3_70BInstruct acc tests

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add tests to qa lists

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add comment

Signed-off-by: moraxu <mguzek@nvidia.com>

* Fix test names

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update yaml files

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update cli file

Signed-off-by: moraxu <mguzek@nvidia.com>

---------

Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-20 09:48:14 +08:00
xinhe-nv
402385588d
test: [CI] Add failed cases into waives.txt (#4429)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waive id

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-20 09:43:55 +08:00
Yuxian Qiu
c8e062bfd3
fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. (#4399)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-19 14:25:36 -07:00
Venky
bb02d86b54
test(perf): Add some Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (TRT flow, trtllm-bench) (#4128)
* changes to run llama-v3.3-nemotron-super-49b

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* yapf

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* address review comments pt 1

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* re-add cpp super tests 

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

---------

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-19 12:00:48 -07:00
Faraz
7656af1b57
[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) (#4335)
* add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* update cutlass versions

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* added internal cutlass with fix and docker update

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* added mixtral to pro 6000

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

---------

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 08:56:21 -07:00
liji-nv
58e405624a
[https://nvbugs/5123103][fix] Fix torch compile for DeepSeekV3 (#3952)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-19 22:12:25 +08:00
Iman Tabrizian
c6074c47da
Add llama4 disagg accuracy tests (#4336)
* Add llama4 disagg accuracy tests

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* Make it async and add GSM8K benchmark

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

---------

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 21:55:08 +08:00
Dom Brown
c45f414bbf
Test: Improve model re-use in C++ DGX tests for CI stability (#4263)
* Fix padded vocab size for Llama

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Refactor multi GPU llama executor tests, and reuse the built model engines

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Fix test list typo

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* WIP

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Further WIP

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* WIP

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Update test lists and readme

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Try parametrize for asymmetric

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Parametrize + skip unsupported combinations

Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>

* Update test list

Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>

* Reduce environment duplicated code

Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>

---------

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
2025-05-19 14:20:21 +01:00
Yan Chunwei
5b1c88de8d
chore: cleanup perf_evaluator code (#3833)
* chore: cleanup perf_evaluator code

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* up

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-19 13:21:36 +08:00
Ivy Zhang
58d2508b89
tests: Add test cases for rcca cases (#4347)
* add qwen2_0_5_instruct cp4 test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add qwen2.5 fp8 kvcache test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add ds distill qwen cpp runner test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* trial

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 12:06:43 +08:00
Ivy Zhang
c4a0d768b5
tests: add qa test mentioned in docs (#4357)
* add nemotron-h and llama_70b cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* trial

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add llm decoder quick_start case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update nemotron-h test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add qwen3 quickstart test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add trtllm_decoder accuracy test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove quickstart test for llm_decoder

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix import error

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* nemotronh fp8 trial

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix name

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove nemotronh-fp8

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 10:06:51 +08:00
Faraz
791c209006
[TRTLLM-4618][feat] Add Nemotron Super 49B FP8 test on RTX6000 Pro (SM120) (#4363)
* added nemotron 49b fp8 for B40 release

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* add tests to QA list

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* pre-commit changes

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

---------

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 09:30:24 +08:00
Iman Tabrizian
7de90a66bc
Remove vila test (#4376)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 09:02:39 +08:00
Yanchao Lu
0d7269e2a7
[Infra][Docs] - Some clean-up for the CI pipeline and docs (#4419)
* [Docs] - Some clean-up for the docs

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

* [Infra] - Some clean-up for the CI pipeline

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

---------

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-19 00:07:45 +08:00
shaharmor98
27afcb9928
add changes for fp8, nemotron-nas, API (#4180)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-18 23:27:25 +08:00
Venky
fb663b637a
Extend the Llama-Nemotron-Nano-8B perf-integration-tests (cpp) (#4195)
* add ll-nm-nano tests that map to nim requirements

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

* prune some pytorch cases (fp8)

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

* removing pyt backend test changes

- When validating the pytorch tests with the isl/osl/conc/quant settings (that is done for cpp backend too), seeing hangs that need further debugging.
- Therefore don't want to block this PR, hence removing them.
- Seeing

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

---------

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-17 22:46:21 +08:00
Yuxian Qiu
cc1bba1686
test: Waive tests for nvbugs/5286795. (#4409)
* Waive tests for nvbugs/5286795.

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

---------

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-17 19:41:05 +08:00
Jinyang Yuan
b618e1f55b
perf: Eliminate the need for attention DP padding when possible (#3439)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
2025-05-17 13:30:55 +08:00
liji-nv
fb437ed709
[CI] waive accuracy/test_cli_flow.py::TestTinyLlama1_1BChat::test_pp4 (#4397)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-16 20:18:07 +08:00
Daniel Cámpora
df19430629
chore: Mass Integration 0.19 (#4255)
* fix: Fix/fused moe 0.19 (#3799)

* fix bug of stream init

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix bug

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix: Add pre-download of checkpoint before benchmark. (#3772)

* Add pre-download of checkpoint before benchmark.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Add missing remote code flag.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Move from_pretrained to throughput benchmark.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Move download and use snapshot_download.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Removed trusted flag.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Fix benchmark command in iteration log test.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

---------

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* [https://nvbugspro.nvidia.com/bug/5241495][fix] CUDA Graph padding with overlap scheduler (#3839)

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fuse

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* TRTLLM-4875 feat: Add version switcher to doc (#3871)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* waive a test (#3897)

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* docs:fix https://nvbugs/5244616 by removing new invalid links. (#3939)

Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>

* fix: remote mpi session abort (#3884)

* fix remote mpi session

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* skip fp8 gemm for pre-hopper (#3931)

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* [https://nvbugspro.nvidia.com/bug/5247148][fix] Attention DP with overlap scheduler (#3975)

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update multigpu list

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix namings

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* Doc: Fix H200 DeepSeek R1 perf doc (#4006)

* fix doc

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>

* update perf number

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>

---------

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>

* Fix the perf regression caused by insufficient cache warmup. (#4042)

Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

* doc: Update 0.19.0 release notes (#3976)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* Optimize the AutoTuner cache access code to reduce host code overhead. (#4060)

The NVFP4 Linear op is very sensitive to the host overhead.
This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

* Update switcher (#4098)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* doc: update release notes (#4108)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* docs:update 0.19 doc. (#4120)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

* docs:add torch flow supported model list. (#4129)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

* doc: Release V0.19 Perf Overview Update (#4166)

Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>

* Fix readme of autodeploy.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Update tensorrt_llm/_torch/pyexecutor/llm_request.py

Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Revert mgmn worker node.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Change to disable_overlap_scheduler.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>
2025-05-16 10:53:25 +02:00
xinhe-nv
500b43e90c
test: [CI] remove closed bugs (#4345)
update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-16 13:47:42 +08:00
Stanley Sun
11aa50d1ea
test: add kv cache aware test cases to qa test list (#4257)
add kv cache_aware test cases

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-16 12:47:01 +08:00
Iman Tabrizian
4c7191af67
Move Triton backend to TRT-LLM main (#3549)
* Move TRT-LLM backend repo to TRT-LLM repo

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* Address review comments

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* debug ci

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* Update triton backend

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* Fixes after update

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

---------

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-16 07:15:23 +08:00
yuxianq
4f8afe4cc6
feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism (#4034)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-16 04:16:53 +08:00
Venky
adb0839a33
test(perf): Add Phi-4-mini-instruct to perf tests (#4267)
* add phi-4-mini-instruct

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* trim tests

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

---------

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-15 21:27:03 +08:00
Yanchao Lu
5ce1102a02
Revert "[test] add qa test mentioned in docs" (#4355)
Revert "[test] add qa test mentioned in docs (#4248)"

This reverts commit b0ce1371ee.
2025-05-15 18:47:30 +08:00
Stanley Sun
9d3e05486b
test: add qa test list for rtx5090 and rtx_pro_6000 (#4254)
* add test list for rtx5090 and rtx_pro_6000

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

* add 2gpu llama70b test cases

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

* remove duplicate and invalid test cases

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

* add 2gpus test cases

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

---------

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-15 17:57:31 +08:00
xinhe-nv
14bfb5e0d6
test: FIX test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus (#4283)
* update test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* skip llava-v1.6-mistral-7b-hf-vision-trtllm on L40S

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-15 15:57:44 +08:00
zhhuang-nv
97bc680cd8
feat: support kv cache reuse for MLA (#3571)
* support kv cache reuse for MLA

load compressed_kv and k_pe and do up-projection
use 192/128 head size MLA context kernel
support Blackwell and Hopper now

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* add CI test

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2

Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>

* resolve comments

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* use GPTJ style RoPE for MLA

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix rebase error and some docs

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix kv_lens

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* tiny fix

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix torch compile

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix: use normal device memory instead of pinned memory for unit test

Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>

* fix L0 tests

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix torch compile after rebase

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* resolve comments

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* resolve comments again

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

---------

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com>
Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-05-15 15:22:21 +08:00
dominicshanshan
404fbe9b32
[https://nvbugs/5277113][fix]genai-perf API change stress test (#4300)
* fix bug 5277113.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* fix bug 5277113 and 5278517.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-15 14:12:34 +08:00
Ivy Zhang
b0ce1371ee
[test] add qa test mentioned in docs (#4248)
* add nemotron-h and llama_70b cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* trial

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add llm decoder quick_start case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update nemotron-h test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add qwen3 quickstart test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add trtllm_decoder accuracy test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove quickstart test for llm_decoder

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-15 13:37:11 +08:00
hlu1
3ea42e7519
[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus (#4346)
Reorganize TestDeepSeekR1::test_nvfp4_8gpus

Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-15 13:09:13 +08:00
Mike Iovine
f9adac3dea
[feat] Enable chunked context for flashinfer (#4132)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-15 10:59:38 +08:00
Robin Kobus
d31fefde2c
[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow (#4092)
* chore: Remove GptSession/V1 from TRT workflow

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove stateful decoders

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession buffers

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession utils

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession kernels

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove V1 GPT models from tests

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove gptSessionBenchmark from scripts and docs

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove gptSession IO classes

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession from test lists

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession from docs

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove useless encoder test

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove mActualBatchSize from DecoderState

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove static batching from ExecutorTest

- Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter.
- Adjusted related test functions to reflect the changes in parameter lists.
- Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-14 23:10:04 +02:00
Faraz
42de79d49e
test: Added tests for Llama3.1-70B-BF16 on SM120 (#4198)
* Added tests for Llama3.1-70B-BF16 on SM120

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* solve conflicts add more tests

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

---------

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-14 11:57:49 -04:00
Yanchao Lu
504f4bf779
[Infra] - Update the upstream PyTorch dependency to 2.7.0 (#4235)
[Infra][TRTLLM-4941] - Update the upstream PyTorch dependency to 2.7.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-14 22:28:13 +08:00
Kaiyu Xie
6c45586c51
chore: Remove deprecated Python runtime benchmark (#4171)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-14 18:41:05 +08:00
xinhe-nv
f2bfe2f84f
test: [CI] remove closed bugs (#4207)
update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-14 17:59:05 +08:00
DylanChen-NV
206f82115d
[bug/5247505] fix: CP accuracy on Blackwell (#4188)
* fix xqa params for cp

Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>

* add test

Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>

* add test

Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>

* try adding B200 multi gpu test

Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>

* add accuracy tests for cp

Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>

---------

Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-05-14 17:40:50 +08:00
Yiqing Yan
a66a02a75a
[Infra] Waive L0 test (#4295)
Waive L0 test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-14 16:38:33 +08:00
Zongfei Jing
bb17649517
test: Add UT for moe trtllmgen (#4258)
* Add ut for moe trtllmgen

Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>

* Update tests/unittest/_torch/modeling/test_modeling_deepseek.py

Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>

---------

Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-05-14 15:22:58 +08:00
bhsueh_NV
1a9298bc66
CI: add fp8/fp4 ci on Qwen3-30B-A3B (#4266)
add fp8/fp4 ci on Qwen3-30B-A3B

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-14 14:38:04 +08:00
brb-nv
8280c3d4f2
feat: Support Gemma3-1b-it in Pytorch workflow (#3999)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-14 14:02:44 +08:00
brb-nv
1ef117688c
test: Validate FP8 and LoRA for Gemma3 (#3670)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-13 17:28:02 -07:00
Iman Tabrizian
f408de2d99
Waive disagg kv cache load balancer test (#4276)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-14 06:03:24 +08:00
brb-nv
cd5b3d21a0
feat: Support Mistral Small 3.1 24B VLM in TRT workflow (#4183)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-14 03:47:22 +08:00
Yiqing Yan
290649b6aa
[Infra] Waive L0 test (#4269)
Waive L0 test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-13 23:06:13 +08:00
Yiqing Yan
bfa16a63d4
[Infra] Waive L0 test (#4268)
Waive L0 test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-13 22:43:17 +08:00
dominicshanshan
44d6adfb68
Waive stress test. (#4262)
* Waive stress test.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-13 21:01:57 +08:00
Enwei Zhu
8f68d56cc1
[https://nvbugs/5220763] [test] Unwaive Mixtral FP8 TP2 test (#4252)
unwaive

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-13 15:55:33 +08:00
Yiqing Yan
fda8b0277a
[Infra][TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04 (#4049)
* [TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix review

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* update images

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* Update jenkins/L0_Test.groovy

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* update image name

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

---------

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-13 14:59:12 +08:00
ruodil
d555fe2530
test: fix for perf test script issue (#4230)
fix for perf test script issue

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-13 10:29:20 +08:00
xinhe-nv
0cebc16139
test: [CI] Add failed cases into waives.txt (#4205)
waive tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-13 10:22:42 +08:00
xinhe-nv
7ebae4dcaa
test: [CI] Add failed cases into waives.txt (#4203)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waives

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-13 10:08:02 +08:00
Enwei Zhu
035d915fea
[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior (#4090)
* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* normalize mtp_nextn

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update test_durations

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-13 07:41:51 +08:00
wili
eba3623a54
Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979)
* feat/vbws-part4-v1.8: rebase

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* feat/vbws-part4-v1.9: fix incorrect output when using short output length

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.1: remove useless variables

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.2:fix incorrect output when using short output length

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.3: rebase

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.4: rebase

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.5: remove API change

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

---------

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-12 22:32:29 +02:00
Enwei Zhu
c31ca1688c
[https://nvbugs/5214229] [fix] Unwaive lm_head quantization case (#4222)
unwaive

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 20:23:06 +08:00
Zheng Duan
c9e2a963e0
feat: add kv cache aware router (#3831)
* kv cache aware router

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* add tests

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* router config

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* eviction test

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

add test

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* eviction detect in worker test

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* move worker tests to single gpu

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* reduce memory fraction

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* fix partial block

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

---------

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-05-12 07:23:57 -04:00
Yixin Dong
c90ebadd84
feat: Support the Structural Tag in guided decoding (#4066)
* finish

Signed-off-by: Ubospica <ubospica@gmail.com>

* update

Signed-off-by: Ubospica <ubospica@gmail.com>

* update

Signed-off-by: Ubospica <ubospica@gmail.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* exc overlap scheduler

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add test

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix api ref

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 17:24:50 +08:00
Yechan Kim
3e9bda3a09
[feat] Support HyperCLOVAX-SEED-Text language part (#3902)
* feat: support HyperCLOVAX-SEED-Text language part

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add Pytorch flow and remove test file

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* revert summarize

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix summarize

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove from pytorch example

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-05-12 16:05:14 +08:00
ruodil
9c03a7ab74
test: add llama_3.2_1B model and fix for test lora script issue (#4139)
* test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* add llama_3.2_1B model and fix for lora script issue

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-12 14:51:59 +08:00
xinhe-nv
849d9c343c
tests: https://nvbugs/5219534 remove failed tests from test list (#4113)
remove unsupported tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-12 14:13:40 +08:00
Yiqing Yan
3c54e84e47
[Infra] Waive L0 test (#4212)
Waive L0 test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-12 11:37:49 +08:00
QI JUN
f021afa241
[CI] waive two multi-gpu test cases (#4206)
waive two multi-gpu test cases

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-12 08:04:48 +08:00
Dom Brown
2d0f93a054
Refactor: Restructure C++ tests for better modularisation of non-shared code (#4027)
* Refactor: Restructure C++ tests for better modularisation of non-shared code

Start cleanup of pytest code for C++ tests

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

Clean up names and remove references to test_cpp.py

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

WIP

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

Move multi-GPU code

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

Update doc and try un-waiving

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Update multi GPU file check

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Address minor multi-GPU setup bug

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

---------

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-09 19:16:51 +01:00
Mike Iovine
4b8ba7ad61
[fix][nvbug/5244009] Fix llama 4 test lists/scout accuracy issue (#4069)
[fix] Fix llama 4 test lists

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-09 22:45:14 +08:00
ruodil
bf5b2a2e0a
test: amend regex match for perf throughput (#4186)
amend regex match for perf throughput

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 17:33:25 +08:00
xinhe-nv
9082411a50
test: [CI] Add failed cases into waives.txt (#4165)
waive OOM tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-09 16:56:30 +08:00
ruodil
5ce5b81281
test: amend default pytorch extra-llm-api-config.yml in perf test (#4176)
* amend default pytorch extra-llm-api-config.yml

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* add print info to separate cases in output log

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 16:46:48 +08:00
Bo Li
e3cf3fd15f
test: Add fp8kv to DS-v3-lite integration tests. (#3950)
* Add fp8 kv cache tests to DSV3-Lite integration tests.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update gsm8k.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update CI list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update TestDeepSeekR1.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Fix test list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Need quant_config besides pytorch_config.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update waive list (bug 5239087).

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update waive list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Correct test name.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update waive list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

---------

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Bo Li <bobboli0202@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-09 13:35:04 +08:00
Ivy Zhang
c91d03fa0a
test: move mistral / mixtral test cases in QA test list into the new accuracy test suite (#3440)
* add mistral-7b-v0.1 torch flow test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* rearrange mistral

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* rearrange mixtral case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove api function test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* move mistral nemo cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* move mixtral cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix failure

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix name

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix failure cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update list

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove awq llmapi test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* adjust threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix ci

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix partial comments

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix path

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update thres

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove duplicate test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix ci

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-09 13:32:02 +08:00
Stanley Sun
fb31f91e15
test: add qwen3 and disaggregated serving accuracy tests to qa test list (#4083)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-09 11:03:02 +08:00
Ivy Zhang
7666bec7c4
[TRTQA-2861][test]: add nemotron and llama4 cases into qa test (#4053)
* add MMLU, GPQADiamond check for llama-4 models

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add nemotron cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add online quant test cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove trt flow cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* adjust parallelism strategy

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix fail

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update sanity list

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix comment

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* skip nemotron-h test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-08 18:10:41 +08:00
xinhe-nv
4468158be4
test: [CI] remove closed bugs (#4046)
update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-08 18:04:43 +08:00
Yiqing Yan
ce8832e80f
[Infra] Waive L0 flaky test (#4148)
Waive L0 test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-08 17:23:45 +08:00
yuanjingx87
6e1d2a1320
feat: Add Slurm support and enable RTX Pro 6000 testing pipeline in CI (#4019)
* Add slurm support with RTXPro6000 PostMerge Tests

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

* remove H100 post merge test from testing

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

---------

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-08 15:15:36 +08:00
Enwei Zhu
dae6781494
test: Waive disagg accuracy test (#4124)
* waive

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* waive

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-08 13:39:07 +08:00
ruodil
4d0e462723
tests: skip writing prepare_dataset output to logs, and add llama_v3.1_8b_fp8, llama_v3.3_70b_fp8, llama_v3.1_405b_fp4 models (#3864)
* tests: skip writing prepare_dataset output to logs

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-07 13:56:35 +08:00
Enwei Zhu
c28b90984f
[TRTLLM-3925, https://nvbugs/5245262] [fix] Normalize LLM.generate API (#3985)
* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-07 11:06:23 +08:00
Venky
62fea1e885
test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822)
*   **Model:** Llama-3.1-Nemotron-Nano-8B-v1
*   **Precision:** float16
*   **Environment:**
    *   GPUs: 1 H100 PCIe
    *   Driver: 570.86.15

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128`
*   **Request Throughput:** 81.86 req/sec
*   **Total Token Throughput:** 20956.44 tokens/sec
*   **Average Request Latency:** 5895.24 ms

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000`
*   **Request Throughput:** 1.45 req/sec
*   **Total Token Throughput:** 5783.92 tokens/sec
*   **Average Request Latency:** 211541.08 ms

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128`
*   **Request Throughput:** 52.75 req/sec
*   **Total Token Throughput:** 13505.00 tokens/sec
*   **Average Request Latency:** 5705.50 ms

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000`
*   **Request Throughput:** 1.41 req/sec
*   **Total Token Throughput:** 5630.76 tokens/sec
*   **Average Request Latency:** 217139.59 ms

Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>
2025-05-06 17:17:55 -07:00
dominicshanshan
3ac6637005
fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836)
* Remove stdout pipe for genai-perf and make stress time as public parameter.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* Update llmRequest based on comment.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* launch process function refactor.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-06 16:52:30 +08:00
pansicheng
e84dc6b3c7
feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354)
* add deepseek-r1 reasoning parser

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>

* fix test

Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

---------

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-06 08:13:04 +08:00
Iman Tabrizian
85867d76dd
test: Add disaggregated serving accuracy tests (#4036)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-05 08:56:59 -07:00
Yanchao Lu
5ee38ad92a
[Test]: Clean up stale waives (#4062)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-05 22:13:12 +08:00
Yanchao Lu
ddfb0fe4e2
[Test]: Waive unsupported tests (#4059)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-05 20:51:49 +08:00
Yiqing Yan
b5c2327aa0
Waive L0 tests (#4051)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-05 12:53:21 +08:00
Yukun He
aa38e28cfa
fix: [nvbug/5241627] Fix AllReduce kernel hang issue when both tp and pp are enabled. (#3988)
* Fix AllReduce kernel hang issue when both tp and pp are enabled.
Allocate one workspace for each pp rank to avoid potential race.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

* update waive list

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

---------

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-05 11:33:25 +08:00
Yan Chunwei
bc0cf41592
chore: refactor llmapi e2e tests (#3803)
* refactor llmapi e2e tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-05 07:37:24 +08:00
Emma Qiao
2692daad2e
infra: Remove the WAR for test items incompletely (#3313)
* Remove the WAR for incomplete test items

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Complete test item manually

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test definition file

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Complete test name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix some other test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update name for waived case name, too

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix name for multi-gpu tests

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix other qa tests

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix tests name after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix name after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Correct test names in waive.txt

Signed-off-by: qqiao <qqiao@nvidia.com>

* Add new test_durations file

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix names after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Update test duration to latest

Signed-off-by: qqiao <qqiao@nvidia.com>

---------

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-04 11:31:59 +08:00
Mike Iovine
906cddffb0
[infra] Improve llama4 parallelism test coverage (#3821)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-02 16:15:04 -04:00
bhsueh_NV
561ee44737
add ci and doc for qwen3 (#4022)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-02 14:13:38 +08:00
xinhe-nv
009d5e9fa3
test: [CI] Add failed cases into waives.txt (#3943)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* waive test_llm_commandr_v01_single_gpu_summary for GH200

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-01 23:43:11 +08:00
nv-guomingz
dc344b6a4f
fix:https://nvbugs/5246733 (#3989)
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-05-01 22:52:31 +08:00
YueWeng
b1621e8d4e
feat: add relaxed acceptance for DS (#3865)
* add relaxed acceptance for DS R1

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* clean and update docs

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* fix

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* Modified based on review

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* fix mtp manager issue

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

---------

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-05-01 21:50:36 +08:00
Chuang Zhu
1ada3c9800
unwaive disagg tests (#3925)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-04-30 16:44:00 +08:00
xinhe-nv
a31afcf3a9
update waive list (#3890)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-30 11:07:48 +08:00
Dom Brown
8709fe8b53
chore: bump version to 0.19.0 (#3598) (#3841)
test: add test cases for 0.19 release (#3608)

* fix test name

* add quickstart test for nemotron-ultra

* add rcca multi-node test case for deepseek-v3

* add rcca info

---------

squash (#3642)

fix: nvbugs/5187237: fix deterministic mode crash (#3448)

* nvbugs/5187237 nvbugs/5112075: fix deterministic mode error

* remove waive

* Revert "remove waive"

This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac.

* revert ar fusion

---------

update fp8 doc (#3647)

tests: change qa perf test to trtllm-bench (#3619)

fix: FP8 quantized lm_head (NvBug 5214229) (#3567)

infra: Add PR approval protection for the release branch (#3634)

fix: nvbugs/5231298: pytorch allreduce issue (#3673)

Fix: nvbugs/5222698 variable not defined (#3630)

* Fix: nvbugs/5222698 variable not defined

* Tidy code

---------

test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685)

test:restore fp8 kv cache testing for L0 (#3671)

doc: Update DeepSeek perf docs (#3693)

* Update DeepSeek perf docs

* update

* Apply suggestions from code review

---------

tests: waive test_llm_multi_node (#3664)

fix: update test_user_buffers_mm_add_prologue atol (#3711)

Fix: cherry-pick hmac encryption from main branch (#3635)

* security fix cherry-pick changes from main

* fix hmac in remote mpi session (#3649)

---------

Un-waive DS-V3-Lite tests. (#3621)

fix: FP8 kv accuracy (#3675)

* fix FP8 kv accuracy

* update doc

---------

Fix script options for engines. (#3622)

unwaive multi-node test (#3721)

chore : Split more tests out of gpt tests (#3524) (#3674)

doc:add torch examples link into torch backend documentation (#3749)

test: Get Eagle tests working (#3593) (#3722)

Waive L0 test (#3756)

waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656)

Update ds v3 parameters in stress test. (#3676)

waive gemma on L20 (#3766)

https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758)

Include Qwen2VLDecoderLayer in the smooth_qwen2_model function.

fix: PP4 fixes and cleanup (#3688)

remove benchmark test list (#3643)

skip disagg deepseek test if sm!=90 (#3720)

test: skip failed cases on B200 (#3710)

* add skip condition to tests

* fix error

---------

test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718)

* skip_pre_ada for fp8 cases

* update

* update after rebase

---------

add know issue to deepseek doc. (#3800)

Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761)

Waive L0 tests (#3826)

fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793)

* Reduce memory usage in fused moe op associated with AutoTuning.
* Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens.
* Add free_memory logic of workspace in min_latency_mode fused moe path.

* Fix fused_moe fallback issue. (#3652)

min_latency_mode is only set to False during warmup phase. Thus when it becomes true during inference, all tactics fall back to the default one and thus cause perf regression.

---------

[doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797)

Fix pre-commit

Fix again

Address some review comments for the MI

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-29 16:57:22 +08:00
QI JUN
c381380ecc
increase H100 CI nodes for PyTorch only pipelines (#3927)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-29 10:58:43 +08:00
Jinyang Yuan
dafc28fb85
fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863) 2025-04-29 09:09:43 +08:00
xiweny
f84dd8f815
test: add deepseek v3 & r1 cases (#3528)
* test: add deepseek v3 & r1 cases

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-04-28 23:37:26 +08:00
xinhe-nv
82a8e43557
test: [CI] Add failed cases into waives.txt (#3867)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waives

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-28 14:32:48 +08:00
xinhe-nv
e20b67e9fd
update waives & tests (#3887)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-28 14:29:35 +08:00
Yanchao Lu
068c72ebf8
Test: waive intermittent test hang (#3894)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-04-28 08:53:20 +08:00
Iman Tabrizian
74cc9e26ff
infra: install Triton in the base image (#3759)
* infra: install Triton in the base image

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* install Triton from the base image

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* update base image

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* Address review comments

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* update base image

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* waive test

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

---------

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-28 07:36:30 +08:00
Dom Brown
7ff9fd345c
Test: Split C++ unit tests for CI granularity (#3868)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-04-25 13:30:58 -07:00
Yiqing Yan
238fefc659
[infra] Waive L0 tests (#3853)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-25 17:32:21 +08:00
QI JUN
991939a0f4
chore: increase A30 for cpp test (#3811)
* increase A30 for cpp test

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* enable parallel run test for gpt_executor

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* clean

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* decrease freeGpuMemoryFraction of cpp tests

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-24 16:34:39 -07:00
xinhe-nv
476d7003f8
test: [CI] Add failed cases into waives.txt (#3777)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waives.txt

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-24 09:36:05 +08:00
Zhanrui Sun
bfc4e55ded
infra: [TRTLLM-4417]Support auto trigger special test stage for special file change (#3478)
* infra: Support auto trigger special test stage for special file change

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

* Fix review

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

* Fix review

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

---------

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-23 20:32:19 +08:00
Enwei Zhu
8f2b2eaf83
test: Add DeepSeek-V3-Lite GSM8K tests (#3771)
* tmp

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update ref

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update waives

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-23 16:54:48 +08:00
xinhe-nv
b82d72bc37
update waive list (#3696)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-23 14:18:57 +08:00
Yechan Kim
11d35656bf
fix: nvbugs/5234029 fix Qwen2.5-VL image test (#3726)
* fix: nvbugs/5234029 fix Qwen2.5-VL image test case by adding more answer candidate

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove qwen2.5_vl from waive list

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-04-23 14:09:39 +08:00
xinhe-nv
80d8fdefd6
add test_mistral_large_hidden_vocab_size tests (#3716)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-23 13:40:11 +08:00
Yiqing Yan
cc161dd83d
Waive L0 tests (#3784)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-23 11:22:11 +08:00
QI JUN
257abfbc51
move pytorch tests of LLM API into separate test files (#3745)
* move pytorch tests of LLM API into separate test files

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* polish

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* update

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* clean

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-22 14:36:59 -07:00
Emma Qiao
442386d302
infra: Add test stages for sm120 (#3533)
* Add test stages for sm120

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update chip name and config name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Split tests to gb202 and gb203

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Don't flash driver for rtx-5090

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Skip the failed cases

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Change the test stage names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Reduce 5080 jobs and add back gpu list which doesn't support dynamic driver flashing

Signed-off-by: qqiao <qqiao@nvidia.com>

* Skip failed case on gb202

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix condition to dynamic driver flashing

Signed-off-by: qqiao <qqiao@nvidia.com>

---------

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-04-23 01:26:12 +08:00
Ivy Zhang
47d2f16bb8
waive gemma on L20 (#3767)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-04-22 17:52:49 +08:00
ruodil
9223000765
waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3657)
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-22 14:51:45 +08:00
xinhe-nv
ba216341f4
update waive list (#3683)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-22 11:09:41 +08:00
Enwei Zhu
3fa19ffa4e
test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483)
* add gsm8k

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix gsm8k

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add gpqa

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* conditional import lm_eval

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* gpqa in lm_eval

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* system prompt

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* shuffle

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update AA prompt and regex

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* revert AA prompt and regex

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* integration to tests

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add DS-R1

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix and clean

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update tests

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* clean up

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* free_gpu_memory_fraction=0.8

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-22 07:38:16 +08:00
Barry Kang
d87b009d8d
Fix ModelOpt Mixtral AWQ OOM (#3714)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-04-21 19:14:14 +08:00
Iman Tabrizian
af04b6f6aa
bug: Fix hang bug when context server doesn't have enough capacity for KV Cache (#3095)
* Fix hang bug when KV cache is low

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Review comments

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Fix attentiondp typo

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Add CI test for this case

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* fix: Fix the insertion order for responder futures

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* fix: Fix disagg CPP

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

---------

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-21 15:16:55 +08:00
Stanley Sun
852dd0c1be
test: add llama3.2 ptp test case (#3363)
* add llama3.2 ptp test case

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

* update test list

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

---------

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-04-21 15:15:45 +08:00
Yiqing Yan
6f7f262779
Waive L0 tests (#3709)
* Waive L0 tests

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* the test is fixed in PR 3711

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

---------

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-21 11:24:00 +08:00
Emma Qiao
48db263d9a
infra: Add test list name check (#3097)
* Add steps to check test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct test-db command

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Switch to use a trt-llm image

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update go path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct go path

Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Move the test list check to test ci

Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct file path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix path again

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix get path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Skip test list check for ARM

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix expression

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Change back unrelated file

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct qa test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Remove a stage

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update jenkins/L0_Test.groovy

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Move some steps to a python script

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix script path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Split commands and debug

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Also correct case name in waives list

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Move check script to another folder

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update qa list after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Remove the perf tests under QA

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Some tests already fixed after rebase to TOT

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

---------

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-04-20 23:02:16 +08:00
brb-nv
c35d2a7532
test: Get Eagle tests working (#3593)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-20 00:50:57 +08:00
nv-guomingz
e70961f541
test:update waives.txt for nvbug 5219532 (#3672)
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-04-19 18:57:39 +08:00
Iman Tabrizian
61ee983488
fix: Fix disaggregated load balance test (#3689)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-19 10:40:40 +08:00
Iman Tabrizian
a2f190f306
chore: Waive disaggregated load balance (#3687)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-18 16:04:33 -07:00
Yechan Kim
5460d18b10
feat: trtllm-serve multimodal support (#3590)
* feat: trtllm-serve multimodal support

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable argument

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add and separate tests and move the doc

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove block_reuse arg from serve.py

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-19 05:01:28 +08:00
pcastonguay
ae5671644a
feat: Disaggregated router class (#3584)
* Add draft scheduler class

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* Refactor the design

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* feat: Introduce router class for disaggregated server

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Add unit tests for router class

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Adding tests for disagg_utils

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing missing import

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing disagg integration tests

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Addressing MR review comments

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-19 00:34:12 +08:00
QI JUN
b9fce42717
enable test_ptp_quickstart_advanced_mixed_precision (#3667)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-18 05:06:24 -07:00
Zheng Duan
bce7ea8c38
test: add kv cache event tests for disagg workers (#3602) 2025-04-18 18:30:19 +08:00
peaceh-nv
88cff61fa1
chore : Split more tests out of gpt tests (#3524)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-04-18 12:04:57 +08:00
dongfengy
b71a0f76b4
test: Add llama 4 to ci (#3520)
* Add llama 4 to ci

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

* Only test trtllm

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

* Disable maverick

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

---------

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-04-18 11:25:52 +08:00
Ivy Zhang
ad19ca3cbf
remove benchmark test list (#3644)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-17 16:23:41 +08:00
Netanel Haber
3c52ac098f
feat: allocate minimal blocks per window size (#3028)
* implement variable window attention by breaking the block manager into window block managers per window size

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* revert isCyclic to be true if the min attention window is reached, not per window size

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* add explanatory comment to mCyclicThreshold

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* load correct gemma config

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* don't shadow inputLength in addSequence - it should remain the function scope input length between window size loop iterations

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix KVCacheManagerVariableWindowAttentionWithReuseTest for multiple window block managers

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* if TYPE_CHECKING

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* set temp_attention_window_inputs to None explicitly

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* set temp_attention_window_inputs to None explicitly

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* pass dtype as well

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* test_gemma variable sliding window attention

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* allot a fraction of primary/secondaryBlocks to different window size heaps, depending on the window size's total contribution to the kvcache size (i.e., including all layers)

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* remove || mEnableBlockReuse which erroneously triggers beamsearch code for cyclic variable attention window code

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* turn off request delaying for MaxUtil

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* make comments better

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* windowSizesTotalSum using std::accumulate

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix error handling of forwardAsync - forwardAsync catch-all catch cleanup code that runs terminateRequest can also fail and must be caught

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix comments

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* remove assert that kills disagg tests, since it isn't necessary

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix corrupted expression: 'isNewTask && (peftCacheManager ?' -> '(isNewTask && peftCacheManager) ?' which caused boolean algebra. Main is correct

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* add Gemma3 to SUPPORTED_HF_ARCHITECTURES

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* support Gemma3

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix kvfactor field for deepseek

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix comment

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix gemma-3 entries in testlist to include vswa

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* only quantize gemma2 VSWA

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

remove misleading comment

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

fix test_gemma

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix test_gemma

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix test_gemma

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* in sendRequestInfo, fromOldAllocatedBlockIds->fromOldAllocatedBlockIds, like in main

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix: disable KV cache reuse if using attention sink (#3021)

* fix: disable KV cache reuse if using attention sink

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: disable KV cache reuse if sink bubble

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* add comment

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-04-17 16:04:57 +08:00
Yiqing Yan
1c6f3debbb
Waive L0 tests (#3651)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-17 15:13:56 +08:00
xinhe-nv
b82a4e8d01
test: [CI] Add failed cases into waives.txt (#3627)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* fix waives

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-17 14:45:41 +08:00
Ivy Zhang
b2fb0fe843
test: add quickstart test for nemotron-ultra (#3596)
* add quickstart test for nemotron-ultra

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix test name

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-17 11:16:41 +08:00
ruodil
5e2ebebe76
tests: change qa perf test to trtllm-bench (#3189)
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-17 09:53:32 +08:00
QI JUN
ab29348db2
waive test_llm_phi_quantization_1gpu (#3603)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-16 13:33:46 +08:00
Daniel Cámpora
41ce5440fe
chore: Mass integration of release/0.18 (#3421)
* [Infra][TRTLLM-4063] - Branch out for the TRT-LLM v0.18.0 release

Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
(cherry picked from commit de90312020e51c22ba5e75b3502c7ee90c059265)

* [Infra][TRTLLM-3652] - Update dependencies to TRT 10.9 / CUDA 12.8.1 / DLFW 25.03(Internal)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
(cherry picked from commit 58db1340ef7db22f1910f878d220a92be5b830d1)

* [None][Doc] - Update docs for v0.18.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit d23e75bc95619ce3b116213d55319272888e0c88)

* [Infra] - Fix or WAR issues in the package sanity check stages

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit e874e2b127515c52ba10c8df1cc2631627f74ffe)

* [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path

Signed-off-by: Yuki Huang <yukih@nvidia.com>
(cherry picked from commit 731811d4e182d70a66193d646152cb71dfafe83a)

* cherry-pick 'test: Updat cluster and multi node test lists and trtllm-bench' test to fix perf drop issue

Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
(cherry picked from commit 5214616283fbc15ae98871a1d84c78d8e1f2e6e8)

* Revert "Merge branch 'user/yukih/fix_5173454_5173432' into 'release/0.18'"

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit 8d34831cb2b81ee2dfa8021b68e7158b33789a5f)

* [Infra]Restrict setuptools version to avoid sasb pip install issue

Signed-off-by: Emma Qiao <qqiao@nvidia.com>
(cherry picked from commit 1e60ad29e0dafec0e295bedb5d89b716a02a707c)

* [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path

Signed-off-by: Yuki Huang <yukih@nvidia.com>
(cherry picked from commit 3ed8164e5bfea1d5aa2039b5408439fd6cf59dac)

* WAR for bug 5173448

Signed-off-by: Thor Johnsen <tjohnsen@nvidia.com>
(cherry picked from commit b6528b2ba15322b6c6a4c81a8b74c04d4973de4f)

* [Infra][TRTLLM-3652] - Update dependencies to CUDA 12.8.1 / DLFW 25.03

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
(cherry picked from commit 6560983d132d9d257ee15849664eb055e94adaa9)

* [Docs] - Doc changes for v0.18.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit 26769b61218a947c8f9d070f73b63d576fcc20c4)

* [Doc] - Doc change for v0.18.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit 4b3b5ed6bfbc2300e3775fe75456083faad7b235)

* [Infra] update version to 0.18.1

Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
(cherry picked from commit 59e8326c75639275837d34de8e140358737a3365)

* Add back nemotron file.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Fix recurrentgemma reqs.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Adding WAR for bug 5173448.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Formatting.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Remove duplicated file.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Update examples/prompt_lookup/requirements.txt

Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Remove glm-4-9b from model dir in chatglm test.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Remove indent change.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Revert changes on l0_test.groovy.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Update dev images

Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

* Remove duplicated import.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Fix custom op

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

* Fix flashinfer & vanilla backend

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

* Skip problematic case.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Skip problematic test_moe_w4a8_1_14336_4096_8_bfloat16_True_False case.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

---------

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Ruodi Lu <ruodil@nvidia.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Thor Johnsen <tjohnsen@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-04-16 10:03:29 +08:00
xiweny
da47d5f27e
fix: nvbugs/5075538: fix cross attention mask when decoder input len > 1 (#3585)
* fix: nvbugs/5075538: fix cross attention mask when decoder input len > 1

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

* remove waiver

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

---------

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-04-16 08:31:33 +08:00
HuiGao-NV
d35db254e2
test: Enable 4 multi-gpu test cases for deepseek (#3569)
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-04-15 22:01:52 +08:00
Yan Chunwei
c27e130be0
unwaive test (#3559)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-15 19:42:06 +08:00
xinhe-nv
5cfa927132
update waive list (#3503)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-15 16:53:53 +08:00
xinhe-nv
0e152910f5
update waive list (#3498)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-15 14:33:49 +08:00
Zheng Duan
b0cb963199
test: torch-flow conditional disagg test (#3410)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-04-15 10:54:14 +08:00
nv-guomingz
b32ae7ac92
test:add fp8_kv_cache functionality test case. (#3457)
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-04-15 09:16:46 +08:00
Iman Tabrizian
bad55e99bb
test: Add MTP + overlap + Attention DP disaggregated test (#3542)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-15 07:46:03 +08:00
Ivy Zhang
170bc22139
fix test name (#3534)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-04-14 17:09:50 +08:00
xinhe-nv
b1d8495b3d
update waive list (#3510)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-14 15:24:48 +08:00
bhsueh_NV
9d7d48faeb
fix: disable the kv cache reuse for prompt tuning test (#3474)
* disable the kv cache reuse for prompt tuning test

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* unwaive the waived tests

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-04-14 14:35:47 +08:00
brb-nv
44090a5388
Add support for Phi-4-MM (#3296)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-14 14:24:10 +08:00
Yiqing Yan
19d296b4b2
chore: add dgx_h200 tests (#3451)
* add dgx_h200 tests

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix pre-commit

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* change bsl branch

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* change multi gpu related file list

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

---------

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-14 11:20:55 +08:00
Yiqing Yan
65d1591fbf
Waive L0 test (#3508)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-14 09:32:01 +08:00
Chuang Zhu
6ee021a90d
chore: exchange connection id with tagSend/tagRecv (#3320)
* exchange connection id with tagSend/tagRecv

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* unwaive

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* tag recv/send

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

---------

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-04-14 09:30:34 +08:00
dominicshanshan
5d3180be82
feat: Add stress test for TRT-LLM (#3250)
Signed-off-by: Wangshanshan <dominicw@nvidia.com>
2025-04-13 10:24:25 +08:00
pcastonguay
145a126a28
chore: Unwaive DS + overlap disagg test (#3339)
* chore: Unwaive DS + overlap disagg test

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing pre-commit

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing pre-commit

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-12 13:33:38 -04:00
Iman Tabrizian
3041bbdab3
fix: Fix disagg MTP with overlap (#3406)
* fix: disagg overlap with MTP

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Review comment

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

---------

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-12 12:27:24 +08:00
HuiGao-NV
c51e90d7d7
fix: don't perform memory estimation for start_attention (#3485)
* fix: don't perform memory estimation for start_attention

* Enable tests of unittest/_torch/multi_gpu

Signed-off-by: Hui Gao <huig@nvidia.com>
2025-04-12 11:34:46 +08:00
Enwei Zhu
cf9ceea890
test: Add DeepSeek-V3-Lite PP=4 cases (#3454)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-12 00:09:12 +08:00
Ivy Zhang
d998832b33
test: add torch flow test case in qa test list (#3404)
Signed-off-by: Ivy Zhang <yanzh@nvidia.com>
2025-04-11 16:57:41 +08:00
Yiqing Yan
0d351317c2
Waive failure post-merge tests (#3472)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-11 16:23:07 +08:00
Enwei Zhu
410f56357e
test: Waive torch compile tests (#3471)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-11 13:38:05 +08:00
QI JUN
16ca45747b
always trigger multi gpu test to protect modeling_llama.py and modeling_deepseekv3.py (#3434)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-11 13:19:23 +08:00
QI JUN
1e2a339642
waive unittest/_torch/multi_gpu (#3464)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-11 09:59:16 +08:00
QI JUN
6cef10068a
waive a test case of llama 3.1 with torch compile (#3461)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-11 09:15:19 +08:00
Iman Tabrizian
d7f45e50c6
test: disable attention DP tests for single GPU (#3395)
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-11 01:38:17 +08:00
amitz-nv
a6a2ae6cc1
chore: Rename nvsmall to nemotron nas (#3447)
* Rename nvsmall to nemotron NAS

* Revert nvsmall to nemotron_nas rename in paths in tests that access llm_models_root/nvsmall/tests

* Add NemotronNAS to pytorch supported models table

Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-04-10 23:16:52 +08:00
wm2012011492
af05749e90
feat: add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa… (#3369)
* add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa_llmapi.py

Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>

* fix coding style

Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>

* add unittest

Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>

---------

Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>
Co-authored-by: mengw <12670782+wm2012011492@users.noreply.github.com>
2025-04-10 22:45:57 +08:00
QI JUN
f5281fffaa
waive some test cases of test_llm_multi_gpu.py (#3452)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-10 22:02:35 +08:00
Yiqing Yan
10d2d16247
Waive L0 test (#3442)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-10 17:43:45 +08:00
bhsueh_NV
cec65bd09a
clean the waive.txt (#3441)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-04-10 16:20:08 +08:00
brb-nv
c59abae436
feat: Add Gemma3 text-only model support (#3247)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-10 12:34:58 +08:00
peaceh-nv
215fb20567
chore : split GptExecutor tests out of gpt tests to reduce single test time (#3412)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-10 09:08:15 +08:00
Yechan Kim
943218b54a
feat: Add Qwen2.5-VL and refactor Qwen2-VL (#3156)
* feat: Add Qwen2.5-VL and refactor Qwen2-VL

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix yapf and codespell

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix test_e2e

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* generalize get_rope_index

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix qwen2.5-vl on README

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix image test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-10 04:09:03 +08:00
Iman Tabrizian
8401722245
test: Add single gpu disaggregated tests (#3295)
* test: Add single gpu disaggregated tests

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Add deepseek with overlap tests

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Use updated prompt

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Move test to disaggregated folder

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

---------

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-09 09:34:45 +08:00
pcastonguay
02f446a9ff
chore: Adding DS V3-lite tests with overlap + cuda graph (#3342)
* chore: Adding DS V3-lite tests with overlap + cuda graph

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing pre-commit

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-08 09:36:09 -04:00
Chuang Zhu
cdb0906be4
disagg test single h100 (#3353) 2025-04-08 17:45:35 +08:00
amirkl94
e04f6a1b9b
fix: Fix p-tuning test bug (#3326)
* fix: Fix p-tuning test bug

* A change in the vocab_size calculation for T5Tokenizer,
introduced in transformers version 4.34, caused incorrect virtual tokens (vtokens) to be added for p-tuning.
Tokens inside the vocabulary were added instead of tokens outside the vocabulary.

Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-04-08 17:14:00 +08:00
Enwei Zhu
8ee019f8c4
test: Accuracy test improvement (Part 3.4): Move LLaMA tests (#3350)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 15:07:57 +08:00
MinaHuai
31422e7e46
add tp=2 ci test for vision encoder (#3319)
Signed-off-by: mhuai <mhuai@nvidia.com>
2025-04-07 21:46:08 -07:00
Enwei Zhu
ba019a43d6
test: Accuracy test improvement (Part 3.3): Move DeepSeek tests (#3260)
add skip

fix

fix

update

update test list

fix QA list

move bf16 to postmerge

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 07:19:04 +08:00
YueWeng
aab6214801
test: fix conflicting test names (#3316)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-04-07 20:10:01 +08:00
QI JUN
a2fad51011
chore: waive a timeout multi-GPU test case (#3310)
* debug CI timeout issue

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* waive timeout case

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-07 14:04:54 +08:00
brb-nv
017361c26c
test: Waive non-Llama Eagle tests (#3309) 2025-04-07 09:25:41 +08:00
qixiang-99
0d4d50a745
feat: no-cache attention in PyTorch workflow (#3085)
* init trtllm attn no cache

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* fix: fix the seq_len issue and attn metadata prepare for qwen reward model test

fix: fix minor bugs after rebase
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: remove unnecessary debug logs and clean up commented code

refactor: update max_seq_len documentation and remove max_seq_len for decoder model constructor in PyTorchModelEngine
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: update calculate_ref_result function to accept tensor inputs and mask type, enhance test_attention_no_cache to support FULL and CAUSAL masks

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: remove unused BERT attention metadata conversion method and add type assertion for no cache attention in PyTorchModelEngine

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: remove use_kv_cache parameter from attention function and related classes, update documentation for KV cache handling

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: implement setAttentionMaskType method for better mask type handling and remove unused conversion function

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: streamline KV cache handling by replacing direct member access with useKVCache method and simplify token per block assignment

Remove debug code.

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: Resolve comments for Python code

Simplify no cache attention metadata preparation and streamline related attributes in TrtllmAttentionMetadata

Removed the private method for converting to no cache attention metadata and integrated its logic into the prepare method. Updated the test for BERT sequence classification to reflect these changes and ensure proper handling of attention metadata.

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* docs: Add is_dummy_attention field to attention metadata for simulation operations

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: add KVCacheParams to attention backend interface and import relevant metadata classes

Updated the attention backend interface to include KVCacheParams and imported TrtllmAttentionMetadata and VanillaAttentionMetadata in model_engine.py for enhanced functionality.

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* fix: fix rebase format issue

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* fix: extend attention mask type handling in MHARunnerFixedParams

Added support for additional attention mask types (BIDIRECTIONAL, BIDIRECTIONALGLM, BLOCKSPARSE) in the MHARunnerFixedParams structure to fix the mapping issue between ContextAttentionMaskType and AttentionMaskType

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* fix: enhance attention mask type handling in TllmGenFmhaRunnerParams

Updated the setAttentionMaskType method to include a switch-case structure for better handling of attention mask types, ensuring proper mapping and error handling for invalid types.

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

---------

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
2025-04-05 01:54:32 +08:00
QI JUN
059a34468c
fix deepseek multi gpu tests timeout (#3285)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-04 16:19:02 +08:00
yuanjings-nvda
5776b99b70
fix vila test (#3042)
Signed-off-by: Yuanjing Shi <yuanjings@nvidia.com>
2025-04-04 14:30:06 +08:00
Pengyun Lin
f25c7cefb4
doc: refactor trtllm-serve examples and doc (#3187)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-04 11:40:43 +08:00
Tracin
bb6c338730
AWQ support Modelopt ckpts. (#3258)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-04 08:10:35 +08:00
xinhe-nv
2005e5aaaf
remove tests from qa test lists (#3256)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-03 16:06:39 +08:00
Zhanrui Sun
7f03125098
test: [TRTLLM-3994] Support only run pytorch tests (#3013)
* [TRTLLM-3994] Support only run pytorch tests

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

* Move perf test to TensorRT backend

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

* Fix review

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

---------

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-03 13:46:09 +08:00
Jinyang Yuan
2fdfa39ea8
fix: Fix an error related to dummy request when MTP is used (#3146) 2025-04-03 11:08:12 +08:00
Chuang Zhu
f5bf74bc7f
enable some disagg test (#3203)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-04-03 06:10:48 +08:00
Enwei Zhu
3cf7066350
test: Accuracy test improvement (Part 3.2): Move Qwen tests (NvBug 5135332) (#3219)
* remove test_llm_models_multi_gpu.py

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* qwen 2.5

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* upgrade

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-02 17:29:57 +08:00
Yiqing Yan
c19b7f7c2a
waive L0 test (#3217)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-02 11:16:22 +08:00
Chuang Zhu
bc5811da65
chore: Ucx ip port remove mpi depend (#3101)
* initial ucx support

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* fixes to support dynloading and ucx connection establishment - not stable yet

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* update

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* more connection bring-up fixes; failing on connection vector build

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* executor test pass

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* update

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* passed full benchmark

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* changing to TLLM_THROW and removing cout

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* stopping progress thread in ucxComm destructor

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* fixing build with ENABLE_UCX=0 so the ucx target is not built at all, removing ucxConnection includes from the cache transceiver, and deleting commented-out code

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* fix copyrights

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* adding ucx flavor to cache transceiver test and inserting it into the CI pipeline

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* allowing sending IPs of non-IB interfaces

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* setting UCX port reuse for the tests in pipeline

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* code review fixes

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* querying ep after GID message is sent to avoid UCX errors

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* fixing more CR issues

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* querying ep so as not to fail if ep_not_connected yet

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* remove mpi dependency and debug

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* debug to info

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* mpirun -n 2

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* remove mpi comm split when disaggOrchestrator mode

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* waive disagg_mtp test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* use future instead of thread

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* use future_promise instead of cv wait

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* connectionId type

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* improve test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* improve test 2

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* gtest_skip

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

---------

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-04-02 09:42:29 +08:00
brb-nv
1fe3e30356
Add support for Phi-4-mini (#2990)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-02 08:34:39 +08:00
Enwei Zhu
b2f69db507
test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of trtllm-eval (#3167)
* add eval_llmapi

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

tmp commit

port to CLI tool

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

move

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

setup llmapi

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix spec_dec_algo

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

_update_from_hf_quant_config

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

migrate test_pytorch.py

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix fp8 block scales

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix fp8 rowwise

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

adj alpha

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

move test_pytorch.py cases

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

move

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

rename test_accuracy.py to test_cli.py

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

clean

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix cnn_dailymail

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* renaming to cli flow

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* rename MMLU

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* rename

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add error

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-01 22:20:29 +08:00
amirkl94
bf02b9144f
feature: Add LoRA support for gemma (#3068)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-04-01 19:15:55 +08:00
brb-nv
727d78e785
Support prequantized fp8 ckpt for nemotron-mini-4b-instruct (#3046)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-01 14:52:09 +08:00
brb-nv
1901bfcf76
test: Add Eagle tests with untrained heads (#2991)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-01 11:41:59 +08:00
Frank
8bb3eea285
perf: Readd iteration logging for trtllm-bench. (#3039)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-04-01 08:13:09 +08:00
Iman Tabrizian
e8731ba3b7
fix: disable cuda graph and MTP for overlap tests (#3155)
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-03-31 11:35:35 -07:00
bhsueh_NV
322ac565fc
chore: clean some ci of qa test (#3083)
* move some models to examples/models/contrib

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* update the document

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* remove arctic, blip2, cogvlm, dbrx from qa test list

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* remove tests of dit, mmdit and stdit from qa test

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* remove grok, jais, sdxl, skywork, smaug from qa test list

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* re-organize the glm examples

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix issues after running pre-commit

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix some typos in glm_4_9b readme

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix bug

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-31 14:30:41 +08:00
xinhe-nv
86f3b59f81
update waive list (#3094)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: larryx <larryx@nvidia.com>
2025-03-31 11:42:45 +08:00
Mike Iovine
5416966ddb
Add initial EAGLE-3 implementation (#3035)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-03-29 22:31:24 +08:00
Fanrong Li
644a01cbbe
test: Add gpqa tests for DeepSeek models (#3063)
* Add gpqa accuracy test script
* Add gpqa accuracy tests
* Update DeepSeek-v3 doc
* Update qa test list

---------

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-03-27 19:47:06 +08:00
xiweny
6979afa6f2
test: reorganize tests folder hierarchy (#2996)
1. move TRT path tests to 'trt' folder
2. optimize some import usage
2025-03-27 12:07:53 +08:00
Yan Chunwei
82edd90350
fix gpus_per_node in trtllm-bench when world_size < device_count (#3007)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-27 09:31:40 +08:00
Dom Brown
60d4dacc47
Port multi GPU changes to GitHub (#3027)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-03-27 05:55:03 +08:00
Yechan Kim
3c7cb6629c
Add EXAONE-Deep (#3054)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-03-26 14:24:04 +08:00
kxdc
e6cb34d921
test: fix QA TRT integration testlist mismatch issue (#3090)
Incorrect test-db context caused empty test list output.
Fix by typo correction: `llm_trt_*` -> `trt_llm_*`.

Signed-off-by: kxdc <xink@nvidia.com>
2025-03-26 14:03:21 +08:00
Anurag Mukkara
7361c7d401
Add second possible output (#3043)
Signed-off-by: Anurag Mukkara <amukkara@nvidia.com>
2025-03-25 12:59:27 -07:00
Enwei Zhu
f93ac9672e
clean (#3061)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-03-25 21:55:08 +08:00
Chuang Zhu
110c6fc0f0
wait long time for disagg test (#2998)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-03-25 20:52:38 +08:00
QI JUN
a8ec1cc4ea
remove examples/test_gptj.py::test_llm_gptj_fp8_manage_weights_summary test case (#3057)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-03-25 15:41:27 +08:00
Yan Chunwei
69feafc947
fix: amend the test list (#3056)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-25 14:17:36 +08:00
Yan Chunwei
c29cebf79d
Deprecate model_api examples (#2999)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-25 09:37:20 +08:00
Enwei Zhu
705eef68c2
test: Accuracy test improvement (Part 2): Incorporate mmlu to accuracy test suite (#2982)
* Accuracy test improvement (Part 2)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* WAR OOM

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

update

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-25 07:34:10 +08:00
nv-guomingz
ec4f43a0ab
test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_… (#2987)
* test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_seek_v2 test cases.

Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>

* update test cases per review comments

Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>

---------

Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 14:18:06 +08:00
Kaiyu Xie
2631f21089
Update (#2978)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00