TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-09 04:31:49 +08:00

Author	SHA1	Message	Date
QI JUN	656f229b58	[None][ci] move some test cases from l40s to a30 (#7684 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-11 07:22:34 +08:00
Emma Qiao	9986070044	[None][infra] Waive failed cases on main 0910 (#7676 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-11 01:43:29 +08:00
Dom Brown	fc9d426589	[https://nvbugs/5505402 ] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-10 18:30:48 +01:00
nvamyt	222e01662c	[https://nvbugs/5488212 ][waive] Waive failed tests for L20 (#7664 ) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-10 22:32:15 +08:00
xinhe-nv	207c5258c4	[https://nvbugs/5494698 ][fix] skip gemma3 27b on blackwell (#7505 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-10 21:09:27 +08:00
Bo Deng	bf57829acf	[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-09-10 17:35:51 +08:00
fredricz-20070104	ef620f3579	[https://nvbugs/5410687 ][test] Add deepseek r1-w4afp8 quickstart (#7645 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-10 10:21:01 +08:00
Chang Liu	faa2f46554	[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-09 14:51:36 -04:00
QI JUN	a0e1604898	[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline (#7629 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-09 11:06:32 -04:00
Liao Lanyu	af403848d7	[https://nvbugs/5445466 ][fix] unwaive DS R1 test cases with bug already fixed (#7429 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-09-09 17:25:49 +08:00
Perkz Zheng	da6cb541a2	[None][feat] Optimize MLA kernels with separate reduction kernels (#7597 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-09 16:58:44 +08:00
xinhe-nv	8a52015f50	[None][chore] Remove closed bugs (#7591 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-09 04:08:42 -04:00
Yiqing Yan	5c616da2fd	[TRTLLM-5877][infra] Add fmha tests and auto trigger rules (#6050 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-09 11:33:09 +08:00
Wanli Jiang	1e0669d27a	[https://nvbugs/5453709 ][fix] Remove transformers version limit in Qwen2VL (#7152 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-09 10:38:20 +08:00
Iman Tabrizian	d96c54d8ae	[None][test] Skip eagle3 test (#7627 ) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-09-08 17:23:53 -04:00
dongfengy	fdd5bd49fc	[https://nvbugs/5481080 ][fix] Fix GPTOSS W4A16 reference (#7323 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-08 13:59:28 -07:00
Chuang Zhu	77657a1c12	[TRTLLM-7361][feat] KV cache transfer for uneven pp (#7117 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-09-08 13:37:46 -04:00
dominicshanshan	c9dca69e1b	[None][chore] Mass integration of release/1.0 - 3rd (#7519 ) Signed-off-by: Nave Assaf <nassaf@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com> Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com> Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com> Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com> Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: milesial <milesial@users.noreply.github.com> Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com> Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-08 14:03:04 +08:00
Raayan Dhar	bae9560e62	[https://nvbugs/5448767 ][fix] sync termination of requests across PP ranks (#7455 ) Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-07 08:45:49 -04:00
dominicshanshan	9a97f0a3b7	[None][ci] Waive qwen3 test for accuracy bug in https://nvbugs/5505402 (#7585 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-06 21:29:16 +08:00
QI JUN	525bb806a9	[None][ci] move some test cases of DGX H100 to post merge (#7569 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-06 01:03:38 -04:00
Emma Qiao	d8ec546b73	[None][infra] Waive failed tests on main branch 0905 (#7564 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-05 22:46:46 +08:00
xinhe-nv	8e3962d278	[TRTLLM-6642][feat] add gptoss 20g tests (#7361 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-05 02:20:28 -04:00
xinhe-nv	b3ba3d98d2	[None][chore] Remove closed bugs (#7408 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-05 02:11:16 -04:00
QI JUN	ff3704897b	[None][ci] remove unnecessary test_modeling_deepseek.py (#7542 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-04 20:05:27 -07:00
Jin Li	2189a2f3ff	[https://nvbugs/5483615 ][fix] Remove unnecessary assertion to let mai… (#7441 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-05 10:56:21 +08:00
Ivy Zhang	b46e0ae5d4	[None][test] update nim and full test list (#7468 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-04 09:06:01 -04:00
Jin Li	2a2dfe273b	[https://nvbugs/5485102 ][fix] Correctly set stride for piecewise outp… (#7442 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-04 10:48:15 +08:00
Stanley Sun	db8eb0a447	[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-04 10:34:38 +08:00
Enwei Zhu	5ff3a65b23	[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-03 15:16:11 -07:00
Stanley Sun	cebbf48b74	[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 (#7083 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-03 08:36:52 -04:00
Mike Iovine	79d93f9419	[https://nvbugs/5488141 ][fix] Unwaive llama3 test_eagle3 (#7486 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-03 14:10:40 +08:00
Wanli Jiang	4223a9aada	[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-03 10:27:42 +08:00
Simeng Liu	bcc55bcdf3	[https://nvbugs/5470782 ][fix] Add specific test names for test_deepseek.py (#7318 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-09-02 10:31:40 -07:00
Emma Qiao	aae5d22bfe	[None][infra] Waive failed tests on main branch 0902 (#7482 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-02 10:16:49 -04:00
peaceh-nv	90479c50fb	[https://nvbugs/5453992 ][unwaive] Unwaive llama quickstart test (#7242 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-09-02 20:28:32 +08:00
JunyiXu-nv	eefe5f2093	[TRTLLM-7208][feat] Implement basic functionalities for Responses API (#7341 ) Signed-off-by: Junyi Xu <junyix@nvidia.com>	2025-09-02 07:08:22 -04:00
HuiGao-NV	7279297717	[None][infra] waive test case failed on post-merge (#7471 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-02 06:20:08 -04:00
aalanwyr	c3c95736a1	[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test (#7413 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-09-02 17:21:27 +08:00
Yan Chunwei	f90375f37c	[https://nvbugs/5476580 ][fix] unwaive test_nvfp4_4gpus (#7454 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-09-02 04:17:14 -04:00
Emma Qiao	01dfd3af1b	[None][infra] Waive failed case on main 0901 (#7447 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-01 23:27:24 +08:00
bhsueh_NV	16e9d1121c	[https://nvbugs/5481087 ][fix] fix bug of ci when we use mocker (#7332 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-09-01 16:22:45 +08:00
nvamyt	efaefca2c8	[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302 ) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-01 12:59:21 +08:00
Yiqing Yan	21291f3d8e	[None][chore] Remove duplicate test waives (#6999 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Emma Qiao	09bca7ca82	[None][infra] Waive failed tests for release branch 0818 (#6993 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	29cdcdb56a	[None][fix] update skip config (#6891 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	d5bc5cd4f2	[https://nvbugs/5375646 ][fix] update waives.txt for nvbug 5375646 (#6847 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	ac07418968	[None][ci] unwaive test_ptp_star_attention_example (#6943 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	612c26be22	[None][doc] add legacy section for tensorrt engine (#6724 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
2ez4bz	2480aedb73	[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731 ) This commit adds some level of FP8 support to Mistral Small 3.1 by: * disabling quantization for the vision sub-model since `modelopt` does not support quantizing it (yet). * extending existing accuracy tests to use a modelopt produced FP8 checkpoint. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	3e99744201	[https://nvbugs/5375594 ][fix] fix oom issue on structural_tag test case (#6838 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	deba2885c1	[None][fix] fix Llama3 eagle3 test case OOM (#6832 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
xinhe-nv	7841ea6255	[None][chore] waive GB300 known issues (#6812 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	c7147d25dc	[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Tian Zheng	e257cb3533	[None][feat] Support NVFP4 KV Cache (#6244 ) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 09:24:52 +08:00
xinhe-nv	5f939b9121	[None][chore] Add failed cases into waives.txt (#7342 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-30 00:49:14 -04:00
Emma Qiao	15ec2b855d	[None][infra] Waive failed tests on main branch 08/29 (#7370 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA	62459d533d	[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-29 17:03:46 +08:00
fredricz-20070104	091b67ad2f	[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-29 02:16:22 -04:00
Chang Liu	31b0f0fb0c	[https://nvbugs/5445466 ][fix] Eliminate race when loading HF dynamic modules (#7268 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-29 12:36:30 +08:00
Richard Huo	ce580ce4f5	[None][feat] KV Cache Connector API (#7228 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Signed-off-by: richardhuo-nv <rihuo@nvidia.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-28 23:09:27 -04:00
aalanwyr	085dc19bfa	[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-08-28 23:09:11 -04:00
Yuan Tong	ccb800f909	[TRTLLM-7457][ci] Update unittest parallel config (#7297 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-08-29 09:28:04 +08:00
Emma Qiao	1e644fa28a	[None][infra] Waive failed tests on main branch 08/26 (#7346 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 00:24:08 +08:00
William Zhang	4541655e5f	[https://nvbugs/5430124 ][ci] Unwaive Mistral 3.1 Small tests (#7274 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-28 00:03:32 -04:00
QI JUN	39c9ffda5a	[None][ci] fix test list name (#7321 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:33:22 -04:00
Pengyun Lin	c1e7fb9042	[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-28 10:22:06 +08:00
bhsueh_NV	9d345b31c0	[https://nvbugs/5453727 ][fix] unwaive qwen3 CI tests (#7293 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 22:58:59 +08:00
Emma Qiao	8dc62ffac4	[None][infra] Waive failed tests on main (#7300 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-27 09:53:33 -04:00
xinhe-nv	f082e4857c	[TRTLLM-7250][fix] waive failed cases (#7292 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-27 18:04:46 +08:00
nvamyt	dbd4f21687	[None][fix] Update maxnt of llama_v3.2_1b bench (#7279 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-27 16:56:28 +08:00
QI JUN	e08c7cf17b	[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 03:12:30 -04:00
dongxuy04	abdb2735be	[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-08-27 01:39:24 -04:00
Zhou Yuxin	ccb6aadea8	[https://nvbugs/5412456 ][fix] Remove from waives.txt (#7248 ) Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>	2025-08-27 10:05:53 +08:00
QI JUN	baef70e67e	[None][ci] move qwen3 tests from b200 to gb200 (#7257 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-26 11:50:53 -04:00
xinhe-nv	80043affb5	[None][chore] Add failed cases into waives.txt (#7251 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 17:13:44 +08:00
Zheng Duan	cf50ba2980	[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 15:34:44 +08:00
Zheng Duan	1a929a1490	[https://nvbugs/5457504 ][fix] fix kv cache event test in disaggregated worker tests (#7028 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 14:25:10 +08:00
nvamyt	d8bd8843fc	[None][test] Update qwen3 timeout to 60 minutes (#7200 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 14:18:42 +08:00
William Zhang	92576488d3	[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013 ) * Why? Some models (e.g. anything produced by Mistral) can have both sharded safetensors and a consolidated safetensor in the same checkpoint directory. In such cases, prefetching both to memory is a waste of time, and memory. * What? This commit skips over consolidated safetensors when they are not the only safetensor file present in the checkpoint directory Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-25 23:56:21 -04:00
ruodil	b845eb7a3a	[None][test] add kv cache size in bench metric and fix failed cases (#7160 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 10:10:02 +08:00
Emma Qiao	200db3b809	[None][infra] Waive failed tests on main branch (#7201 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-25 09:04:37 -04:00
Ivy Zhang	f61b74f796	[None][test] add l20 specific qa test list (#7067 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-25 12:44:08 +08:00
Bo Deng	c038fb3ef4	[None][chore] cherry-pick 6940 (#7097 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-25 10:28:45 +08:00
xinhe-nv	3ba9afcc7b	[None][feat] add gpt-oss tests to sanity list (#7158 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-25 10:22:07 +08:00
Yiqing Yan	486bc763c3	[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge (#7074 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-24 21:09:04 -04:00
Robin Kobus	31979aefac	[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-08-24 20:53:17 +02:00
ajrasane	068056677f	[None][chore] Enable auto deploy accuracy test in CI (#7179 ) Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com> Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-24 08:42:30 -07:00
Yanchao Lu	ec35481b0a	[None][infra] Prepare for single GPU GB200 test pipeline (#7073 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-24 21:46:39 +08:00
dongxuy04	19a0ea363b	[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com> Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: Dongxu Yang <dongxuy@nvidia.com> Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-08-24 08:15:29 -04:00
Iman Tabrizian	96ff82e77a	[None][fix] Waive test (#7185 ) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-08-24 10:45:11 +08:00
QI JUN	1388e84793	[None][ci] move all B200 TensorRT test cases to post merge (#7165 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-22 06:47:23 -04:00
xinhe-nv	b8b2bd4a0a	[TRTLLM-7245][feat] add test_multi_nodes_eval tests (#7108 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-22 17:17:27 +08:00
Linda	898f37faa0	[None][feat] Enable nanobind as the default binding library (#6608 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-08-22 09:48:41 +02:00
xinhe-nv	4017f7cd6b	[None][chore] Add failed cases into waives.txt (#7109 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-22 10:39:25 +08:00
dominicshanshan	6f245ec78b	[None][chore] Mass integration of release/1.0 (#6864 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-22 09:25:15 +08:00
Emma Qiao	344bc4575d	[None][infra] Waive failed case for main branch (#7129 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-22 00:08:55 +08:00
Dimitrios Bariamis	f49dafe0da	[https://nvbugs/5394409 ][feat] Support Mistral Small 3.1 multimodal in Triton Backend (#6714 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-08-21 18:08:38 +02:00
bhsueh_NV	ba0a86e0bb	[https://nvbugs/5437405 ][fix] qwen3 235b eagle3 ci (#7000 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-21 01:17:32 -04:00
xinhe-nv	21f4434404	[None][chore] waive failed cases on H100 (#7084 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-21 11:15:23 +08:00
Yechan Kim	0893afae3d	[TRTLLM-6771][feat] Support MMMU for multimodal models (#6828 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-21 08:54:12 +08:00
bhsueh_NV	73d2daa386	[https://nvbugs/5457489 ][fix] unwaive some tests (#6991 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-21 08:49:57 +08:00
QI JUN	a918de710a	[None][ci] move some tests of b200 to post merge (#7093 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-20 19:43:40 -04:00
Emma Qiao	f84dd64250	[None][infra] Waive failed tests on main branch 8/20 (#7092 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-20 06:33:44 -04:00
Robin Kobus	b95cab2a7c	[None][ci] move unittests to sub-directories (#6635 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-08-20 05:42:22 -04:00
xinhe-nv	9e71b4fda4	[TRTLLM-7205][feat] add llama4 tp4 tests (#6989 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-20 13:22:05 +08:00
Leslie Fang	3f6a9267f1	[None][infra] update feature_combination_matrix of disaggregated and chunked prefill (#6661 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-20 13:14:34 +08:00
Bo Deng	30da5d3cc4	[None][chore] unwaive test_disaggregated_genbs1 (#6944 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-20 09:57:35 +08:00
Emma Qiao	8f95f35503	[None][infra] Waive failed tests on main (#7037 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-19 09:31:07 -04:00
Yiqing Yan	07506bccbe	[None][chore] Remove duplicate test waives (#7044 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-19 21:04:31 +08:00
Fanrong Li	655d0f48d0	[https://nvbugs/5455140 ][fix] unwaive DSR1-fp4 throughput_tp8 (#7022 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-19 20:48:05 +08:00
xinhe-nv	2c86cee38c	[None][chore] Remove closed bugs (#6969 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-19 16:01:33 +08:00
Ivy Zhang	bff5fdf6df	[TRTLLM-6541][test] Add NIM Related Cases Part 1 (#6684 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-19 13:59:14 +08:00
William Zhang	daa2a65d37	[https://nvbugs/5454875 ][ci] Unwaive Mistral Small 3.1 test (#7011 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-19 00:32:14 -04:00
fredricz-20070104	e90280a84d	[TRTLLM-6541][test] Add NIM Related Cases [StarCoder2_7B] and [Codestral_22B_V01] (#6939 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-19 00:13:04 -04:00
Fanrong Li	816a120af6	[TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell (#6710 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-19 00:03:03 -04:00
Lizhi Zhou	71e28eab36	[TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models (#6741 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-19 09:58:22 +08:00
Leslie Fang	e76e5c640f	[None][infra] Enable accuracy test for mtp and chunked prefill (#6314 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-19 07:42:52 +08:00
Yiqing Yan	1ce23545fc	[None][chore] Remove duplicate test waives (#6998 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-18 21:15:49 +08:00
Emma Qiao	69ff32f9b1	[None][infra] Waive failed tests on main 0818 (#6992 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-18 20:34:52 +08:00
Shi Xiaowei	5ec15b98f0	[TRTLLM-7030][fix] uppercase def value in pd-config (#6981 ) Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-08-18 02:33:23 -04:00
Leslie Fang	ce0b13ea02	[None][infra] update feature_combination_matrix of disaggregated and Eagle3 (#6945 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-18 09:18:17 +08:00
Emma Qiao	cc6d763824	[None][infra] Waive failed cases in main branch (#6951 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-17 14:27:59 +03:00
Daniel Cámpora	53312eeebd	[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-16 00:27:24 -04:00
brb-nv	9505727d31	[https://nvbugs/5401114 ][fix] Unwaive Gemma3 tests (#6952 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-15 16:35:02 -07:00
yifeizhang-c	4127d77678	[https://nvbugs/5394392 ][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537 ) Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>	2025-08-15 09:52:06 -07:00
liji-nv	18ccd053d3	[https://nvbugs/5427801 ][fix] Torch compile support for Llama4 and Ea… (#6858 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-15 11:14:20 -04:00
xinhe-nv	b23fdfc62f	[None][chore] Add failed cases into waives.txt (#6914 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-08-15 14:00:16 +08:00
Yanchao Lu	3a987891d8	[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures (#6836 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-15 11:16:07 +08:00
Bo Li	26f413ad90	[https://nvbugs/5450262 ][fix] Fix unsupported alltoall use case (#6882 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-08-14 17:46:54 -04:00
Emma Qiao	96339c69a9	[None][infra] Waive failed cases on main (#6902 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-14 23:59:44 +08:00
Pengbo Wang @ NVIDIA	ffc976ceaf	[https://nvbugs/5445466 ][fix] fix deepseek r1 hang by not enabling mnnvl by default (#6860 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-14 22:36:56 +08:00
NVJiangShao	a700646132	[None][fix] Add FP4 all2all unitest and fix a bug for module WideEPMoE (#6784 ) Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>	2025-08-14 13:35:37 +08:00
Bo Deng	d8acca495b	[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 (#6735 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-14 04:36:38 +00:00
jmydurant	4200fa46d1	[None][feat] Add support for Hopper MLA chunked prefill (#6655 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-08-14 10:39:26 +08:00
Mike Iovine	7cba883932	[https://nvbugs/5410399 ][chore] Unwaive mtp llmapi test (#6833 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-08-13 17:38:45 -04:00
Emma Qiao	c7e6145409	[None][infra] Waive failed cases on main (#6863 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-13 09:50:14 -04:00
Anthony Chang	2198587b35	[https://nvbugs/5378031 ] [feat] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend (#6200 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-08-13 21:24:40 +08:00
Yechan Kim	12102e2d48	[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-12 19:34:02 -07:00
Chang Liu	be9dd4713c	[https://nvbugs/5385987 ][fix] Fix Qwen2 quantization issue by pinning transformers version (#6673 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-11 17:16:49 -07:00
Emma Qiao	5145e9d40e	[None][infra] Unwaive an updated case to test (#6791 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-11 06:47:33 -04:00
Emma Qiao	d6ad4a9d5b	[None][infra] Waive failed tests on main 0811 (#6778 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-11 03:16:25 -04:00
xinhe-nv	9c358c26e4	[None][chore] remove closed bugs (#6772 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-11 14:39:58 +08:00
Eran Geva	b3e8fa2960	[None][test] Test trtllm-bench AD vs, PT BEs on H100 single gpu (#6487 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2025-08-11 08:33:13 +03:00
Tracin	49bcaa4e95	Add gpt-oss GSM8K test. (#6732 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-08-10 22:45:43 -04:00
Chuang Zhu	c566a8d2a2	[None][fix] fix same pp disagg (#6730 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-08-10 22:45:15 -04:00
Bo Deng	767879ef85	[https://nvbugs/5431127 ][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6736 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-11 10:05:10 +08:00
Emma Qiao	ee19ca5e58	[None][infra] Waive test main 0808 (#6751 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-09 23:54:07 -04:00
Ye Zhang	bcf5ec0c9a	[None][feat] Core Metrics Implementation (#5785 ) Signed-off-by: Ye Zhang <zhysishu@gmail.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-09 02:48:53 -04:00
ruodil	b15d6fb145	[None][test] fix yml condition error under qa folder (#6734 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-08 15:59:01 +10:00
2ez4bz	064eb7a70f	[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611 ) This commit propagates the mapping to intermediate layers to enable tensor parallelism (amongst other things) in them. It also fixes issues with a unit test for TP for pixtral, and adds it to a test list. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-08 01:50:36 -04:00
Enwei Zhu	aee828d98a	[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-08-08 12:10:36 +08:00
ruodil	22f45a0e19	[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test (#6685 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-07 22:57:04 -04:00
xinhe-nv	88ced50ca7	[TRTQA-2920][fix] Add failed cases into waives.txt (#6719 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-08-08 12:54:13 +10:00
Daniel Cámpora	efca359b66	[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-07 22:19:37 -04:00
Raayan Dhar	4055b764db	[None][fix] disagg ctx pp4 + gen pp4 integ test (#6489 ) Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>	2025-08-07 11:18:02 -04:00
pcastonguay	453a06e6ab	[TRTLLM-6881][feat] Include attention dp rank info with KV cache events (#6563 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-08-07 14:17:07 +02:00
Enwei Zhu	1b9781e8e7	[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-08-07 05:53:48 -04:00
xinhe-nv	0a467b00cc	[https://nvbugs/5409414 ][fix] fix Not registered specs (#6660 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 17:55:53 +10:00
hlu1	8207d5fd39	[None] [feat] Add model gpt-oss (#6645 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-07 03:04:18 -04:00
ruodil	6c1f7d8b91	[None][test] correct test-db context for perf yaml file (#6686 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-07 02:47:10 -04:00
YueWeng	157ea77549	[https://nvbugs/5375966 ][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one (#6658 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-08-07 10:25:17 +08:00
ruodil	780d7507f9	[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml (#6662 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 10:02:13 +10:00
Yan Chunwei	5eae3184fa	[None][chore] add missing tests to test list (#6590 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-06 22:12:27 +08:00
Iman Tabrizian	13ecb4aced	[https://nvbugs/5328160 ][fix] Unwaive disaggregated serving tests (#6644 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-06 09:08:29 -04:00
ruodil	907c180eb2	[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 02:25:57 -04:00
ruodil	0bd99b5d6d	[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 01:45:13 -04:00
yunruis	3ff4f503ad	[None][opt] ADP schedule balance optimization (#6061 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-08-06 09:38:02 +08:00
ixlmar	1ebceb790d	[TRTLLM-5508][feat] check input tokens + improve error handling (#5170 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-08-05 18:27:43 +01:00
Venky	61da2daeb4	[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-05 07:14:24 -07:00
Emma Qiao	78a75c2990	[None][Infra] - Split gb200 stages for each test (#6594 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-05 07:10:00 -04:00
xinhe-nv	c32584125e	[TRTQA-2920][fix] Add failed cases into waives.txt (#6600 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-05 20:12:55 +10:00
Pengbo Wang @ NVIDIA	c289880afb	[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-08-05 18:05:33 +08:00
Ivy Zhang	08ed9d7305	[None][doc] add introduction doc on qa test (#6535 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 17:02:17 +08:00
Ivy Zhang	d101a6cebc	[https://nvbugs/5410279 ][test] resubmit timeout refactor (#6337 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 16:39:25 +08:00
Haohang Huang	c9eebcb454	[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379 ) Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com> Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>	2025-08-05 07:47:41 +00:00
ruodil	7625845365	test: add README_release_test.md for perf test (#6443 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-05 02:07:42 -04:00
xinhe-nv	a178cea324	[TRTLLM-6856][feat] add disaggregated serving tests to QA list (#6536 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-05 12:47:53 +10:00
xinhe-nv	fe3d607c4b	[TRTQA-2920][fix] Add failed cases into waives.txt (#6581 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-05 12:41:23 +10:00
Ivy Zhang	f3651adea8	[None][test] update invalid test name (#6596 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-04 08:01:05 -04:00
Emma Qiao	5d8a5a0cb8	[None][Infra]Waive failed case in post-merge on main (#6602 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-04 19:39:44 +08:00
brb-nv	87e4e9f468	[None][chore] Add unit test for Gemma3 lora (#6560 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-04 04:56:57 -04:00
Pengyun Lin	a15e33351d	[None][fix] Revert commit `48ddc3d` & add test for disagg server with different max_num_tokens (#6259 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-04 15:09:51 +08:00
xinhe-nv	a54972e463	[None][fix] remove closed bugs (#6576 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-04 15:52:11 +10:00
Leslie Fang	a60190836c	[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-04 01:45:24 -04:00
ruodil	6459725bf9	test: move ministral_8b_fp8 to fp8_specific gpu list(exclude Ampere) (#6533 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-04 15:22:39 +10:00
Ivy Zhang	5eefdf2c75	tests: Add llama4 functional cases (#6392 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-04 11:19:58 +08:00
Yechan Kim	ee6ab5be96	chore: add EXAONE4 accuracy test (#6397 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-04 10:14:16 +08:00
Ivy Zhang	7547a7d0a2	[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list (#6436 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-03 22:11:26 -04:00
Jhao-Ting Chen	4da5cfc511	[None][infra] add eagle3 one model accuracy tests (#6264 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-08-02 16:07:46 -07:00
Lizhi Zhou	6f34f3489b	[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-01 13:33:34 -04:00
xinhe-nv	263c6c0ad0	test: skip post blackwell (#6357 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-01 13:10:14 -04:00
Emma Qiao	16febefee0	[None][Infra] - Skip failed tests in post-merge (#6558 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-01 22:21:23 +08:00
brb-nv	7447d6ed85	[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-01 09:19:54 -04:00
liji-nv	1daa8c3232	[https://nvbugs/5340941 ][https://nvbugs/5375785 ] - fix: Wrap attentio… (#6355 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-01 07:38:06 -04:00
Yukun He	90856bf97d	[https://nvbugs/5419069 ][fix] Fix the mismatched layer name components. (#6417 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-08-01 16:32:39 +08:00
brb-nv	2eca0d5925	fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-31 17:18:23 -07:00
Ziyi Xiong	8062e0fe7c	[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically (#6363 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-31 15:31:39 -04:00
Faraz	8e84df74b5	Fix e2e test failure for RTX6000 Pro (#6420 ) Signed-off-by: list <58580514+farazkh80@users.noreply.github.com> Signed-off-by: Faraz <58580514+farazkh80@users.noreply.github.com>	2025-07-30 23:32:44 -04:00
xinhe-nv	ca534e4798	test: add accuracy reference (#6479 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-31 12:27:29 +10:00
bhsueh_NV	ae3a5fc918	[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-31 09:37:23 +08:00
brb-nv	0e16d1f070	test: Add time logging for lora tests (#6466 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-30 14:02:43 -07:00
Anurag Mukkara	fac186e3b5	[nvbug/5409417] Unwaive llava test case (#6460 ) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-07-30 14:38:47 -04:00
brb-nv	f6287e4498	Unwaive Gemma2 LoRA test on H100 (#6461 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-30 12:56:12 -04:00
Bo Deng	24e7f4eece	[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests (#6439 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-07-31 00:41:37 +08:00
Wanli Jiang	9632dba02e	feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-30 09:20:16 -07:00
pcastonguay	0f083b9daf	fix: Unwaive triton cpp test [nvbug 5401088] (#6412 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-07-30 11:25:18 -04:00
pcastonguay	e7ae5e2824	feat: Add support for disaggregation with pp with pytorch backend (#6369 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-07-30 09:42:13 -04:00
tomeras91	a2514d93fc	[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-30 07:22:32 -04:00
xinhe-nv	d9ab3fd35e	tests: add TestNemotronH cuda graph tests (#6390 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-30 18:45:58 +10:00
xinhe-nv	c00d6763b2	test: [CI] Add failed cases into waives.txt (#6457 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-30 12:36:58 +10:00
Yechan Kim	d6eb8e2366	fix: support mixture of text & multimodal prompts (#6345 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-30 08:52:31 +08:00
xinhe-nv	f1086e7d4f	test: [CI] remove closed bugs (#6381 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-29 19:01:23 +10:00
xinhe-nv	4fbb344caf	test: [CI] Add failed cases into waives.txt (#6423 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-29 19:00:30 +10:00
Yukun He	0eee2e2850	[5385981] fix: Update the usage of VisionAttention init API. (#6413 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-07-29 16:41:48 +08:00
ruodil	e11255e9d0	test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-29 15:52:45 +10:00
Michal Guzek	2573bb729d	feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-07-28 14:02:14 -07:00
2ez4bz	cdca541148	[test] Unwaive mistral3.1 small E2E test (#6352 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-28 14:37:42 -04:00
2ez4bz	60e4d3a9d4	[test] Add accuracy regression test for Mistral3.1 (#6322 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-28 09:41:44 -07:00
ruodil	03632a679f	test: organize perf cases and add missing perflab cases in qa test list (#6283 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-28 20:33:32 +10:00
xinhe-nv	971be1fe86	test: waive failed cases (#6394 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-28 20:31:43 +10:00
Emma Qiao	b3ca159787	[Infa] - waive failed cases and fix a typo (#6384 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-28 02:06:57 -04:00
Chang Liu	dc757799e1	[nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266 )	2025-07-27 23:29:21 -04:00
Yan Chunwei	908f49a4ad	[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-28 09:01:10 +08:00
nv-guomingz	b8d4cb8beb	feat: Support JSON Schema in OpenAI-Compatible API (#6321 ) Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>	2025-07-25 12:55:56 -04:00
xiaoqi	a0aecf0476	[feat]: support logit_bias (#5354 ) Signed-off-by: xq25478 <xq25478@qq.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-25 09:37:41 +00:00
xinhe-nv	470544cf17	test: [CI] Add failed cases into waives.txt (#6333 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-25 17:18:06 +10:00
xinhe-nv	6268a60ab3	tests: add test_chunked_prefill for llama4 (#5549 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-24 23:02:00 -04:00
bhsueh_NV	7b6aadc800	[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-24 21:47:37 +08:00
Emma Qiao	0cc1f8c03d	[Infra] - Wiave failed tests in post-merge (#6331 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-24 21:18:06 +08:00
Iman Tabrizian	5fceaa6153	Revert "tests: add timeout_manager to tensorrt flow test cases (#5942 )" (#6309 )	2025-07-23 23:58:10 -04:00
Iman Tabrizian	7740bfa31d	Waive tests (#6312 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-23 18:15:07 -07:00
Emma Qiao	cb737a5fcd	[Infra] - Skip failed cases (#6299 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-23 21:26:31 +08:00
xinhe-nv	2b0fa24175	test: [CI] Add failed cases into waives.txt (#6289 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-23 19:04:21 +10:00
YueWeng	ed62a06eef	[nvbug/5322354] fix PD + MTP + overlap scheduler accuracy issue (#6136 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-07-23 14:53:37 +08:00
Iman Tabrizian	bc2fb29c5e	[nvbugs/5401261][fix] Fix Triton backend disaggregated serving support (#6224 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-23 05:27:16 +08:00
John Calderon	b7c8a672da	[Issue 6193] Fix gemma3vl weight loader (#6233 ) Signed-off-by: John Calderon <johncalesp@gmail.com>	2025-07-22 10:32:18 -07:00
Stanley Sun	04f2d4b2eb	test: update test list for RTX6KD (#6213 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-07-22 18:55:24 +08:00
Yi Zhang	eb7d0f84b5	[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-22 12:48:00 +08:00
Yan Chunwei	f194b65f3e	fix [nvbug/5351244]: address remote mpi session submit (#5664 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-22 12:48:00 +08:00
Ivy Zhang	eb5cb5b642	tests: add timeout_manager to tensorrt flow test cases (#5942 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-22 10:23:41 +08:00
Simeng Liu	4a0951f85c	[Chore] Replace MODEL_CACHE_DIR with LLM_MODELS_ROOT and unwaive triton_server/test_triton.py::test_gpt_ib[gpt-ib] (#5859 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-07-21 15:46:37 -07:00
Yi Zhang	f9b0a911fb	test: Enable GB200 torch compile multi gpu tests (#6145 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-21 22:17:13 +08:00
Emma Qiao	e41507a253	[Infra] - Waive failed cases on recent post-merge (#6212 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-21 21:00:18 +08:00
Linda	3efad2e58c	feat: nanobind bindings (#6185 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-21 08:56:57 +01:00
xinhe-nv	b46fd41026	test: [CI] remove closed bugs (#6201 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-21 15:40:30 +08:00
ruodil	6a3c9f8061	test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test (#5826 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-21 11:29:19 +10:00
bhsueh_NV	2e14c8f443	[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-20 10:25:25 +08:00
Ziyi Xiong	66030ef815	[TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133 ) Signed-off-by: ziyixiong-nv <fxiong@nvidia.com> Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-19 13:17:15 +08:00
wili	82d3587bb8	[refactor] Unify name of NGram speculative decoding (#5937 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-19 12:59:57 +08:00
xiaoqi	28858c8711	feat(eagle3):support qwen3 dense model (#5879 ) Signed-off-by: xq25478 <xq25478@qq.com>	2025-07-19 01:24:32 +08:00
Bo Deng	2c6fa145ee	[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-07-19 00:48:44 +08:00
Emma Qiao	77acb4f753	[Infra] - Waive failed tests in post-merge (#6176 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-18 17:34:34 +08:00
Zhenhuan Chen	992b273045	[https://nvbugs/5387375 ] fix(scaffolding): fix scaffolding aime test in test_e2e (#6140 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-07-18 10:34:37 +08:00
Iman Tabrizian	b75e53ab69	Revert "feat: nanobind bindings (#5961 )" (#6160 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-18 10:12:54 +08:00
2ez4bz	8480c120b1	[fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-17 11:04:17 -07:00
Linda	5bff317abf	feat: nanobind bindings (#5961 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-17 22:42:52 +08:00
Yi Zhang	a718486900	fix: Fix DeepSeek R1 CI (#6129 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-17 18:24:49 +08:00
Chuang Zhu	44c70c88f9	chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-07-17 17:42:07 +08:00
Iman Tabrizian	d4d21a106e	[fix] Release slots with spec decode + disagg (#5975 ) (#6032 ) Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-17 12:58:18 +08:00
chenfeiz0326	fe070a0168	test: Update Llama4 Scout FP4 & FP8 accuracy tests (#5901 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-07-17 09:41:18 +08:00
Wanli Jiang	2d2b8bae32	feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-17 06:30:58 +08:00
qixiang-99	e09e409dfb	Fix: Enhance ModelConfig for kv cache size calculations (#5868 ) Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>	2025-07-16 14:41:31 -07:00
Emma Qiao	e30d7bec38	[Infra] - Waive failed cases in post-merge on main (#6096 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-16 22:41:18 +08:00
Ivy Zhang	dda91b5117	tests: add QA test cases (#5959 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-16 16:14:25 +08:00
Ivy Zhang	763012a88a	[nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill (#6051 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-16 16:04:08 +08:00
peaceh-nv	f5f31beee1	feat: Add deepseek-lite tests for RTX pro 6000 (#5903 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-16 15:51:45 +08:00
Wanli Jiang	8679a058a3	fix: Unable to load phi4-model with tp_size>1 (#5962 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-16 11:39:41 +08:00
brb-nv	9214ac662a	test: Add regression tests for Gemma3 VLM (#6033 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-15 11:37:56 -07:00
Fanrong Li	7a1af1c738	Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/5947 (#5989 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-07-16 01:33:12 +09:00
MinaHuai	9ebc3ab9c4	[nvbugs/5385972][nvbugs/5387423][Fix] Minor fix for llava_next/llava_onevision (#5998 ) Signed-off-by: Mina Huai <121143971+MinaHuai@users.noreply.github.com>	2025-07-15 10:01:35 -04:00
ruodil	2a147c4d01	test: add llama_v3.3_70b_cases in perf test (#6035 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-15 17:53:59 +10:00
ixlmar	f225f5cd2e	[nvbugs-5318143] fix: restrict PyTorch memory usage to avoid OOMs (#5964 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-07-15 06:49:42 +08:00
brb-nv	1a2d96919c	feat: Update Gemma3 Vision Encoder (#5973 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-14 22:38:10 +08:00
Zhenhuan Chen	30608a5e6d	[https://nvbugs/5355316 ] fix: update torch.compile option to fix triton store_cubin error (#5865 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-07-14 17:17:30 +08:00
ruodil	347520494b	test: remove duplicate cases in perf sanity test (#5870 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Bo Li	6d79559f3e	fix: [https://nvbugs/5351130 ][https://nvbugs/5333654 ] Unwaive for bug 5351130 and 5333654. (#5821 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Bo Li	2991cf4b80	fix: [https://nvbugspro.nvidia.com/bug/5345215 ] Unwaive for bug 5345215. (#5606 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Pengyun Lin	6992616c1f	[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
ruodil	278a1a7df3	test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases (#5693 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Iman Tabrizian	c8874a7f94	[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yi Zhang	e5e87ecf34	test: Move some of the test from post merge to pre-merge, update dgx b200 test case (#5640 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yan Chunwei	9c673e9707	[TRTLLM-6160] chore: add sampling examples for pytorch (#5951 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-14 15:28:32 +09:00
Yan Chunwei	c30eead09f	[TRTLLM-6164][TRTLLM-6165] chore: add runtime example for pytorch (#5956 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-14 14:09:39 +08:00
Thor Johnsen	041f1fa513	[TRTLLM-6264] Fix flaky test_e2e.py::test_openai_lora (#5885 ) Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>	2025-07-11 16:20:41 -07:00
xinhe-nv	509363d858	tests: update sanity tests & fix tests (#5906 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-11 19:48:19 +10:00
brb-nv	0385f89abc	test: Fix Gemma3 unit tests due to transformers upgrade (#5921 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 17:24:10 -07:00
2ez4bz	c19840235d	[fix] Fix mistral unit tests due to transformers upgrade (#5904 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-10 10:45:27 -07:00
Yiqing Yan	3aa53ec36c	[None] - Waive L0 tests (#5915 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-10 18:33:17 +08:00
Enwei Zhu	055c4a9fe6	[NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-10 16:30:00 +08:00
Anthony Chang	7d21b55b5a	[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-07-10 14:06:50 +08:00
peaceh-nv	76c3a12bcb	[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 (#5636 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-10 09:20:30 +08:00
2ez4bz	87fe44fd29	feat(models): Mistral3.1 VLM pytorch backend support (#5529 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-09 13:17:40 -07:00
DylanChen-NV	74dca0aa7b	[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-07-09 23:16:42 +08:00
Bo Li	9d894bc0cb	fix: [https://nvbugspro.nvidia.com/bug/5375656 ] Unwaive for bug 5375656. (#5842 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-09 10:17:05 +08:00
Venky	e27215ca03	test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) (#5654 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-08 18:16:21 -07:00
xavier-nvidia	b6013da198	Fix GEMM+AR fusion on blackwell (#5563 ) Signed-off-by: xsimmons <xsimmons@nvidia.com>	2025-07-09 08:48:47 +08:00
Yan Chunwei	e50d95c40d	chore [TRTLLM-6161]: add LLM speculative decoding example (#5706 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-09 07:33:11 +08:00
Pamela Peng	da8c7372d4	[TRTLLM-5366][feat]Add support for sm121 (#5524 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.	2025-07-08 14:27:00 -07:00
Chang Liu	08a3dfeb2b	[nvbug/5308432] unwaive test: post-merge-triton_backend-test_llava (#5814 )	2025-07-08 09:53:11 -07:00
Raayan Dhar	e3268a4221	[TRTLLM-5847][feat] Support n-gram speculative decoding with disagg (#5732 ) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-07-08 09:39:58 -04:00
xinhe-nv	89bbb230cc	tests: waive failed cases on main (#5781 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-08 19:44:12 +10:00
liji-nv	95978e3044	[fix] https://nvbugs/5333654 Unwaive to check ci status and improve torch compile multi-gpu coverage (#5700 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-07-08 12:42:15 +08:00
Robin Kobus	30a19fcf7c	[TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-07 16:30:43 +02:00
xinhe-nv	ded38ebdbd	test: [CI] remove closed bugs (#5770 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-07 18:06:07 +10:00
Yanchao Lu	2013034948	[Test] - Waive or fix few known test failures (#5769 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-07-06 21:14:16 +08:00
Stefan Niebler	d1112aac37	[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-07-05 01:35:13 +09:00
Chuang Zhu	ffc0b8f5da	Cache transceiver support VSWA (#5505 ) Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-07-05 01:18:42 +09:00
Yiqing Yan	7f3ea058f0	[Infra] - Waive L0 flaky test (#5759 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-04 19:25:12 +09:00
xinhe-nv	3869b969a6	test: [CI] Add failed cases into waives.txt (#5718 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-04 17:24:48 +09:00
Faraz	81c0764012	Cherry pick "[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5724 ) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>	2025-07-04 16:53:20 +09:00
Yiqing Yan	b8fef809ae	[Infra] - Waive L0 test (#5748 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-04 15:04:49 +08:00
Yi Zhang	73d30a23c7	test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
Zheng Duan	cb9f596dbe	[nvbug 5300551] test: increase block count in eviction test (#5465 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
xinhe-nv	7f837b6e8b	tests: waive failures on main (#5704 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-04 12:39:12 +09:00
Venky	4762e0b244	Waive tests : test_openai_lora, test_trtllm_serve_lora_example and test_openai_chat_structural_tag_example (#5740 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-07-04 11:01:08 +09:00
Netanel Haber	f91379b7e8	delete duplicate eagle3 and ngram tests (#5711 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-07-03 15:47:26 +03:00
Omer Ullman Argov	c72856188c	[ci] small multigpu speedups (#5643 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-03 08:06:10 -04:00
Emma Qiao	530897388c	[Infra] - Waive a failed case on main (#5702 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-03 06:09:27 -04:00
Emma Qiao	2a5fdebf10	[Infra] - Waive failed tests for main 0702 (#5671 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-02 22:05:07 -04:00
Emma Qiao	31699cbeb1	[Infra] - Set default timeout to 1hr and remove some specific settings (#5667 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-02 08:37:54 -04:00
Kaiyu Xie	f9a455651b	perf: Use tokenizers API to optimize incremental detokenization perf (#5574 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-01 09:35:25 -04:00
Yan Chunwei	3bc703d450	ci: unwaive llmapi launch test (#5281 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
brb-nv	4ef60d5fbb	nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Yan Chunwei	a5eff139f1	[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-07-01 19:06:41 +08:00
Emma Qiao	65c2b93284	[Infra] - Add some timeout and unwaive a test which dev fixed (#5631 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-01 05:01:32 -04:00
Pamela Peng	071ad758c4	[https://nvbugs/5318059 ][test] Unwaive test (#5624 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>	2025-07-01 04:54:44 -04:00
xinhe-nv	19c56f0374	test: [CI] Add failed cases into waives.txt (#5582 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 14:57:03 +08:00
xinhe-nv	a8cf611baa	test: [CI] Add failed cases into waives.txt (#5569 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 11:02:56 +08:00
xinhe-nv	9b17b29b6e	test: [CI] remove closed bugs (#5572 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 10:15:43 +08:00
Omer Ullman Argov	42134b8b84	[ci] move eagle1 and medusa tests to post-merge (#5604 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-30 19:32:28 +08:00
Fanrong Li	6cbc9a5297	[nvbug/5354946][fix] Fix mtp vanilla draft inputs (#5568 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-30 15:59:12 +08:00
Yiqing Yan	4fef14da56	Deduplicate waive list (#5546 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-30 11:12:26 +08:00
Talor Abramovich	70e34a3291	[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376 ) Signed-off-by: Talor Abramovich <talora@nvidia.com>	2025-06-29 12:46:30 +00:00
amirkl94	a985c0b7e6	tests: Move stress tests to be Post-Merge only (#5166 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2025-06-29 09:44:47 +03:00
Iman Tabrizian	26b953e29a	[nvbugs/5309940] Add support for input output token counts (#5445 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-28 04:39:39 +08:00
wili	56cdfe5c6c	[TRTLLM-5000][feat] NGrams V2 (#4569 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-06-27 23:00:17 +08:00
Iman Tabrizian	49af791f66	Add testing for trtllm-llmapi-launch with tritonserver (#5528 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-27 11:19:52 +08:00
xinhe-nv	a3494bebec	tests: waive failed tests on main (#5512 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-27 10:13:22 +08:00
Frank	aa6e015ef8	Update trtllm-bench to support new Pytorch default. (#5491 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-06-26 17:05:43 -07:00
jmydurant	8836990bde	[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-06-26 22:18:08 +08:00
Omer Ullman Argov	6bae76d7ca	[fix][ci] move torch tests to run under torch stage (#5473 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-26 14:31:38 +03:00
Omer Ullman Argov	1633bd2bef	[CI] move flashinfer llama tests to post merge (#5506 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-26 19:27:32 +08:00
xinhe-nv	ff2dd72df4	tests: waive tests (#5458 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-26 14:53:55 +08:00
Emma Qiao	32d1573c43	[Infra] - Add timeout setting for long tests found in post-merge (#5501 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-26 11:31:39 +08:00
Venky	d9b75f83fd	[CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False]` (#5494 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-25 20:17:12 -07:00
jmydurant	578dbc8d9a	feat: chunked prefill for MLA (Blackwell) (#4651 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-06-26 09:01:00 +08:00
HuiGao-NV	74ae15a26b	CI: enable test cases on single device type (#5484 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-26 08:03:44 +08:00
QI JUN	feaf789342	CI: reduce BF16 test cases in B200 (#5482 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-06-26 07:18:20 +08:00
HuiGao-NV	cc3c2b3be2	Move 3 disaggregated cases from 4 GPUs devices to 1 GPU device (#5457 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-25 21:38:14 +08:00
Kaiyu Xie	d6ada5ffce	[nvbug/5354956] fix: unexpected keyword argument 'streaming' (#5436 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-25 20:37:24 +08:00
Netanel Haber	3ca2f6ac51	start OAIServer with `max_beam_width=1` for TorchSampler (#5427 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-06-25 15:52:06 +08:00
Enwei Zhu	fc7a81ceb0	test: Add LLGuidance test and refine guided decoding (#5348 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-25 14:12:56 +08:00
Enwei Zhu	76da7fed86	fix (NvBug 5354925): Fix static EPLB (#5411 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-25 13:14:40 +08:00
dongxuy04	699520082b	Add MTP support for Online EPLB (#5213 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-06-25 07:58:13 +08:00
Emma Qiao	475272046a	[Infra] - Waive failed tests in post-merge and increase some timeout setting (#5424 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-24 17:19:31 +08:00
xinhe-nv	658fb5b54e	tests: update benchmark test lists (#5365 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-24 15:23:38 +08:00
xinhe-nv	4b32a3f1a7	test: [CI] remove closed bugs (#5400 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-24 13:39:57 +08:00
Fanrong Li	5d4ab47d5b	fix: refactor and fix mtp vanilla (#4762 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-20 05:23:39 +08:00
Kaiyu Xie	7246fd75d1	feat: Support stream_interval (#5284 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-19 21:57:10 +08:00
Enwei Zhu	bca758fce1	fix: Fix DS-R1 nvfp4 test case naming (#5361 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-19 15:50:43 +08:00
Emma Qiao	493f268b1c	[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-19 15:05:57 +08:00
ruodil	e22e884b02	test: amend test case name in perf cluster test (#5356 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-19 14:50:12 +08:00
ruodil	21ce9b6749	test: add qwen3 cases (#5302 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-19 14:38:36 +08:00
amitz-nv	1753202b61	[TRTLLM-5825][fix] Fix torch LoRA TP (#5338 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-06-19 09:12:00 +03:00
Emma Qiao	7f68de3e3f	Refactor test timeout for individual long case (#4757 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-19 13:52:11 +08:00
bhsueh_NV	dce8620013	chore: enable moe_backend on Qwen3 test (#5230 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-06-19 13:40:45 +08:00
xinhe-nv	e5400eeae0	tests: add ds r1 tp4 test (#5197 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-19 12:48:33 +08:00
Yiqing Yan	da576bcafa	Waive L0 test (#5349 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-19 12:01:11 +08:00
Fanrong Li	6c3210a8be	[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-19 09:48:22 +08:00
Omer Ullman Argov	5010f8719d	[fix][test] remove duplicate test runs (#5241 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-19 01:59:54 +08:00
Omer Ullman Argov	a28a152001	[fix][test] remove some cpp test cases from h100 (#5335 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-18 20:40:26 +03:00
yuanjingx87	a1c5704055	[feat] Multi-node CI testing support via Slurm (#4771 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-19 01:11:12 +08:00
Iman Tabrizian	e5ee5c5352	Unwaive disaggregated serving accuracy tests (#5095 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-06-19 00:41:15 +08:00
HuiGao-NV	d13d2f460d	Remove duplicated test cases (#5323 ) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Hui Gao <huig@nvidia.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-18 21:20:20 +08:00
Emma Qiao	b29ac5b561	[Infra] Update 5080 and 5090 case condition due to the driver update (#5317 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-18 20:01:36 +08:00
xinhe-nv	610a49f117	tests: add multi nodes tests (#5196 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-18 18:08:04 +08:00
Yi Zhang	375dd0b971	Waive L0 (#5311 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-18 16:40:41 +08:00
Wanli Jiang	3a02489e86	[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-06-18 15:12:49 +08:00
ruodil	3b5d916250	test: cherry-pick deepseek rcca cases in main branch (#5307 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-18 14:26:26 +08:00
Yiqing Yan	8f67e3604d	Waive L0 tests (#5308 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-18 12:43:45 +08:00
Omer Ullman Argov	f501ce57b1	[fix][test] move deepseek single gpu tests to post merge (#5280 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-18 06:59:39 +03:00
Ivy Zhang	41cfcaa964	test: update qa test list (#5305 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-18 11:29:11 +08:00
Emma Qiao	ff32caf4d7	[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 (#4885 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 23:48:34 +08:00
Yanchao Lu	f4cdbfcdf0	None - Some clean-ups for the automation pipeline (#5245 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 21:08:24 +08:00
QI JUN	ccd9adbe33	CI: move multi-gpu test cases of tensorrt backend to h200 (#5272 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 17:37:37 +08:00
Ivy Zhang	2ad8758ecc	[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520 ][test] Add QA test cases (#5073 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-17 17:14:01 +08:00
QI JUN	517c1ecf72	move some test cases of TensorRT backend back (#5232 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 17:03:11 +08:00
xinhe-nv	a49ad790b3	test: [CI] remove closed bugs (#5218 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-17 13:13:23 +08:00
QI JUN	546274d40e	fix ci (#5259 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 12:03:09 +08:00
ruodil	bb2348372c	test: add more pytorch cases in perf test (#5237 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-17 11:11:28 +08:00
Simeng Liu	5c18160d27	chore: Waive CI failure. (#5252 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-06-16 20:47:05 +02:00
Ivy Zhang	64b7f04fdc	[test] split nemotron test cases from examples_test_list (#5238 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-16 16:36:33 +08:00
xinhe-nv	802f22cd12	test: [CI] Add failed cases into waives.txt (#5221 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-16 16:11:53 +08:00
Yiqing Yan	8445416c39	Waive L0 tests (#5233 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-16 15:19:03 +08:00
ruodil	2848e012ae	test: add llama4 models for perf test (#5187 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-16 11:24:35 +08:00
ruodil	3d22f27063	test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attention_dp to false to non-deepseek models (#5155 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-16 11:23:20 +08:00
Enwei Zhu	babdd9ce06	test: Add json_mode_eval for guided decoding evaluation (#5179 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-16 10:03:55 +08:00
amitz-nv	109c426077	Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130 )	2025-06-15 18:54:04 +03:00
Tailing Yuan	0b60da2c45	feat: large-scale EP(part 7: DeepEP integration) (#4792 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-14 19:12:38 +08:00
Enwei Zhu	5f2785fb90	fix: Fix waive list (#5205 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-13 23:33:23 +08:00
QI JUN	952f33dcad	CI: move all test cases of TensorRT backend into post merge (#5186 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-13 20:48:48 +08:00
xinhe-nv	30d9d0fa71	test: [CI] Add failed cases into waives.txt (#5178 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-13 16:38:51 +08:00
Ivy Zhang	28cd536bd6	[test] Update timeout params in QA test list (#5124 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-13 13:40:03 +08:00
Iman Tabrizian	01bd4c00b4	Add two MTP disaggregated test (#4546 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-13 12:17:45 +08:00
xinhe-nv	d9be419f45	tests: update tests for b200 (#5180 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-13 11:25:33 +08:00
ruodil	fa582cbe9a	test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test (#5083 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-13 11:09:15 +08:00
nv-guomingz	cf35a079f9	fix:https://nvbugs/5298661 (#5022 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-12 20:41:44 +08:00
Shi Xiaowei	88cba5f354	test: waive the NIXL related tests (#5153 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-06-12 17:02:27 +08:00
Fanrong Li	4d070d3862	chore: fix typo in tests (#5092 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-12 15:11:26 +08:00
Michal Guzek	53983ad273	[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-06-12 15:06:28 +08:00
ruodil	d021cc5126	test: set enable_attention_dp to False for non-deepseek models and add more cases for llama_v3.1/3.3 70b fp8 models (#5149 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-12 14:59:16 +08:00
Venky	c3b2eb6dab	test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨ (#5066 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-12 14:19:15 +08:00
xinhe-nv	11b94feff8	test: skip disaggregated tests on arm (#5070 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-11 17:00:10 +08:00
ruodil	56abae0835	test: add more llama_v3.3_70b cases in perf test (#4979 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-11 15:44:22 +08:00
Yiqing Yan	0a9f105931	Waive L0 tests (#5111 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-11 11:53:15 +08:00
Zheng Duan	580a92521e	test: conditional disagg and cache aware balancing for deepseek v3 (#4522 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-06-11 09:44:29 +08:00
liji-nv	f6a49a9343	[CI] waive failing L0 test (#5089 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-10 20:40:44 +08:00
Yiqing Yan	8ec8e4559d	Waive L0 test (#5077 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-10 16:23:49 +08:00
Yiqing Yan	fdfc711261	Waive L0 test (#5067 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-10 15:40:57 +08:00
Stanley Sun	74b0e71ef4	test: add more disaggregated serving tests into QA testlist (#5036 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-06-10 09:24:53 +08:00
pcastonguay	5b84fd9201	[nvbug 5283506] fix: Fix spec decode triton test (#4845 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-06-09 08:40:17 -04:00
Yukun He	137fe35539	fix: Fix warmup phase batch size out of range. (#4986 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-09 19:19:16 +08:00
Yuxian Qiu	88480197da	ci: [nvbugs/5280806] Unwaive unittests/_torch. (#4951 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-06-09 19:04:11 +08:00
liji-nv	1d4f748773	[fix] Fix illegal mem access and possible accuracy lose. Cherry-pick … (#5017 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-09 17:50:57 +08:00
Yiqing Yan	6b17dff2f1	Waive L0 test (#5024 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-09 16:03:15 +08:00
Yan Chunwei	f4bfb8e49d	ci: unwaive llmapi launch test (#4991 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-09 13:25:43 +08:00
Omer Ullman Argov	8731f5f14f	chore: Mass integration of release/0.20 (#4898 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: moraxu <mguzek@nvidia.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>	2025-06-08 23:26:26 +08:00
Mike Iovine	ec0d984656	[nvbug/5280806][fix] Fix 2 model spec decode flow (#4807 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-08 07:40:02 -04:00
Yanchao Lu	9e05613679	[Infra] - Update JNLP container config (#5008 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-08 16:44:09 +08:00
QI JUN	5ee0de7f2a	Resubmit #4894 (#4969 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-08 04:42:15 +08:00
Ivy Zhang	7dce328ad6	[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow (#4940 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Ruodi Lu <ruodil@nvidia.com> Co-authored-by: Ruodi Lu <ruodil@nvidia.com>	2025-06-07 11:18:32 +08:00
Fanrong Li	75d020cf07	fix: fix cuda graph padding for spec decoding (#4853 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-06 22:21:42 +08:00
Anthony Chang	eeb555e37b	chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM (#4826 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-06-06 16:13:54 +08:00
xinhe-nv	564472168e	test: [CI] Add failed cases into waives.txt (#4966 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-06 10:30:15 +08:00
QI JUN	ec50684d80	Revert "fix a bug of global cuda graph dummy request" (#4970 )	2025-06-06 08:54:45 +08:00
QI JUN	154f7cc40a	fix a bug of global cuda graph dummy request (#4894 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-05 19:47:40 +08:00
Yiqing Yan	7e921c78b5	Waive L0 tests (#4953 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-05 19:36:48 +08:00
Shunkangz	3eae58ca36	Add disaggregated unittest (#4899 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-06-05 19:14:31 +08:00
QI JUN	d5a8079eb6	Revert "[infra] Unwaive unittests/_torch" (#4950 )	2025-06-05 17:21:07 +08:00
xinhe-nv	1c3091c63b	tests: [TRTQA-2906] add benchmark serving tests (#4901 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-05 14:33:03 +08:00
Yiqing Yan	9ceef983c0	Waive L0 tests (#4927 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-05 11:09:01 +08:00
xinhe-nv	50a74a1daa	tests: fix 5273697 (#4685 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-05 10:39:21 +08:00
Mike Iovine	8433091630	[infra] Unwaive unittests/_torch (#4919 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-05 08:49:37 +08:00
Lucas Liebenwein	f9d45e03a4	[AutoDeploy] deprecate CI post-merge tests and keep them for local testing (#4892 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-06-05 08:27:17 +08:00
Yi Zhang	1fca654bfd	tests: Update gb200 test case (#4754 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-04 18:49:20 +08:00
Shi Xiaowei	b13f8c9cba	Fix: NVBug 5302895 (#4835 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-06-04 09:31:39 +08:00
Simeng Liu	2384655c3a	chore: Waive examples/test_mistral.py::test_llm_mistral_v1_1gpu. (#4873 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-06-03 14:45:14 -04:00
Iman Tabrizian	141467d4b6	Add pre-merge Triton backend tests (#4842 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-03 00:47:58 -04:00
ruodil	fa93eeee84	shorten reqs in con:1 cases and add streaming cases, and add l2 perf … (#4849 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 12:28:13 +08:00
Ivy Zhang	8686868531	tests: [TRTQA-2905] improve timeout report for qa test cases (#4753 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 12:27:27 +08:00
Robin Kobus	e34a1beb72	[nvbugs/5303555] ci: unwaive test_fp8_block_scales_cuda_graph_padding (#4735 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-06-03 10:40:43 +08:00
Fanrong Li	380a5d1690	[https://nvbugs/5271281 ][fix] fix a pd+mtp accuracy issue (#4536 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-03 10:03:34 +08:00
Fanrong Li	13f68338d2	fix: [https://nvbugspro.nvidia.com/bug/5273945 ] Unwaive tests for bug-5273945 (#4832 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-02 22:01:57 +08:00
Yanchao Lu	8166649d03	[Infra] - Minor clean-up and test Ubuntu mirrors (#4829 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-02 20:18:20 +08:00
Fanrong Li	7d356efc7d	fix: fix accuracy and illegal memory access issues when using mtp + attention dp (#4379 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-02 00:35:52 +08:00
amirkl94	8039ef45d3	CI: Performance regression tests update (#3531 )	2025-06-01 09:47:55 +03:00
Emma Qiao	202813f054	Check test names in waive list (#4292 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-01 14:39:30 +08:00
Dom Brown	338d6e9f95	[nvbug 5305210] fix: Resolve nvbug 5305210 (#4759 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-31 19:21:06 +08:00
Emma Qiao	c945e92fdb	[Infra]Remove some old keyword (#4552 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-05-31 13:50:45 +08:00
Jhao-Ting Chen	fcadce9f8d	[fix] Eagle-2 LLMAPI pybind argument fix. (#3967 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-05-29 12:23:25 -07:00
yuanjingx87	2c48ff5898	[feat] add b200 support via slurm (#4709 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-05-29 14:49:46 +08:00
Yan Chunwei	33a9ba55f5	fix: test trtllm-bench mgmn (#4613 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-29 14:43:47 +08:00
ruodil	500aca4f44	test: remove perf test l40s/l20 oom test cases and unwaive tests (#4755 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-05-29 13:58:47 +08:00
QI JUN	058f83e47b	CI: move post-merge multi GPU test of PyTorch backend to H200 (#4733 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-29 11:15:56 +08:00
xinhe-nv	93283484c2	test: [CI] Add failed cases into waives.txt (#4688 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-28 22:04:35 +08:00
amirkl94	fbec0c3552	Release 0.20 to main (#4577 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Simeng Liu <simengl@nvidia.com> Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: moraxu <mguzek@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: stnie <82932102+stnie@users.noreply.github.com> Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com> Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-05-28 16:25:33 +08:00
xinhe-nv	bb3d998eb1	test: [CI] remove closed bugs (#4638 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-27 18:07:59 +08:00
Yiqing Yan	92a7984945	Waive L0 tests (#4686 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-27 15:07:02 +08:00
xinhe-nv	59f7622281	test: rcca https://nvbugs/5223130 (#4510 ) * add rcca tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * skip tests on blackwell Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-27 09:59:47 +08:00
yuanjingx87	732d92ff62	[Infra] - Multi-GPU testing support with Slurm (#4454 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-26 19:44:19 +08:00
Enwei Zhu	88190faa34	feat: large-scale EP(part 4: Static EP load balancer integration) (#4615 ) * MoeLoadBalancerConfig Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * MoeLoadBalancer integration Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * config file Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * test Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * test Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-26 18:25:11 +08:00
Yiqing Yan	2fee408536	Waive L0 tests (#4645 ) * Waive L0 tests Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * Apply suggestions from code review Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> --------- Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-26 11:05:01 +08:00
Yanchao Lu	20c15fc04f	Fix invalid testcase name (#4626 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-24 00:40:00 +08:00
Anthony Chang	bbea2647b1	Qwen3 supports TRTLLM FP4 MoE backend (#4530 ) * MoE TRTLLM backend for Qwen3 Signed-off-by: Anthony Chang <anchengc@nvidia.com> * add extra moe_backend to test Signed-off-by: Anthony Chang <anchengc@nvidia.com> * address comments Signed-off-by: Anthony Chang <anchengc@nvidia.com> * conditionally compile kernels on newer archs Signed-off-by: Anthony Chang <anchengc@nvidia.com> * missing positional arg Signed-off-by: Anthony Chang <anchengc@nvidia.com> * Update the routing kernels Signed-off-by: Christina Zhang <christinaz@nvidia.com> * Revise usage of TLLM_LOG_ERROR Signed-off-by: Christina Zhang <christinaz@nvidia.com> * Add unit test for Qwen3 moe (trtllm_gen backend) Signed-off-by: Christina Zhang <christinaz@nvidia.com> * improve weight processing speed of moe_backend=TRTLLM; roughly 2x Signed-off-by: Anthony Chang <anchengc@nvidia.com> * tidy and minor fix Signed-off-by: Anthony Chang <anchengc@nvidia.com> * temporarily disable accuracy test that has known issue Signed-off-by: Anthony Chang <anchengc@nvidia.com> --------- Signed-off-by: Anthony Chang <anchengc@nvidia.com> Signed-off-by: Christina Zhang <christinaz@nvidia.com> Co-authored-by: Christina Zhang <christinaz@nvidia.com>	2025-05-23 18:31:08 +08:00
Enwei Zhu	d7443b6068	[https://nvbugspro.nvidia.com/bug/5181262 ] [test] Unwaive Mistral Nemo test (#4515 ) unwaive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-23 10:14:00 +08:00
pcastonguay	d7d455e7ea	[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243 ) * feat: Enabling dis serving with TRT backend with Python runtime Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing formatting Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing disagg mtp test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-05-22 22:01:06 -04:00
Mike Iovine	14fc48ada7	[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler (#4402 ) [fix] Fix chunked prefill + overlap scheduler Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-23 04:38:22 +08:00
Venky	c713eb5799	test(perf): Add `Llama-3_1-Nemotron-Ultra-253B-v1` perf tests (cpp) (#4446 ) ultra Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-22 13:07:33 -07:00
xinhe-nv	22c01d5b21	test: [CI] Add failed cases into waives.txt (#4549 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * fix test issues Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-22 17:18:53 +08:00
ruodil	1a45890dae	test: waive hanging cases for perf test (#4562 ) waive hanging cases Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-22 15:50:05 +08:00
HuiGao-NV	bc9f1dbede	fix[nvbug-5228840]: Remove test cases of feature not supported anymore (#3972 ) * Remove waived cases * Remove test cases of not supported feature Signed-off-by: Hui Gao <huig@nvidia.com>	2025-05-22 11:18:58 +08:00
Michal Guzek	9033dd987d	[TRTLLM-4932] Add CLI accuracy tests for Phi-4-mini-instruct (#4415 ) Add phi-4-mini CLI acc test Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-22 09:56:48 +08:00
Chuang Zhu	44cfd757b2	Agent interface impl for NIXL (#4125 ) * agentConnection Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> recv Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> agentState Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> NIXL interfaces Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> update cmakelists Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> nixl improve Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> remove cppzmq Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> fix Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> transferAgent remove register Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> work for cache Test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> reduce sleep time Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> fix test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> intergarte Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> nixl env Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> fix rebase error Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> cpp test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> stash for send metaData Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> loadRemoteMD after fetchRemoteMD Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> workaround for mixed gen and context Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> test_env Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> avoid port conflict in test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * format Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * use std::string Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * typo Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * fix transferAgentTest Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> --------- Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-05-22 09:09:41 +08:00
Dom Brown	1cffa99792	test: Split test_simple into mpi_utils and cache transceiver tests for DGX (#4451 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-22 04:26:21 +08:00
Venky	0a8461d54c	test(perf): Pt.2 Add `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (cpp) (#4499 ) add low concurrency perf tests Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-21 10:46:48 -07:00
xinhe-nv	407ef08662	tests: add qwene fp4 tests into QA test list & update sanity test list (#4478 ) * update sanity test list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update test list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-21 16:52:02 +08:00
ruodil	83f1933f0c	test: add failed case in waive list and fix some test script issue for perf test (#4527 ) add failed case in waive list and fix some test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-21 16:37:25 +08:00
QI JUN	15317ece5a	CI: waive test_fp8_block_scales_4gpus of deepseek v3 lite (#4520 ) waive test_fp8_block_scales_4gpus of deepseek v3 lite Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-21 13:19:43 +08:00
xinhe-nv	750f412b8f	tests: add llama 3.3 70b 2 nodes tests (#4391 ) * add llama 3.3 70b 2 nodes tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * remove enable_overlap_scheduler parameter Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-21 12:42:45 +08:00
Chuang Zhu	ab5bea957d	unwaive some disagg test (#4476 ) * unwaive some disagg test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * pytest.mark.skip_less_device(4) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> --------- Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-05-21 11:45:11 +08:00
Yan Chunwei	9199793848	fix: llmapi-launch add add trtllm-bench test with engine building (#4091 ) * add trtllm-bench mgmn test Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-21 10:18:01 +08:00
Zheng Duan	77a0189554	feat: conditional disaggregation in disagg server (#3974 )	2025-05-21 09:57:46 +08:00
Venky	9a8c3ece22	test(perf): Add remaining `Phi-4-mini-instruct` perf tests (#4443 ) add remaining 2 phi cpp perf tests Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-21 09:26:12 +08:00
xinhe-nv	19c6e68bec	test: [CI] remove closed bugs (#4417 ) * waives closed bugs Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waives Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-21 09:13:25 +08:00
bhsueh_NV	ec4190fb71	infra: Add qwen3 235B tests into QA (#4483 ) * add qwen3 qa test Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * add qwen3 test into qa list Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-20 17:37:09 +08:00
ruodil	b5edf13b33	test: update test filter in perf test yml file to select cases by gpu name and add cases for RTX 6000 pro (#4282 ) * add cases for rtx_pro_6000 and update test filter Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * amend a typo in model llama_v3.1_405b_instruct fp4 and add more cases for rtx pro 6000 and waive_list Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-20 10:58:05 +08:00
Michal Guzek	0a342a42f7	[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant (#4362 ) * Add CLI TestLlama3_3_70BInstruct acc tests Signed-off-by: moraxu <mguzek@nvidia.com> * Add tests to qa lists Signed-off-by: moraxu <mguzek@nvidia.com> * Add comment Signed-off-by: moraxu <mguzek@nvidia.com> * Fix test names Signed-off-by: moraxu <mguzek@nvidia.com> * Update yaml files Signed-off-by: moraxu <mguzek@nvidia.com> * Update cli file Signed-off-by: moraxu <mguzek@nvidia.com> --------- Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-20 09:48:14 +08:00
xinhe-nv	402385588d	test: [CI] Add failed cases into waives.txt (#4429 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waive id Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-20 09:43:55 +08:00
Yuxian Qiu	c8e062bfd3	fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. (#4399 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com> Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-05-19 14:25:36 -07:00
Venky	bb02d86b54	test(perf): Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (TRT flow, trtllm-bench) (#4128 ) * changes to run llama-v3.3-nemotron-super-49b Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * yapf Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * address review comments pt 1 Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * re-add cpp super tests Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-19 12:00:48 -07:00
Faraz	7656af1b57	[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) (#4335 ) * add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * update cutlass versions Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * added internal cutlass with fix and docker update Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * added mixtral to pro 6000 Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> --------- Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-19 08:56:21 -07:00
liji-nv	58e405624a	[https://nvbugs/5123103 ][fix] Fix torch compile for DeepSeekV3 (#3952 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-05-19 22:12:25 +08:00
Iman Tabrizian	c6074c47da	Add llama4 disagg accuracy tests (#4336 ) * Add llama4 disagg accuracy tests Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Make it async and add GSM8K benchmark Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-19 21:55:08 +08:00
Dom Brown	c45f414bbf	Test: Improve model re-use in C++ DGX tests for CI stability (#4263 ) * Fix padded vocab size for Llama Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Refactor multi GPU llama executor tests, and reuse the built model engines Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Fix test list typo Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Further WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Update test lists and readme Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Try parametrize for asymmetric Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Parametrize + skip unsupported combinations Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com> * Update test list Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com> * Reduce environment duplicated code Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com> --------- Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>	2025-05-19 14:20:21 +01:00
Yan Chunwei	5b1c88de8d	chore: cleanup perf_evaluator code (#3833 ) * chore: cleanup perf_evaluator code Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * up Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-19 13:21:36 +08:00
Ivy Zhang	58d2508b89	tests: Add test cases for rcca cases (#4347 ) * add qwen2_0_5_instruct cp4 test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen2.5 fp8 kvcache test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add ds distill qwen cpp runner test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-19 12:06:43 +08:00
Ivy Zhang	c4a0d768b5	tests: add qa test mentioned in docs (#4357 ) * add nemotron-h and llama_70b cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add llm decoder quick_start case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen3 quickstart test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add trtllm_decoder accuracy test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove quickstart test for llm_decoder Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix import error Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * nemotronh fp8 trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove nemotronh-fp8 Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-19 10:06:51 +08:00
Faraz	791c209006	[TRTLLM-4618][feat] Add Nemotron Super 49B FP8 test on RTX6000 Pro (SM120) (#4363 ) * added nemotron 49b fp8 for B40 release Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * add tests to QA list Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * pre-commit changes Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> --------- Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-19 09:30:24 +08:00
Iman Tabrizian	7de90a66bc	Remove vila test (#4376 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-19 09:02:39 +08:00
Yanchao Lu	0d7269e2a7	[Infra][Docs] - Some clean-up for the CI pipeline and docs (#4419 ) * [Docs] - Some clean-up for the docs Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> * [Infra] - Some clean-up for the CI pipeline Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> --------- Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-19 00:07:45 +08:00
shaharmor98	27afcb9928	add changes for fp8, nemotron-nas, API (#4180 ) Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-05-18 23:27:25 +08:00
Venky	fb663b637a	Extend the Llama-Nemotron-Nano-8B perf-integration-tests (cpp) (#4195 ) * add ll-nm-nano tests that map to nim requirements Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> * prune some pytorch cases (fp8) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> * removing pyt backend test changes - When validating the pytorch tests with the isl/osl/conc/quant settings (that is done for cpp backend too), seeing hangs that need further debugging. - Therefore don't want to block this PR, hence removing them. - Seeing Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-17 22:46:21 +08:00
Yuxian Qiu	cc1bba1686	test: Waive tests for nvbugs/5286795. (#4409 ) * Waive tests for nvbugs/5286795. Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> --------- Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-17 19:41:05 +08:00
Jinyang Yuan	b618e1f55b	perf: Eliminate the need for attention DP padding when possible (#3439 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> Co-authored-by: raccoonliukai <raccoonliu@tencent.com>	2025-05-17 13:30:55 +08:00
liji-nv	fb437ed709	[CI] waive accuracy/test_cli_flow.py::TestTinyLlama1_1BChat::test_pp4 (#4397 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-05-16 20:18:07 +08:00
Daniel Cámpora	df19430629	chore: Mass Integration 0.19 (#4255 ) * fix: Fix/fused moe 0.19 (#3799) * fix bug of stream init Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix bug Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix: Add pre-download of checkpoint before benchmark. (#3772) * Add pre-download of checkpoint before benchmark. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Add missing remote code flag. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Move from_pretrained to throughput benchmark. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Move download and use snapshot_download. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Removed trusted flag. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Fix benchmark command in iteration log test. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> --------- Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * [https://nvbugspro.nvidia.com/bug/5241495][fix] CUDA Graph padding with overlap scheduler (#3839) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fuse Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * TRTLLM-4875 feat: Add version switcher to doc (#3871) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * waive a test (#3897) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * docs:fix https://nvbugs/5244616 by removing new invalid links. (#3939) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> * fix: remote mpi session abort (#3884) * fix remote mpi session Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * skip fp8 gemm for pre-hopper (#3931) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * [https://nvbugspro.nvidia.com/bug/5247148][fix] Attention DP with overlap scheduler (#3975) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update multigpu list Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix namings Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * Doc: Fix H200 DeepSeek R1 perf doc (#4006) * fix doc Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> * update perf number Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> --------- Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> * Fix the perf regression caused by insufficient cache warmup. (#4042) Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * doc: Update 0.19.0 release notes (#3976) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * Optimize the AutoTuner cache access code to reduce host code overhead. (#4060) The NVFP4 Linear op is very sensitive to the host overhead. This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Update switcher (#4098) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * doc: update release notes (#4108) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * docs:update 0.19 doc. (#4120) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> * docs:add torch flow supported model list. (#4129) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> * doc: Release V0.19 Perf Overview Update (#4166) Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com> * Fix readme of autodeploy. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update tensorrt_llm/_torch/pyexecutor/llm_request.py Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Revert mgmn worker node. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Change to disable_overlap_scheduler. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com> Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com> Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com> Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>	2025-05-16 10:53:25 +02:00
xinhe-nv	500b43e90c	test: [CI] remove closed bugs (#4345 ) update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-16 13:47:42 +08:00
Stanley Sun	11aa50d1ea	test: add kv cache aware test cases to qa test list (#4257 ) add kv cache_aware test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-16 12:47:01 +08:00
Iman Tabrizian	4c7191af67	Move Triton backend to TRT-LLM main (#3549 ) * Move TRT-LLM backend repo to TRT-LLM repo Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Address review comments Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * debug ci Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Update triton backend Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Fixes after update Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-16 07:15:23 +08:00
yuxianq	4f8afe4cc6	feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism (#4034 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-16 04:16:53 +08:00
Venky	adb0839a33	test(perf): Add `Phi-4-mini-instruct` to perf tests (#4267 ) * add phi-4-mini-instruct Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * trim tests Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-15 21:27:03 +08:00
Yanchao Lu	5ce1102a02	Revert "[test] add qa test mentioned in docs" (#4355 ) Revert "[test] add qa test mentioned in docs (#4248)" This reverts commit `b0ce1371ee`.	2025-05-15 18:47:30 +08:00
Stanley Sun	9d3e05486b	test: add qa test list for rtx5090 and rtx_pro_6000 (#4254 ) * add test list for rtx5090 and rtx_pro_6000 Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * add 2gpu llama70b test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * remove duplicate and invalid test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * add 2gpus test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> --------- Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-15 17:57:31 +08:00
xinhe-nv	14bfb5e0d6	test: FIX test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus (#4283 ) * update test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * skip llava-v1.6-mistral-7b-hf-vision-trtllm on L40S Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-15 15:57:44 +08:00
zhhuang-nv	97bc680cd8	feat: support kv cache reuse for MLA (#3571 ) * support kv cache reuse for MLA load compressed_kv and k_pe and do up-projection use 192/128 head size MLA context kernel support Blackwell and Hopper now Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * add CI test Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2 Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com> * resolve comments Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * use GPTJ style RoPE for MLA Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix rebase error and some docs Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix kv_lens Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * tiny fix Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix torch compile Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix: use normal device memory instead of pinned memory for unit test Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com> * fix L0 tests Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix torch compile after rebase Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * resolve comments Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * resolve comments again Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> --------- Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com> Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com> Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-05-15 15:22:21 +08:00
dominicshanshan	404fbe9b32	[https://nvbugs/5277113 ][fix]genai-perf API change stress test (#4300 ) * fix bug 5277113. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * fix bug 5277113 and 5278517. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> --------- Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-05-15 14:12:34 +08:00
Ivy Zhang	b0ce1371ee	[test] add qa test mentioned in docs (#4248 ) * add nemotron-h and llama_70b cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add llm decoder quick_start case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen3 quickstart test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add trtllm_decoder accuracy test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove quickstart test for llm_decoder Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-15 13:37:11 +08:00
hlu1	3ea42e7519	[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus (#4346 ) Reorganize TestDeepSeekR1::test_nvfp4_8gpus Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com> Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>	2025-05-15 13:09:13 +08:00
Mike Iovine	f9adac3dea	[feat] Enable chunked context for flashinfer (#4132 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-15 10:59:38 +08:00
Robin Kobus	d31fefde2c	[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow (#4092 ) * chore: Remove GptSession/V1 from TRT workflow Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove stateful decoders Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession buffers Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession utils Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession kernels Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove V1 GPT models from tests Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove gptSessionBenchmark from scripts and docs Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove gptSession IO classes Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession from test lists Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession from docs Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove useless encoder test Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove mActualBatchSize from DecoderState Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove static batching from ExecutorTest - Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter. - Adjusted related test functions to reflect the changes in parameter lists. - Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-14 23:10:04 +02:00
Faraz	42de79d49e	test: Added tests for Llama3.1-70B-BF16 on SM120 (#4198 ) * Added tests for Llama3.1-70B-BF16 on SM120 Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * solve conflicts add more tests Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> --------- Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-14 11:57:49 -04:00
Yanchao Lu	504f4bf779	[Infra] - Update the upstream PyTorch dependency to 2.7.0 (#4235 ) [Infra][TRTLLM-4941] - Update the upstream PyTorch dependency to 2.7.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-14 22:28:13 +08:00
Kaiyu Xie	6c45586c51	chore: Remove deprecated Python runtime benchmark (#4171 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-05-14 18:41:05 +08:00
xinhe-nv	f2bfe2f84f	test: [CI] remove closed bugs (#4207 ) update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-14 17:59:05 +08:00
DylanChen-NV	206f82115d	[bug/5247505] fix: CP accuracy on Blackwell (#4188 ) * fix xqa params for cp Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * add test Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * add test Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * try adding B200 multi gpu test Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * add accuracy tests for cp Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> --------- Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-05-14 17:40:50 +08:00
Yiqing Yan	a66a02a75a	[Infra] Waive L0 test (#4295 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-14 16:38:33 +08:00
Zongfei Jing	bb17649517	test: Add UT for moe trtllmgen (#4258 ) * Add ut for moe trtllmgen Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Update tests/unittest/_torch/modeling/test_modeling_deepseek.py Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com> Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> --------- Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>	2025-05-14 15:22:58 +08:00
bhsueh_NV	1a9298bc66	CI: add fp8/fp4 ci on Qwen3-30B-A3B (#4266 ) add fp8/fp4 ci on Qwen3-30B-A3B Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-14 14:38:04 +08:00
brb-nv	8280c3d4f2	feat: Support Gemma3-1b-it in Pytorch workflow (#3999 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 14:02:44 +08:00
brb-nv	1ef117688c	test: Validate FP8 and LoRA for Gemma3 (#3670 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-13 17:28:02 -07:00
Iman Tabrizian	f408de2d99	Waive disagg kv cache load balancer test (#4276 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-14 06:03:24 +08:00
brb-nv	cd5b3d21a0	feat: Support Mistral Small 3.1 24B VLM in TRT workflow (#4183 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 03:47:22 +08:00
Yiqing Yan	290649b6aa	[Infra] Waive L0 test (#4269 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-13 23:06:13 +08:00
Yiqing Yan	bfa16a63d4	[Infra] Waive L0 test (#4268 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-13 22:43:17 +08:00
dominicshanshan	44d6adfb68	Waive stress test. (#4262 ) * Waive stress test. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com> --------- Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-13 21:01:57 +08:00
Enwei Zhu	8f68d56cc1	[https://nvbugs/5220763 ] [test] Unwaive Mixtral FP8 TP2 test (#4252 ) unwaive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-13 15:55:33 +08:00
Yiqing Yan	fda8b0277a	[Infra][TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04 (#4049 ) * [TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04 Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix review Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * update images Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * Update jenkins/L0_Test.groovy Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * update image name Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> --------- Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-13 14:59:12 +08:00
ruodil	d555fe2530	test: fix for perf test script issue (#4230 ) fix for perf test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-13 10:29:20 +08:00
xinhe-nv	0cebc16139	test: [CI] Add failed cases into waives.txt (#4205 ) waive tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-13 10:22:42 +08:00
xinhe-nv	7ebae4dcaa	test: [CI] Add failed cases into waives.txt (#4203 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waives Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-13 10:08:02 +08:00
Enwei Zhu	035d915fea	[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior (#4090 ) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * normalize mtp_nextn Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update test_durations Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-13 07:41:51 +08:00
wili	eba3623a54	Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979 ) * feat/vbws-part4-v1.8: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * feat/vbws-part4-v1.9: fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.1: remove useless variables Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.2:fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.3: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.4: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.5: remove API change Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-05-12 22:32:29 +02:00
Enwei Zhu	c31ca1688c	[https://nvbugs/5214229 ] [fix] Unwaive lm_head quantization case (#4222 ) unwaive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-12 20:23:06 +08:00
Zheng Duan	c9e2a963e0	feat: add kv cache aware router (#3831 ) * kv cache aware router Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * add tests Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * router config Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> add test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction detect in worker test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * move worker tests to single gpu Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * reduce memory fraction Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * fix partial block Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> --------- Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-05-12 07:23:57 -04:00
Yixin Dong	c90ebadd84	feat: Support the Structural Tag in guided decoding (#4066 ) * finish Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * exc overlap scheduler Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add test Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix api ref Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Ubospica <ubospica@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-12 17:24:50 +08:00
Yechan Kim	3e9bda3a09	[feat] Support HyperCLOVAX-SEED-Text language part (#3902 ) * feat: support HyperCLOVAX-SEED-Text language part Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add Pytorch flow and remove test file Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * revert summarize Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix summarize Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove from pytorch example Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-05-12 16:05:14 +08:00
ruodil	9c03a7ab74	test: add llama_3.2_1B model and fix for test lora script issue (#4139 ) * test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * add llama_3.2_1B model and fix for lora script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-12 14:51:59 +08:00
xinhe-nv	849d9c343c	tests: https://nvbugs/5219534 remove failed tests from test list (#4113 ) remove unsupported tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-12 14:13:40 +08:00
Yiqing Yan	3c54e84e47	[Infra] Waive L0 test (#4212 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-12 11:37:49 +08:00
QI JUN	f021afa241	[CI] waive two multi-gpu test cases (#4206 ) waive two multi-gpu test cases Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-12 08:04:48 +08:00
Dom Brown	2d0f93a054	Refactor: Restructure C++ tests for better modularisation of non-shared code (#4027 ) * Refactor: Restructure C++ tests for better modularisation of non-shared code Start cleanup of pytest code for C++ tests Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Clean up names and remove references to test_cpp.py Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Move multi-GPU code Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Update doc and try un-waiving Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Update multi GPU file check Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Address minor multi-GPU setup bug Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> --------- Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-09 19:16:51 +01:00
Mike Iovine	4b8ba7ad61	[fix][nvbug/5244009] Fix llama 4 test lists/scout accuracy issue (#4069 ) [fix] Fix llama 4 test lists Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-09 22:45:14 +08:00
ruodil	bf5b2a2e0a	test: amend regex match for perf throughput (#4186 ) amend regex match for perf throughput Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-09 17:33:25 +08:00
xinhe-nv	9082411a50	test: [CI] Add failed cases into waives.txt (#4165 ) waive oom tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-09 16:56:30 +08:00
ruodil	5ce5b81281	test: amend default pytorch extra-llm-api-config.yml in perf test (#4176 ) * amend default pytorch extra-llm-api-config.yml Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * add print info to separate cases in output log Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-09 16:46:48 +08:00
Bo Li	e3cf3fd15f	test: Add fp8kv to DS-v3-lite integration tests. (#3950 ) * Add fp8 kv cache tests to DSV3-Lite integration tests. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update gsm8k. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update CI list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update TestDeepSeekR1. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Fix test list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Need quant_config besides pytorch_config. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list (bug 5239087). Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Correct test name. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> --------- Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Bo Li <bobboli0202@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-09 13:35:04 +08:00
Ivy Zhang	c91d03fa0a	test: move mistral / mixtral test cases in QA test list into the new accuracy test suite (#3440 ) * add mistral-7b-v0.1 torch flow test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mistral Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mixtral case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove api function test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mistral nemo cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mixtral cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update list Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove awq llmapi test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * adjust threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix partial comments Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix path Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update thres Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove duplicate test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-09 13:32:02 +08:00
Stanley Sun	fb31f91e15	test: add qwen3 and disaggregated serving accuracy tests to qa test list (#4083 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-09 11:03:02 +08:00
Ivy Zhang	7666bec7c4	[TRTQA-2861][test]: add nemotron and llama4 cases into qa test (#4053 ) * add MMLU, GPQADiamond check for llama-4 models Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add nemotron cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add online quant test cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove trt flow cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * adjust parallelism strategy Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix fail Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update sanity list Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix comment Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * skip nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-08 18:10:41 +08:00
xinhe-nv	4468158be4	test: [CI] remove closed bugs (#4046 ) update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-08 18:04:43 +08:00
Yiqing Yan	ce8832e80f	[Infra] Waive L0 flaky test (#4148 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-08 17:23:45 +08:00
yuanjingx87	6e1d2a1320	feat: Add Slurm support and enable RTX Pro 6000 testing pipeline in CI (#4019 ) * Add slurm support with RTXPro6000 PostMerge Tests Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> * remove H100 post merge test from testing Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> --------- Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-05-08 15:15:36 +08:00
Enwei Zhu	dae6781494	test: Waive disagg accuracy test (#4124 ) * waive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * waive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-08 13:39:07 +08:00
ruodil	4d0e462723	tests: skip writing prepare_dataset output to logs, and add llama_v3.1_8b_fp8, llama_v3.3_70b_fp8, llama_v3.1_405b_fp4 models (#3864 ) * tests: skip writing prepare_dataset output to logs Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-07 13:56:35 +08:00
Enwei Zhu	c28b90984f	[TRTLLM-3925, https://nvbugs/5245262 ] [fix] Normalize LLM.generate API (#3985 ) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-07 11:06:23 +08:00
Venky	62fea1e885	test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822 ) * Model: Llama-3.1-Nemotron-Nano-8B-v1 * Precision: float16 * Environment: * GPUs: 1 H100 PCIe * Driver: 570.86.15 * Test String: `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128` * Request Throughput: 81.86 req/sec * Total Token Throughput: 20956.44 tokens/sec * Average Request Latency: 5895.24 ms * Test String: `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000` * Request Throughput: 1.45 req/sec * Total Token Throughput: 5783.92 tokens/sec * Average Request Latency: 211541.08 ms * Test String: `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128` * Request Throughput: 52.75 req/sec * Total Token Throughput: 13505.00 tokens/sec * Average Request Latency: 5705.50 ms * Test String: `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000` * Request Throughput: 1.41 req/sec * Total Token Throughput: 5630.76 tokens/sec * Average Request Latency: 217139.59 ms Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>	2025-05-06 17:17:55 -07:00
dominicshanshan	3ac6637005	fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836 ) * Remove stdout pipe for genai-perf and make stress time as public parameter. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * Update llmRequest based on comment. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * launch process function refactor. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> --------- Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-05-06 16:52:30 +08:00
pansicheng	e84dc6b3c7	feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354 ) * add deepseek-r1 reasoning parser Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com> * fix test Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> --------- Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com> Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-05-06 08:13:04 +08:00
Iman Tabrizian	85867d76dd	test: Add disaggregated serving accuracy tests (#4036 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-05 08:56:59 -07:00
Yanchao Lu	5ee38ad92a	[Test]: Clean up stale waives (#4062 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-05 22:13:12 +08:00
Yanchao Lu	ddfb0fe4e2	[Test]: Waive unsupported tests (#4059 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-05 20:51:49 +08:00
Yiqing Yan	b5c2327aa0	Waive L0 tests (#4051 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-05 12:53:21 +08:00
Yukun He	aa38e28cfa	fix: [nvbug/5241627] Fix AllReduce kernel hang issue when both tp and pp are enabled. (#3988 ) * Fix AllReduce kernel hang issue when both tp and pp are enabled. Allocate one workspace for each pp rank to avoid potential race. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * update waive list Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> --------- Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-05-05 11:33:25 +08:00
Yan Chunwei	bc0cf41592	chore: refactor llmapi e2e tests (#3803 ) * refactor llmapi e2e tests Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-05 07:37:24 +08:00
Emma Qiao	2692daad2e	infra: Remove the WAR for test items incompletely (#3313 ) * Remove the WAR for test items incompleted Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Complete test item manually Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix another test definition file Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Complete test name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix some other test names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix another test name after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update name for waived case name, too Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix name for multi-gpu tests Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix test name after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix another test name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix test name after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix other qa tests Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix tests name after rebase Signed-off-by: qqiao <qqiao@nvidia.com> * Fix name after rebase Signed-off-by: qqiao <qqiao@nvidia.com> * Correct test names in waive.txt Signed-off-by: qqiao <qqiao@nvidia.com> * Add new test_durations file Signed-off-by: qqiao <qqiao@nvidia.com> * Fix names after rebase Signed-off-by: qqiao <qqiao@nvidia.com> * Update test duration to latest Signed-off-by: qqiao <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com>	2025-05-04 11:31:59 +08:00
Mike Iovine	906cddffb0	[infra] Improve llama4 parallelism test coverage (#3821 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-02 16:15:04 -04:00
bhsueh_NV	561ee44737	add ci and doc for qwen3 (#4022 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-02 14:13:38 +08:00
xinhe-nv	009d5e9fa3	test: [CI] Add failed cases into waives.txt (#3943 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * waive test_llm_commandr_v01_single_gpu_summary for GH200 Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-01 23:43:11 +08:00
nv-guomingz	dc344b6a4f	fix:https://nvbugs/5246733 (#3989 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-05-01 22:52:31 +08:00
YueWeng	b1621e8d4e	feat: add relaxed acceptance for DS (#3865 ) * add relaxed acceptance for DS R1 Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> * clean and update docs Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> * fix Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> * Modified based on review Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> * fix mtp manager issue Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> --------- Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-05-01 21:50:36 +08:00
Chuang Zhu	1ada3c9800	unwaive disagg tests (#3925 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-04-30 16:44:00 +08:00
xinhe-nv	a31afcf3a9	update waive list (#3890 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-30 11:07:48 +08:00
Dom Brown	8709fe8b53	chore: bump version to 0.19.0 (#3598 ) (#3841 ) test: add test cases for 0.19 release (#3608) * fix test name * add quickstart test for nemotron-ultra * add rcca multi-node test case for deepseek-v3 * add rcca info --------- squash (#3642) fix: nvbugs/5187237: fix deterministic mode crash (#3448) * nvbugs/5187237 nvbugs/5112075: fix deterministic mode error * remove waive * Revert "remove waive" This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac. * revert ar fusion --------- update fp8 doc (#3647) tests: change qa perf test to trtllm-bench (#3619) fix: FP8 quantized lm_head (NvBug 5214229) (#3567) infra: Add PR approval protection for the release branch (#3634) fix: nvbugs/5231298: pytorch allreduce issue (#3673) Fix: nvbugs/5222698 variable not defined (#3630) * Fix: nvbugs/5222698 variable not defined * Tidy code --------- test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685) test:restore fp8 kv cache testing for L0 (#3671) doc: Update DeepSeek perf docs (#3693) * Update DeepSeek perf docs * update * Apply suggestions from code review --------- tests: waive test_llm_multi_node (#3664) fix: update test_user_buffers_mm_add_prologue atol (#3711) Fix: cherry-pick hmac encryption from main branch (#3635) * security fix cherry-pick changes from main * fix hmac in remote mpi session (#3649) --------- Un-waive DS-V3-Lite tests. (#3621) fix: FP8 kv accuracy (#3675) * fix FP8 kv accuracy * update doc --------- Fix script options for engines. (#3622) unwaive multi-node test (#3721) chore : Split more tests out of gpt tests (#3524) (#3674) doc:add torch examples link into torch backend documentation (#3749) test: Get Eagle tests working (#3593) (#3722) Waive L0 test (#3756) waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656) Update ds v3 parameters in stress test. 
(#3676) waive gemma on L20 (#3766) https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758) Include Qwen2VLDecoderLayer in the smooth_qwen2_model function. fix: PP4 fixes and cleanup (#3688) remove benchmark test list (#3643) skip disagg deepseek test if sm!=90 (#3720) test: skip failed cases on B200 (#3710) * add skip condition to tests * fix error --------- test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718) * skip_pre_ada for fp8 cases * update * update after rebase --------- add know issue to deepseek doc. (#3800) Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761) Waive L0 tests (#3826) fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793) * Reduce memory usage in fused moe op associated with AutoTuning. * Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens. * Add free_memory logic of workspace in min_latency_mode fused moe path. * Fix fused_moe fallback issue. (#3652) min_latency_mode is only set to False during warmup phase. Thus when it becomes true during inference, all tactics fall back to the default one and thus cause perf regression. --------- [doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797) Fix pre-commit Fix again Address some review comments for the MI Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-29 16:57:22 +08:00
QI JUN	c381380ecc	increase H100 CI nodes for PyTorch only pipelines (#3927 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-29 10:58:43 +08:00
Jinyang Yuan	dafc28fb85	fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863 )	2025-04-29 09:09:43 +08:00
xiweny	f84dd8f815	test: add deepseek v3 & r1 cases (#3528 ) * test: add deepseek v3 & r1 cases Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-04-28 23:37:26 +08:00
xinhe-nv	82a8e43557	test: [CI] Add failed cases into waives.txt (#3867 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waives Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-28 14:32:48 +08:00
xinhe-nv	e20b67e9fd	update waives & tests (#3887 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-28 14:29:35 +08:00
Yanchao Lu	068c72ebf8	Test: waive intermittent test hang (#3894 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-28 08:53:20 +08:00
Iman Tabrizian	74cc9e26ff	infra: install Triton in the base image (#3759 ) * infra: install Triton in the base image Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * install Triton from the base image Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * update base image Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Address review comments Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * update base image Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * waive test Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-28 07:36:30 +08:00
Dom Brown	7ff9fd345c	Test: Split C++ unit tests for CI granularity (#3868 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-04-25 13:30:58 -07:00
Yiqing Yan	238fefc659	[infra] Waive L0 tests (#3853 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-25 17:32:21 +08:00
QI JUN	991939a0f4	chore: increase A30 for cpp test (#3811 ) * increase A30 for cpp test Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * enable parallel run test for gpt_executor Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * decrease freeGpuMemoryFraction of cpp tests Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-24 16:34:39 -07:00
xinhe-nv	476d7003f8	test: [CI] Add failed cases into waives.txt (#3777 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waives.txt Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-24 09:36:05 +08:00
Zhanrui Sun	bfc4e55ded	infra: [TRTLLM-4417]Support auto trigger special test stage for special file change (#3478 ) * infra: Support auto trigger special test stage for special file change Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-23 20:32:19 +08:00
Enwei Zhu	8f2b2eaf83	test: Add DeepSeek-V3-Lite GSM8K tests (#3771 ) * tmp Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update ref Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update waives Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-23 16:54:48 +08:00
xinhe-nv	b82d72bc37	update waive list (#3696 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-23 14:18:57 +08:00
Yechan Kim	11d35656bf	fix: nvbugs/5234029 fix Qwen2.5-VL image test (#3726 ) * fix: nvbugs/5234029 fix Qwen2.5-VL image test case by adding more answer candidate Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove qwen2.5_vl from waive list Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-04-23 14:09:39 +08:00
xinhe-nv	80d8fdefd6	add test_mistral_large_hidden_vocab_size tests (#3716 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-23 13:40:11 +08:00
Yiqing Yan	cc161dd83d	Waive L0 tests (#3784 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-23 11:22:11 +08:00
QI JUN	257abfbc51	move pytorch tests of LLM API into separate test files (#3745 ) * move pytorch tests of LLM API into separate test files Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * polish Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-22 14:36:59 -07:00
Emma Qiao	442386d302	infra: Add test stages for sm120 (#3533 ) * Add test stages for sm120 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update chip name and config name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Split tests to gb202 and gb203 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Don't flash driver for rtx-5090 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Skip the failed cases Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Change the test stage names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Reduce 5080 jobs and add back gpu list which doesn't support dynamic driver flashing Signed-off-by: qqiao <qqiao@nvidia.com> * Skip failed case on gb202 Signed-off-by: qqiao <qqiao@nvidia.com> * Fix condition to dynamic driver flashing Signed-off-by: qqiao <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com>	2025-04-23 01:26:12 +08:00
Ivy Zhang	47d2f16bb8	waive gemma on L20 (#3767 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-04-22 17:52:49 +08:00
ruodil	9223000765	waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3657 ) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-22 14:51:45 +08:00
xinhe-nv	ba216341f4	update waive list (#3683 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-22 11:09:41 +08:00
Enwei Zhu	3fa19ffa4e	test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483 ) * add gsm8k Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix gsm8k Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add gpqa Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * conditional import lm_eval Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * gpqa in lm_eval Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * system prompt Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * shuffle Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update AA prompt and regex Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * revert AA prompt and regex Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * integration to tests Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add DS-R1 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix and clean Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update tests Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * clean up Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * free_gpu_memory_fraction=0.8 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-22 07:38:16 +08:00
Barry Kang	d87b009d8d	Fix ModelOpt Mixtral AWQ OOM (#3714 ) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-04-21 19:14:14 +08:00
Iman Tabrizian	af04b6f6aa	bug: Fix hang bug when context server doesn't have enough capacity for KV Cache (#3095 ) * Fix hang bug when KV cache is low Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Review comments Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Fix attentiondp typo Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Add CI test for this case Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * fix: Fix the insertion order for responder futures Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * fix: Fix disagg CPP Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-21 15:16:55 +08:00
Stanley Sun	852dd0c1be	test: add llama3.2 ptp test case (#3363 ) * add llama3.2 ptp test case Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * update test list Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> --------- Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-04-21 15:15:45 +08:00
Yiqing Yan	6f7f262779	Waive L0 tests (#3709 ) * Waive L0 tests Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * the test is fixed in PR 3711 Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> --------- Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-21 11:24:00 +08:00
Emma Qiao	48db263d9a	infra: Add test list name check (#3097 ) * Add steps to check test names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct test-db command Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Switch to use a trt-llm image Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update go path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct go path Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move the test list check to test ci Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct file path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix path again Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix get path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Skip test list check for ARM Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix expression Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Change back unrelated file Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct qa test names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Remove a stage Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update jenkins/L0_Test.groovy Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move some steps to a python script Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix script path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Split commands and debug Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Also correct case name in waives list Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move check script to another folder Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update qa list after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Remove the perf tests under QA Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Some tests already fixed after rebase to TOT Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-20 23:02:16 +08:00
brb-nv	c35d2a7532	test: Get Eagle tests working (#3593 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-20 00:50:57 +08:00
nv-guomingz	e70961f541	test:update waives.txt for nvbug 5219532 (#3672 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-19 18:57:39 +08:00
Iman Tabrizian	61ee983488	fix: Fix disaggregated load balance test (#3689 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-19 10:40:40 +08:00
Iman Tabrizian	a2f190f306	chore: Waive disaggregated load balance (#3687 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-18 16:04:33 -07:00
Yechan Kim	5460d18b10	feat: trtllm-serve multimodal support (#3590 ) * feat: trtllm-serve multimodal support Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable argument Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add and separate tests and move the doc Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove block_resue arg from serve.py Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-04-19 05:01:28 +08:00
pcastonguay	ae5671644a	feat: Disaggregated router class (#3584 ) * Add draft scheduler class Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * Refactor the design Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * feat: Introduce router class for disaggregated server Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Add unit tests for router class Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Adding tests for disagg_utils Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing missing import Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing disagg integration tests Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Addressing MR review comments Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-04-19 00:34:12 +08:00
QI JUN	b9fce42717	enable test_ptp_quickstart_advanced_mixed_precision (#3667 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-18 05:06:24 -07:00
Zheng Duan	bce7ea8c38	test: add kv cache event tests for disagg workers (#3602 )	2025-04-18 18:30:19 +08:00
peaceh-nv	88cff61fa1	chore : Split more tests out of gpt tests (#3524 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-04-18 12:04:57 +08:00
dongfengy	b71a0f76b4	test: Add llama 4 to ci (#3520 ) * Add llama 4 to ci Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> * Only test trtllm Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> * Disable marverick Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> --------- Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-04-18 11:25:52 +08:00
Ivy Zhang	ad19ca3cbf	remove benchmark test list (#3644 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 16:23:41 +08:00
Netanel Haber	3c52ac098f	feat: allocate minimal blocks per window size (#3028 ) * implement variable window attention by breaking the block manager into window block managers per window size Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * revert isCyclic to be true if the min attention window is reached, not per window size Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * add explanatory comment to mCyclicThreshold Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * load correct gemma config Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * don't shadow inputLength in addSequence - it should remain the function scope input length between window size loop iterations Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix KVCacheManagerVariableWindowAttentionWithReuseTest for multiple window block managers Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * if TYPE_CHECKING Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * set temp_attention_window_inputs to None explicitly Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * set temp_attention_window_inputs to None explicitly Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * pass dtype as well Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * test_gemma variable sliding window attention Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * allot a fraction of primary/secondaryBlocks to different window size heaps, depending on the window size's total contribution to the kvcache size (i.e., including all layers) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * remove \|\| mEnableBlockReuse which erroneously triggers beamsearch code for cyclic variable attention 
window code Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * turn off request delaying for MaxUtil Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * make comments better Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * windowSizesTotalSum using std::accumulate Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix error handling of forwardAsync - forwardAsync catch-all catch cleanup code that runs terminateRequest can also fail and must be caught Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix comments Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * remove assert that kills disagg tests, since it isn't necessary Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix corrupted expression: 'isNewTask && (peftCacheManager ?' -> '(isNewTask && peftCacheManager) ?' which caused boolean algebra. 
Main is correct Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * add Gemma3 to SUPPORTED_HF_ARCHITECTURES Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * support Gemma3 Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix kvfactor field for deepseek Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix comment Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix gemma-3 entries in testlist to include vswa Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * only quantize gemma2 VSWA Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> remove misleading comment Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * in sendRequestInfo, fromOldAllocatedBlockIds->fromOldAllocatedBlockIds, like in main Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix: disable KV cache reuse if using attention sink (#3021) * fix: disable KV cache reuse if using attention sink Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * fix: disable KV cache reuse if sink bubble Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * add comment Signed-off-by: Robin Kobus 
<19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-17 16:04:57 +08:00
Yiqing Yan	1c6f3debbb	Waive L0 tests (#3651 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-17 15:13:56 +08:00
xinhe-nv	b82a4e8d01	test: [CI] Add failed cases into waives.txt (#3627 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * fix waives Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-17 14:45:41 +08:00
Ivy Zhang	b2fb0fe843	test: add quickstart test for nemotron-ultra (#3596 ) * add quickstart test for nemotron-ultra Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix test name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 11:16:41 +08:00
ruodil	5e2ebebe76	tests: change qa perf test to trtllm-bench (#3189 ) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 09:53:32 +08:00
QI JUN	ab29348db2	waive test_llm_phi_quantization_1gpu (#3603 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-16 13:33:46 +08:00
Daniel Cámpora	41ce5440fe	chore: Mass integration of release/0.18 (#3421 ) * [Infra][TRTLLM-4063] - Branch out for the TRT-LLM v0.18.0 release Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> (cherry picked from commit de90312020e51c22ba5e75b3502c7ee90c059265) * [Infra][TRTLLM-3652] - Update dependencies to TRT 10.9 / CUDA 12.8.1 / DLFW 25.03(Internal) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> (cherry picked from commit 58db1340ef7db22f1910f878d220a92be5b830d1) * [None][Doc] - Update docs for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit d23e75bc95619ce3b116213d55319272888e0c88) * [Infra] - Fix or WAR issues in the package sanity check stages Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit e874e2b127515c52ba10c8df1cc2631627f74ffe) * [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path Signed-off-by: Yuki Huang <yukih@nvidia.com> (cherry picked from commit 731811d4e182d70a66193d646152cb71dfafe83a) * cherry-pick 'test: Updat cluster and multi node test lists and trtllm-bench' test to fix perf drop issue Signed-off-by: Ruodi Lu <ruodil@nvidia.com> (cherry picked from commit 5214616283fbc15ae98871a1d84c78d8e1f2e6e8) * Revert "Merge branch 'user/yukih/fix_5173454_5173432' into 'release/0.18'" Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 8d34831cb2b81ee2dfa8021b68e7158b33789a5f) * [Infra]Restrict setuptools version to avoid sasb pip install issue Signed-off-by: Emma Qiao <qqiao@nvidia.com> (cherry picked from commit 1e60ad29e0dafec0e295bedb5d89b716a02a707c) * [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path Signed-off-by: Yuki Huang <yukih@nvidia.com> (cherry picked from commit 3ed8164e5bfea1d5aa2039b5408439fd6cf59dac) * WAR for bug 5173448 Signed-off-by: Thor Johnsen <tjohnsen@nvidia.com> (cherry picked from commit 
b6528b2ba15322b6c6a4c81a8b74c04d4973de4f) * [Infra][TRTLLM-3652] - Update dependencies to CUDA 12.8.1 / DLFW 25.03 Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> (cherry picked from commit 6560983d132d9d257ee15849664eb055e94adaa9) * [Docs] - Doc changes for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 26769b61218a947c8f9d070f73b63d576fcc20c4) * [Doc] - Doc change for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 4b3b5ed6bfbc2300e3775fe75456083faad7b235) * [Infra] update version to 0.18.1 Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> (cherry picked from commit 59e8326c75639275837d34de8e140358737a3365) * Add back nemotron file. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Fix recurrentgemma reqs. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Adding WAR for bug 5173448. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Formatting. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Remove duplicated file. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update examples/prompt_lookup/requirements.txt Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Remove glm-4-9b from model dir in chatglm test. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Remove indent change. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Revert changes on l0_test.groovy. 
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update dev images Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> * Remove duplicated import. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Fix custom op Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Fix flashinfer & vanilla backend Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Skip problematic case. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Skip problematic test_moe_w4a8_1_14336_4096_8_bfloat16_True_False case. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> --------- Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Ruodi Lu <ruodil@nvidia.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Thor Johnsen <tjohnsen@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-04-16 10:03:29 +08:00
xiweny	da47d5f27e	fix: nvbugs/5075538: fix cross attention mask when decoder input len > 1 (#3585 ) * fix: nvbugs/5075538: fix cross attention mask when decoder input len > 1 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> * remove waiver Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> --------- Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-04-16 08:31:33 +08:00
HuiGao-NV	d35db254e2	test: Enable 4 multi-gpu test cases for deepseek (#3569 ) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Hui Gao <huig@nvidia.com>	2025-04-15 22:01:52 +08:00
Yan Chunwei	c27e130be0	unwaive test (#3559 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-15 19:42:06 +08:00
xinhe-nv	5cfa927132	update waive list (#3503 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-15 16:53:53 +08:00
xinhe-nv	0e152910f5	update waive list (#3498 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-15 14:33:49 +08:00
Zheng Duan	b0cb963199	test: torch-flow conditional disagg test (#3410 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-15 10:54:14 +08:00
nv-guomingz	b32ae7ac92	test:add fp8_kv_cache functionality test case. (#3457 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-15 09:16:46 +08:00
Iman Tabrizian	bad55e99bb	test: Add MTP + overlap + Attention DP disaggregated test (#3542 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-15 07:46:03 +08:00
Ivy Zhang	170bc22139	fix test name (#3534 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-04-14 17:09:50 +08:00
xinhe-nv	b1d8495b3d	update waive list (#3510 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-14 15:24:48 +08:00

1322 Commits