TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-23 04:03:22 +08:00

Author	SHA1	Message	Date
Emma Qiao	ca9da1f1c2	[None][infra] Skip failed cases for main (#8176 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-10-07 06:37:51 -07:00
xiweny	9298f1bdcc	[None] [test] Add B300 cases to CI (#8056 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-06 19:23:31 -07:00
Faraz	27a5091fcb	[None][feat] GPT-OSS Sm120/Sm121 Support (#7937 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: list <58580514+farazkh80@users.noreply.github.com> Signed-off-by: Vincent Huang <vincenth@nvidia.com> Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Vincent Huang <vincenth@nvidia.com>	2025-10-06 16:59:06 -04:00
Lucas Liebenwein	3492391feb	[None][chore] AutoDeploy: clean up accuracy test configs (#8134 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-06 12:51:01 -07:00
Yan Chunwei	fb51de6c2e	[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) (#5543 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: chunweiy <chunweiy@nvidia.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>	2025-10-05 17:28:20 +08:00
Jonas Yang CN	88ea2c4ee9	[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-10-04 08:12:24 +08:00
Lucas Liebenwein	2c454e8003	[None][feat] AutoDeploy: Nemotron-H accuracy test (#8133 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-03 15:39:03 -07:00
Michal Guzek	38da871db3	[TRTLLM-6496][feat] Add LoRa Torch tests for the latest NIM model list (#6806 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-10-03 12:10:48 -07:00
Mike Iovine	ca8291133a	[None][fix] Fix MTP 2-model (#8115 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-10-03 10:13:50 -07:00
Patrice Castonguay	b77f19f4ff	[https://nvbugs/5434320 ][fix] fix: Unwaiving disagg pp tests (#8069 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-10-01 00:33:59 -04:00
Emma Qiao	b1e3fef8aa	[None][infra] Skip failed tests in post-merge for main (#8102 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-10-01 10:12:10 +08:00
brb-nv	84aa3c981e	[None][chore] Waive failing MNNVL alltoall multi-gpu test (#8106 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-09-30 20:05:42 -04:00
xinhe-nv	1dba9fa89e	[TRTLLM-6239][feat] add test cases into QA test list (#8081 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-30 00:23:45 -04:00
Kaiyu Xie	b0cb9ca50e	[None] [test] Add MNNVL AlltoAll tests to pre-merge (#7466 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-09-29 23:12:24 -04:00
Cheng Hang	cdce68c3e0	[TRTLLM-6741][fix] Add heuristics for lm head tp size when `enable_lm_head_tp_in_adp=True` (#7891 ) Signed-off-by: Cheng Hang <chang@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-30 09:24:35 +08:00
xiweny	48e779ae8c	[https://nvbugs/5541494 ] [fix] add back missing sm100f bmm kernels (#8051 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-29 05:35:44 -04:00
xinhe-nv	20e6cd39f1	[None][chore] Add failed cases into waives.txt (#8043 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-29 03:37:39 -04:00
Emma Qiao	ce381d6813	[None][infra] Waive failed cases for main on 0929 (#8053 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-29 02:46:02 -04:00
HuiGao-NV	7ac932d45e	[https://nvbugs/5532087 ][CI] Enable test case (#8029 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-29 01:46:28 -04:00
Eran Geva	9cea6bfb30	[#7288 ][feat] Added AutoDeploy backend support to test_perf.py (#7588 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-09-28 21:21:27 -07:00
Emma Qiao	2be05cbd6e	[None][infra] Skip failed test for main branch on 9/28 (#8040 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-28 07:00:55 -04:00
ChristinaZ	95eac2cda7	[https://nvbugs/5537738 ][fix] Add fp8 post-quant allgather support (#8008 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-09-28 15:32:45 +08:00
Iman Tabrizian	33282351a2	[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path (#6348 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-09-27 19:29:30 -04:00
Emma Qiao	c8bef27ebb	[None][infra] Waive failed cases in post-merge 2305 (#8019 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-26 10:20:12 -07:00
xinhe-nv	ba6ab62bd1	[None][chore] Add failed cases into waives.txt (#8004 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-26 00:41:02 -07:00
xinhe-nv	f32f5730b2	[None][chore] Add failed cases into waives.txt (#7986 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-25 23:50:09 -07:00
Lucas Liebenwein	3a96d75a3c	[https://nvbugs/5527956 ][fix] AutoDeploy: fix IMA due to outdated metadata (#8002 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-09-25 22:05:55 -07:00
Yiqing Yan	108248ece1	[TRTLLM-7999][infra] Add B300/GB300 single gpu test (#7951 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-09-26 09:59:11 +08:00
Emma Qiao	2dc93c6371	[None][infra] Waive failed tests on main (#8001 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-25 08:13:39 -07:00
Yan Chunwei	5342c607cd	[https://nvbugs/5516710 ][fix] fix Llama 3.3 TP PP case (#7717 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
xinhe-nv	e30d9aced9	[https://nvbugs/4955671 ][fix] update test list (#7980 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-25 02:58:09 -07:00
Emma Qiao	cb53261aaf	[None][infra] Unwaive some tests since dev already have a PR to collect more info (#7984 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-25 01:03:13 -07:00
fredricz-20070104	0945403174	[TRTLLM-6541][test] Add NIM perf test cases (#7924 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-25 13:15:26 +08:00
Iman Tabrizian	be7e51727e	[https://nvbugs/5456485 ][bug] unwaive triton test (#7966 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-09-24 17:02:55 -07:00
Pamela Peng	b1dc84b4a3	[TRTLLM-7399][test] Add DS-R1/Qwen3 test cases for RTX 6000 (#7662 ) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-09-24 11:40:26 -04:00
HuiGao-NV	c8bda4b3a9	[None][ci] Waive some intermittent failures (#7955 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-24 19:00:38 +08:00
Enwei Zhu	a1a57e83b8	[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve (#7925 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-24 18:30:23 +08:00
xinhe-nv	b8bfa63197	[None][chore] add test_w4_1gpu[True-True-cutlass-fp8] & TestKimiK2::test_fp8_blocks… (#7944 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-24 03:25:17 -07:00
QI JUN	18ff1e31b8	[None][ci] remove duplicate test cases (#7956 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-24 17:47:22 +08:00
yufeiwu-nv	f323b74d42	[None][test] Update llm_models_root to improve path handling on BareMetal environment (#7876 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-09-24 17:35:57 +08:00
HuiGao-NV	29e63d3bc2	[https://nvbugs/5532248 ][fix] Fix fused_moe OOM (#7931 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-24 02:22:38 -07:00
QI JUN	946ffcd2eb	[None][ci] optimize test cases of dgx b200 (#7948 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-24 00:39:45 -07:00
Pengbo Wang	b890d7fea4	[None][infra] Skip failed test for nvbugs 5537738 (#7946 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-09-23 23:48:50 -07:00
Yueh-Ting (eop) Chen	cf100933cc	[TRTLLM-6341][feature] Support SWA KV cache reuse (#6768 ) This merge request adds more SWA KV cache functionality to the KV cache manager. Before this merge request, the KV cache for sliding window attention (SWA) held only "window size" blocks and reused them in a cyclic manner. That design cannot make use of more GPU memory, leading to limited max batch size throughput, and it cannot support KV cache reuse. This MR changes the manager to write blocks in a linear manner. With linear block writing, as the attention window moves on, the out-of-window (OOW) blocks are detached. For now, to get a correct feature first, OOW blocks are offloaded directly from the primary block pool (GPU memory) to the secondary block pool (host memory). This will be improved in the future by delegating the block movement to the eviction policy. KV cache reuse for SWA is not developed in this merge request and will be added in a follow-up merge request. With blocks written linearly, the maximum number of blocks allocated for a sequence (`GenerationRequest`) is determined by the specified "max sequence length". The `GenerationRequest`, which stores the cache block bookkeeping structure, now keeps blocks for "max sequence length" tokens. Given the above, the main changes are (more context in the MR):
- Remove the "cyclic" concept from the KV cache manager; this concept originally guarded block reuse in the KV cache manager.
- Add a detach mechanism under `KVCacheManager::addToken`. Please note that detach is still guarded off for SWA when reuse is enabled. A follow-up merge request will improve this.
- Enforce "max sequence length" as a non-optional parameter of the `KVCacheManager`/`BlockManager`.
- Let the resource pools of all window sizes get an identical proportion of memory.
- Fix the free memory calculation in `resource_manager.py`.
Signed-off-by: eopXD <yuehtingc@nvidia.com> Co-authored-by: Tomer Asida <tasida@nvidia.com>	2025-09-24 14:28:24 +08:00
Lizhi Zhou	e4f1f90202	[https://nvbugs/5477404 ][chore] unwaive test_disaggregated_single_gpu.py::test_disaggregated_llama_context_capacity (#7857 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-24 10:31:35 +08:00
Lizhi Zhou	7550251988	[TRTLLM-7182][test] add multi-nodes test for disagg-serving (#7470 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-24 08:31:56 +08:00
Zheng Duan	e3c1a9409f	[TRTLLM-6549][fix] add kv cache time output back (#7798 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-09-23 14:12:42 -04:00
Yanchao Lu	6a36349964	[None][test] Waive another intermittent OOM test (#7930 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-23 22:34:09 +08:00
ruodil	05bec3bf0f	[None][test] rename llm_perf_full to llm_perf_core and add missing cases (#7899 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-09-22 23:04:34 -07:00
Pengbo Wang	a4b4ed4535	[None][fix] Fix and add test for TRTLLM MoE backend (#7755 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-09-23 11:26:25 +08:00
Pengbo Wang	08cc7a041f	[https://nvbugs/5355128 ][fix] Add missing wgmma intrinsic for starcoder (#7643 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-09-23 10:38:58 +08:00
yunruis	126cd707e3	[None][opt] Add batch waiting when scheduling (#7416 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-09-23 10:27:37 +08:00
Enwei Zhu	59f57598a7	[https://nvbugs/5504086 ][fix] Fix MTP vanilla (#7904 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 08:38:28 +08:00
Jin Li	b5391b4ac6	[https://nvbugs/5516665 ][fix] Fix CUTLASS moe fake impl errors (#7714 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-22 11:08:39 -07:00
Emma Qiao	324301ccba	[None][infra] Skip failed test for nvbugs 5532023 (#7905 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-22 03:49:44 -07:00
Yechan Kim	f77aca9f2c	[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-09-22 03:40:02 -07:00
Bo Deng	8cf95681e6	[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package (#7766 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-09-22 16:43:35 +08:00
Emma Qiao	d330d0005c	[None][infra] Waive a failed case on main (#7901 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-22 00:37:01 -07:00
xinhe-nv	9c1b75e978	[TRTLLM-7070][feat] add gpt-oss chunked prefill tests (#7779 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-22 00:12:43 -07:00
Yi Zhang	f9c9c3f50a	[https://nvbugs/5355219 ][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
bhsueh_NV	ef557f880b	[https://nvbugs/5437405 ][fix] cherry-pick PR 7000 (qwen3 235b eagle3 ci) (#7702 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yanchao Lu	5c8b022d1e	[None][ci] Test waives for the release/1.0 branch 09/15 (#7700 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
peaceh-nv	541b7fda89	[https://nvbugs/5503423 ][waive] Waive Llama3.1-70B-FP8 test on RTX PRO 6000 (#7603 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yuxian Qiu	2d46dda6a7	[https://nvbugs/5448754 ][fix] Download HF model for all nodes. (#6824 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
peaceh-nv	9dc7316b7f	[https://nvbugs/5512556 ][unwaive] Unwaive DeepSeek PP tests (#7828 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-09-22 10:26:30 +08:00
dongxuy04	9eb8084ca9	[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist (#7727 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-21 11:01:51 -07:00
Mike Iovine	8030b540ac	[https://nvbugs/5522462 ][fix] Fix FP8 scout illegal memory access (#7845 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-19 10:30:37 -04:00
pcastonguay	fbe325ce57	[https://nvbugs/5471108 ][chore] Unwaiving disagg acc test (#7686 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-09-19 08:56:09 -04:00
Yuxian Qiu	7d28acdbf0	[https://nvbugs/5522332 ][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783 ) (#7797 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 18:50:40 +08:00
xinhe-nv	efb763402f	[None][chore] Add failed cases into waives.txt (#7841 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-19 17:59:47 +08:00
Ivy Zhang	0ac51487f4	[None][chore] remove cli cases for rtx6k (#7833 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:33:59 +08:00
Ivy Zhang	6b33bcced2	[None][test] Add accuracy benchmark in stress test (#7561 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:09:46 +08:00
dominicshanshan	451475e0dc	[None][ci] Waive llama3 auto dtype test bug in https://nvbugs/5527956 . (#7853 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-19 14:54:59 +08:00
Emma Qiao	ea079fa530	[None][infra] Waive failed tests in post-merge (#7859 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-19 14:16:12 +08:00
ruodil	c5453103d6	[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-09-19 11:12:53 +08:00
fredricz-20070104	fc4e6d3702	[TRTLLM-7183][test] Feature fix model issue for disagg serving (#7785 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-19 10:12:55 +08:00
xinhe-nv	d3a907131a	[https://nvbugs/5519462 ][fix] Add failed cases into waives.txt (#7817 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 20:01:06 +08:00
Wanli Jiang	fe104dc20d	[TRTLLM-7918][feat] Support kvcache reuse and chunk prefill for phi4mm (#7723 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 17:37:16 +08:00
xinhe-nv	d909f80379	[TRTLLM-7250][fix] Add failed cases into waives.txt (#7807 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 17:13:07 +08:00
Wanli Jiang	a7ca0fff54	[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 16:26:20 +08:00
xinhe-nv	236f71ea05	[None][chore] Add failed cases into waives.txt (#7801 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 14:48:16 +08:00
Ivy Zhang	26d50eb539	[TRTLLM-8070][test] add generation logits case for llama3 (#7759 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-18 13:33:16 +08:00
Yan Chunwei	327e5e5eed	[None][ci] restore unwaive list (#7802 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-18 10:50:34 +08:00
Netanel Haber	a5cfc8368f	[https://nvbugs/5508536 ][fix] Revert #7041 : Move stop_criteria to sample_async (#7041 ) (#7796 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Mike Iovine <miovine@nvidia.com>	2025-09-17 21:27:01 -04:00
yunruis	7c03eb9ea2	[https://nvbugs/5516661 ][fix] Drop waive case 5516661 (#7791 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-09-18 08:55:32 +08:00
Emma Qiao	c4abca323e	[None][infra] Waive failed tests on main (#7812 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-17 23:44:36 +08:00
William Zhang	2614d71994	[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 (#7628 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-09-17 08:11:16 -07:00
xinhe-nv	f918302b3a	[TRTLLM-7250][fix] waive block tests (#7782 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-17 15:31:03 +08:00
ruodil	e6073b3911	[None][test] add gpt oss model for trtllm perf test (#7328 ) Signed-off-by: Ruodi Lu <ruodil@nvidia.com> Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-09-17 15:23:21 +08:00
xinhe-nv	7801d0992b	[None][chore] Remove closed bugs (#7697 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-17 15:14:09 +08:00
QI JUN	bd7aad4988	[None][ci] waive test_llm_gemma_1gpu_summary_vswa (#7781 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-17 10:48:31 +08:00
HuiGao-NV	a49cfb3e68	[https://nvbugs/5516666 ][fix] cherrypick fix to the CUDA graph warmup issue when using speculative decoding (#7737 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com> Co-authored-by: Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-17 06:24:20 +08:00
xinhe-nv	e7c1569456	[None][chore] Add failed cases into waives.txt (#7746 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 18:43:40 +08:00
Ziyi Xiong	905bb26bbd	[https://nvbugs/5471106 ][fix] Remove the waivers (#7711 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-16 17:43:39 +08:00
xinhe-nv	c6ab2072b5	[None][fix] waive hang tests on main (#7720 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 17:05:15 +08:00
xinhe-nv	1fbea497ff	[TRTLLM-7070][feat] add gpt-oss serve benchmark tests (#7638 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 16:39:31 +08:00
xinhe-nv	cf55927064	[None][chore] Add failed cases into waives.txt (#7735 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 10:58:06 +08:00
xiweny	c076a02b38	[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Signed-off-by: Daniel Stokes <dastokes@nvidia.com> Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> Signed-off-by: Xiwen Yu <xiweny@nvidia.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Daniel Stokes <dastokes@nvidia.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-09-16 09:56:18 +08:00
QI JUN	44d5ccfdd9	[None][ci] move qwen3 tests from GB200 to B200 (#7733 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-16 08:12:28 +08:00
Yanchao Lu	0c9430e5a5	[None][ci] Test waives for the main branch 09/15 (#7709 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-15 22:13:56 +08:00
jmydurant	7deefb3d2b	[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-09-15 21:43:49 +08:00
HuiGao-NV	335c007df8	[None][chore] move some cases from post-merge to pre-merge to detect errors in early stage (#7699 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-15 15:37:58 +08:00
Ivy Zhang	ddfe0320b3	[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill (#7365 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-15 13:38:52 +08:00
xinhe-nv	b69e3e9f99	[None][chore] Add failed cases into waives.txt (#7682 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-15 11:44:52 +08:00
Chang Liu	47e37755a3	[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-14 20:10:10 -07:00
Perkz Zheng	1b29c2e731	[None][feat] support gpt-oss with fp8 kv cache (#7612 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-15 02:17:37 +08:00
Yanchao Lu	70aa4e28c1	[None][ci] Test waives for the main branch 09/14 (#7698 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-14 23:48:04 +08:00
Pengyun Lin	c2bc39af63	[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-09-12 15:32:34 +08:00
QI JUN	656f229b58	[None][ci] move some test cases from l40s to a30 (#7684 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-11 07:22:34 +08:00
Emma Qiao	9986070044	[None][infra] Waive failed cases on main 0910 (#7676 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-11 01:43:29 +08:00
Dom Brown	fc9d426589	[https://nvbugs/5505402 ] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-10 18:30:48 +01:00
nvamyt	222e01662c	[https://nvbugs/5488212 ][waive] Waive failed tests for L20 (#7664 ) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-10 22:32:15 +08:00
xinhe-nv	207c5258c4	[https://nvbugs/5494698 ][fix] skip gemma3 27b on blackwell (#7505 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-10 21:09:27 +08:00
Bo Deng	bf57829acf	[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-09-10 17:35:51 +08:00
fredricz-20070104	ef620f3579	[https://nvbugs/5410687 ][test] Add deepseek r1-w4afp8 quickstart (#7645 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-10 10:21:01 +08:00
Chang Liu	faa2f46554	[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-09 14:51:36 -04:00
QI JUN	a0e1604898	[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline (#7629 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-09 11:06:32 -04:00
Liao Lanyu	af403848d7	[https://nvbugs/5445466 ][fix] unwaive DS R1 test cases with bug already fixed (#7429 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-09-09 17:25:49 +08:00
Perkz Zheng	da6cb541a2	[None][feat] Optimize MLA kernels with separate reduction kernels (#7597 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-09 16:58:44 +08:00
xinhe-nv	8a52015f50	[None][chore] Remove closed bugs (#7591 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-09 04:08:42 -04:00
Yiqing Yan	5c616da2fd	[TRTLLM-5877][infra] Add fmha tests and auto trigger rules (#6050 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-09 11:33:09 +08:00
Wanli Jiang	1e0669d27a	[https://nvbugs/5453709 ][fix] Remove transformers version limit in Qwen2VL (#7152 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-09 10:38:20 +08:00
Iman Tabrizian	d96c54d8ae	[None][test] Skip eagle3 test (#7627 ) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-09-08 17:23:53 -04:00
dongfengy	fdd5bd49fc	[https://nvbugs/5481080 ][fix] Fix GPTOSS W4A16 reference (#7323 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-08 13:59:28 -07:00
Chuang Zhu	77657a1c12	[TRTLLM-7361][feat] KV cache transfer for uneven pp (#7117 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-09-08 13:37:46 -04:00
dominicshanshan	c9dca69e1b	[None][chore] Mass integration of release/1.0 - 3rd (#7519 ) Signed-off-by: Nave Assaf <nassaf@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com> Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com> Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com> Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com> Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: milesial <milesial@users.noreply.github.com> Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com> Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-08 14:03:04 +08:00
Raayan Dhar	bae9560e62	[https://nvbugs/5448767 ][fix] sync termination of requests across PP ranks (#7455 ) Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-07 08:45:49 -04:00
dominicshanshan	9a97f0a3b7	[None][ci] Waive qwen3 test for accuracy bug in https://nvbugs/5505402 (#7585 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-06 21:29:16 +08:00
QI JUN	525bb806a9	[None][ci] move some test cases of DGX H100 to post merge (#7569 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-06 01:03:38 -04:00
Emma Qiao	d8ec546b73	[None][infra] Waive failed tests on main branch 0905 (#7564 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-05 22:46:46 +08:00
xinhe-nv	8e3962d278	[TRTLLM-6642][feat] add gptoss 20g tests (#7361 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-05 02:20:28 -04:00
xinhe-nv	b3ba3d98d2	[None][chore] Remove closed bugs (#7408 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-05 02:11:16 -04:00
QI JUN	ff3704897b	[None][ci] remove unnecessary test_modeling_deepseek.py (#7542 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-04 20:05:27 -07:00
Jin Li	2189a2f3ff	[https://nvbugs/5483615 ][fix] Remove unnecessary assertion to let mai… (#7441 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-05 10:56:21 +08:00
Ivy Zhang	b46e0ae5d4	[None][test] update nim and full test list (#7468 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-04 09:06:01 -04:00
Jin Li	2a2dfe273b	[https://nvbugs/5485102 ][fix] Correctly set stride for piecewise outp… (#7442 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-04 10:48:15 +08:00
Stanley Sun	db8eb0a447	[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-04 10:34:38 +08:00
Enwei Zhu	5ff3a65b23	[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-03 15:16:11 -07:00
Stanley Sun	cebbf48b74	[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 (#7083 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-03 08:36:52 -04:00
Mike Iovine	79d93f9419	[https://nvbugs/5488141 ][fix] Unwaive llama3 test_eagle3 (#7486 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-03 14:10:40 +08:00
Wanli Jiang	4223a9aada	[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-03 10:27:42 +08:00
Simeng Liu	bcc55bcdf3	[https://nvbugs/5470782 ][fix] Add specific test names for test_deepseek.py (#7318 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-09-02 10:31:40 -07:00
Emma Qiao	aae5d22bfe	[None][infra] Waive failed tests on main branch 0902 (#7482 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-02 10:16:49 -04:00
peaceh-nv	90479c50fb	[https://nvbugs/5453992 ][unwaive] Unwaive llama quickstart test (#7242 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-09-02 20:28:32 +08:00
JunyiXu-nv	eefe5f2093	[TRTLLM-7208][feat] Implement basic functionalities for Responses API (#7341 ) Signed-off-by: Junyi Xu <junyix@nvidia.com>	2025-09-02 07:08:22 -04:00
HuiGao-NV	7279297717	[None][infra] waive test case failed on post-merge (#7471 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-02 06:20:08 -04:00
aalanwyr	c3c95736a1	[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test (#7413 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-09-02 17:21:27 +08:00
Yan Chunwei	f90375f37c	[https://nvbugs/5476580 ][fix] unwaive test_nvfp4_4gpus (#7454 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-09-02 04:17:14 -04:00
Emma Qiao	01dfd3af1b	[None][infra] Waive failed case on main 0901 (#7447 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-01 23:27:24 +08:00
bhsueh_NV	16e9d1121c	[https://nvbugs/5481087 ][fix] fix bug of ci when we use mocker (#7332 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-09-01 16:22:45 +08:00
nvamyt	efaefca2c8	[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302 ) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-01 12:59:21 +08:00
Yiqing Yan	21291f3d8e	[None][chore] Remove duplicate test waives (#6999 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Emma Qiao	09bca7ca82	[None][infra] Waive failed tests for release branch 0818 (#6993 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	29cdcdb56a	[None][fix] update skip config (#6891 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	d5bc5cd4f2	[https://nvbugs/5375646 ][fix] update waives.txt for nvbug 5375646 (#6847 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	ac07418968	[None][ci] unwaive test_ptp_star_attention_example (#6943 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	612c26be22	[None][doc] add legacy section for tensorrt engine (#6724 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
2ez4bz	2480aedb73	[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731 ) This commit adds some level of FP8 support to Mistral Small 3.1 by: * disabling quantization for the vision sub-model since `modelopt` does support quantizing it (yet). * extending existing accuracy tests to use a modelopt produced FP8 checkpoint. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	3e99744201	[https://nvbugs/5375594 ][fix] fix oom issue on structural_tag test case (#6838 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	deba2885c1	[None][fix] fix Llama3 eagle3 test case OOM (#6832 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
xinhe-nv	7841ea6255	[None][chore] waive GB300 known issues (#6812 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	c7147d25dc	[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Tian Zheng	e257cb3533	[None][feat] Support NVFP4 KV Cache (#6244 ) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 09:24:52 +08:00
xinhe-nv	5f939b9121	[None][chore] Add failed cases into waives.txt (#7342 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-30 00:49:14 -04:00
Emma Qiao	15ec2b855d	[None][infra] Waive failed tests on main branch 08/29 (#7370 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA	62459d533d	[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-29 17:03:46 +08:00
fredricz-20070104	091b67ad2f	[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-29 02:16:22 -04:00
Chang Liu	31b0f0fb0c	[https://nvbugs/5445466 ][fix] Eliminate race when loading HF dynamic modules (#7268 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-29 12:36:30 +08:00
Richard Huo	ce580ce4f5	[None][feat] KV Cache Connector API (#7228 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Signed-off-by: richardhuo-nv <rihuo@nvidia.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-28 23:09:27 -04:00
aalanwyr	085dc19bfa	[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-08-28 23:09:11 -04:00
Yuan Tong	ccb800f909	[TRTLLM-7457][ci] Update unittest parallel config (#7297 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-08-29 09:28:04 +08:00
Emma Qiao	1e644fa28a	[None][infra] Waive failed tests on main branch 08/26 (#7346 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 00:24:08 +08:00
William Zhang	4541655e5f	[https://nvbugs/5430124 ][ci] Unwaive Mistral 3.1 Small tests (#7274 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-28 00:03:32 -04:00
QI JUN	39c9ffda5a	[None][ci] fix test list name (#7321 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:33:22 -04:00
Pengyun Lin	c1e7fb9042	[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-28 10:22:06 +08:00
bhsueh_NV	9d345b31c0	[https://nvbugs/5453727 ][fix] unwaive qwen3 CI tests (#7293 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 22:58:59 +08:00
Emma Qiao	8dc62ffac4	[None][infra] Waive failed tests on main (#7300 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-27 09:53:33 -04:00
xinhe-nv	f082e4857c	[TRTLLM-7250][fix] waive failed cases (#7292 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-27 18:04:46 +08:00
nvamyt	dbd4f21687	[None][fix] Update maxnt of llama_v3.2_1b bench (#7279 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-27 16:56:28 +08:00
QI JUN	e08c7cf17b	[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 03:12:30 -04:00
dongxuy04	abdb2735be	[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-08-27 01:39:24 -04:00
Zhou Yuxin	ccb6aadea8	[https://nvbugs/5412456 ][fix] Remove from waives.txt (#7248 ) Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>	2025-08-27 10:05:53 +08:00
QI JUN	baef70e67e	[None][ci] move qwen3 tests from b200 to gb200 (#7257 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-26 11:50:53 -04:00
xinhe-nv	80043affb5	[None][chore] Add failed cases into waives.txt (#7251 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 17:13:44 +08:00
Zheng Duan	cf50ba2980	[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 15:34:44 +08:00
Zheng Duan	1a929a1490	[https://nvbugs/5457504 ][fix] fix kv cache event test in disaggregated worker tests (#7028 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 14:25:10 +08:00
nvamyt	d8bd8843fc	[None][test] Update qwen3 timeout to 60 minutes (#7200 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 14:18:42 +08:00
William Zhang	92576488d3	[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013 ) * Why? Some models (e.g. anything produced by Mistral) can have both sharded safetensors and a consolidated safetensor in the same checkpoint directory. In such cases, prefetching both to memory is a waste of time, and memory. * What? This commit skips over consolidated safetensors when they are not the only safetensor file present in the checkpoint directory Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-25 23:56:21 -04:00
ruodil	b845eb7a3a	[None][test] add kv cache size in bench metric and fix failed cases (#7160 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 10:10:02 +08:00
Emma Qiao	200db3b809	[None][infra] Waive failed tests on main branch (#7201 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-25 09:04:37 -04:00
Ivy Zhang	f61b74f796	[None][test] add l20 specific qa test list (#7067 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-25 12:44:08 +08:00
Bo Deng	c038fb3ef4	[None][chore] cherry-pick 6940 (#7097 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-25 10:28:45 +08:00
xinhe-nv	3ba9afcc7b	[None][feat] add gpt-osss tests to sanity list (#7158 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-25 10:22:07 +08:00
Yiqing Yan	486bc763c3	[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge (#7074 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-24 21:09:04 -04:00
Robin Kobus	31979aefac	[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-08-24 20:53:17 +02:00
ajrasane	068056677f	[None][chore] Enable auto deploy accuracy test in CI (#7179 ) Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com> Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-24 08:42:30 -07:00
Yanchao Lu	ec35481b0a	[None][infra] Prepare for single GPU GB200 test pipeline (#7073 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-24 21:46:39 +08:00
dongxuy04	19a0ea363b	[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com> Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: Dongxu Yang <dongxuy@nvidia.com> Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-08-24 08:15:29 -04:00
Iman Tabrizian	96ff82e77a	[None][fix] Waive test (#7185 ) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-08-24 10:45:11 +08:00
QI JUN	1388e84793	[None][ci] move all B200 TensorRT test cases to post merge (#7165 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-22 06:47:23 -04:00

980 Commits