Commit Graph

831 Commits

Author SHA1 Message Date
Lizhi Zhou
3f82cdbdad
[https://nvbugs/5582277][fix] rework DisaggPPTerminationHandler to fix hang issue (#8519)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-23 09:43:59 +08:00
Pengyun Lin
e86d6db9ec
[https://nvbugs/5575829][fix] Unwaive gpt-oss test (#8576)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-22 07:31:56 -04:00
Emma Qiao
09349ccbfe
[None][infra] Waive failed tests for release 10/22 (#8574)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-22 04:41:00 -04:00
Bo Deng
9e30f14da8
[https://nvbugs/5565549][fix] unwaive test_disaggregated_spec_dec_bat… (#8500)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-22 14:59:59 +08:00
JunyiXu-nv
0acdecb2c3
[https://nvbugs/5569713][fix] Disable fp8 deep gemm for EXAONE-4.0-32B-FP8 (#8429)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-10-21 12:37:56 -04:00
mpikulski
f256eb9063
[TRTLLM-8650][fix] beam search request validation (#8433)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-21 10:50:27 +02:00
Emma Qiao
2b0a10e4d5
[None][infra] Waive tests for release 1021 (#8522)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-21 03:21:00 -04:00
bhsueh_NV
14d0f5d683
[https://nvbugs/5516666][fix] cherry-pick PR 8130 to unwaive the Qwen3 CI (#8444)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-10-19 23:14:10 -04:00
Yan Chunwei
995b93bc38
[https://nvbugs/5437384][test] fix trtllm-llmapi-launch multi tests with single launch (#8397)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-16 21:14:43 -07:00
Patrice Castonguay
7862372ee2
[https://nvbugs/5552889][fix] fix: Prevent empty batch when using attention DP with disagg (#8372)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-16 09:11:04 +08:00
Ivy Zhang
4751bdbcb6
[None][chore] Update nim test list (#8356)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-10-15 02:04:20 -07:00
Emma Qiao
988f93790f
[None][infra] Waive failed tests in release post-merge 10/15 (#8386)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-15 16:06:08 +08:00
Stanley Sun
cce97e6e15
[TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled (#8357)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-10-15 15:09:21 +08:00
Yiqing Yan
7b5ba7ca66
[https://nvbugs/5565541][fix] Add timeout threshold for H100 FHMA test (#8354)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-10-14 01:23:08 -07:00
Chuang Zhu
6a73f079fe
[https://nvbugs/5465642][fix] Increase server timeout to wait weight loading (#8297)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-14 07:55:31 +02:00
Lizhi Zhou
2c44e8198a
[https://nvbugs/5470769][chore] unwaive test for PR7338 (#8258)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 11:17:03 +08:00
William Zhang
dc052b663f
[https://nvbugs/5565530][fix] Unwaive test (#8273)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-10-13 17:59:32 +02:00
Chuang Zhu
ad0e91a174
[https://nvbugs/5546202][fix] Fix concurrent bug for NIXL cache transceiver (#8147)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-13 09:40:56 +02:00
Ivy Zhang
6a42a9649b
[None][chore] Update test configs for release (#8224)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-13 14:07:33 +08:00
Liao Lanyu
8f2e48a981
[https://nvbugs/5522746][fix] unwaive tests caused by node issues after rebooting (#8268)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-10-13 13:31:52 +08:00
Ivy Zhang
bcf9cb1f58
[TRTLLM-8246][test] add multimodal kvcache+chunked_prefil cases in to QA test list (#8212)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-13 11:38:38 +08:00
Emma Qiao
d857cd47a0
[None][infra] Update and waive failed tests for release branch (#8291)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-12 21:51:54 +08:00
Yan Chunwei
4ebc443fa9
[https://nvbugs/5565590][fix] test_request_perf_metrics_draft (#8257)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-12 10:01:20 +08:00
Yan Chunwei
7771669651
[https://nvbugs/5532023][fix] unwaive GenerationExecutor tests (#8251)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-11 10:43:04 +08:00
brb-nv
a9a0969de7
[None][chore] Waive tests failing on release/1.1 post merge (#8185)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-08 09:59:50 -07:00
Yukun He
1ca84e1a25
[https://nvbugs/5536131][fix] Fix illegal access issue when scale is not provided in Llama3/4. (#7960)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-07 23:47:00 -07:00
xiweny
72144a40d2
[https://nvbugs/5541494] [fix] Fix missing sm100f/103a kernels and add tests (#8098)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-07 08:27:55 +08:00
Jin Li
ef8e2173d4
[None][ci] Waive failing tests on release/1.1 (#8088)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-30 04:10:22 -04:00
Yiqing Yan
108248ece1
[TRTLLM-7999][infra] Add B300/GB300 single gpu test (#7951)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-26 09:59:11 +08:00
Emma Qiao
2dc93c6371
[None][infra] Waive failed tests on main (#8001)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 08:13:39 -07:00
Yan Chunwei
5342c607cd
[https://nvbugs/5516710][fix] fix Llama 3.3 TP PP case (#7717)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
xinhe-nv
e30d9aced9
[https://nvbugs/4955671][fix] update test list (#7980)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 02:58:09 -07:00
Emma Qiao
cb53261aaf
[None][infra] Unwaive some tests since dev already have a PR to collect more info (#7984)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 01:03:13 -07:00
fredricz-20070104
0945403174
[TRTLLM-6541][test] Add NIM perf test cases (#7924)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-25 13:15:26 +08:00
Iman Tabrizian
be7e51727e
[https://nvbugs/5456485][bug] unwaive triton test (#7966)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 17:02:55 -07:00
Pamela Peng
b1dc84b4a3
[TRTLLM-7399][test] Add DS-R1/Qwen3 test cases for RTX 6000 (#7662)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-24 11:40:26 -04:00
HuiGao-NV
c8bda4b3a9
[None][ci] Waive some intermittent failures (#7955)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-24 19:00:38 +08:00
Enwei Zhu
a1a57e83b8
[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve (#7925)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-24 18:30:23 +08:00
xinhe-nv
b8bfa63197
[None][chore] add test_w4_1gpu[True-True-cutlass-fp8] & TestKimiK2::test_fp8_blocks… (#7944)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-24 03:25:17 -07:00
QI JUN
18ff1e31b8
[None][ci] remove duplicate test cases (#7956)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-24 17:47:22 +08:00
yufeiwu-nv
f323b74d42
[None][test] Update llm_models_root to improve path handling on BareMetal environment (#7876)
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-09-24 17:35:57 +08:00
HuiGao-NV
29e63d3bc2
[https://nvbugs/5532248][fix] Fix fused_moe OOM (#7931)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-24 02:22:38 -07:00
QI JUN
946ffcd2eb
[None][ci] optimize test cases of dgx b200 (#7948)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-24 00:39:45 -07:00
Pengbo Wang
b890d7fea4
[None][infra] Skip failed test for nvbugs 5537738 (#7946)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 23:48:50 -07:00
Yueh-Ting (eop) Chen
cf100933cc
[TRTLLM-6341][feature] Support SWA KV cache reuse (#6768)
This merge request adds support for more SWA KV cache functionality
inside the KV cache manager. Before this merge request, the KV cache for
sliding window attention (SWA) holds only "window size" blocks and
reuses them in a cyclic manner. With this design we cannot utilize more
GPU memory, which limits the maximum batch size and throughput, and we
cannot support KV cache reuse.

In this MR, we change this behavior so that the manager writes blocks
in a linear manner. With linear block writing, out-of-window (OOW)
blocks are detached as the attention window moves on. For now, to get
the feature correct first, we directly offload each OOW block from the
primary block pool (GPU memory) to the secondary block pool (host
memory). We will improve this in the future by delegating the block
movement to the eviction policy.

KV cache reuse for SWA is not implemented in this merge request and
will be added in a follow-up merge request.

With blocks written linearly, the maximum number of blocks allocated
for a sequence (`GenerationRequest`) is determined by the specified
"max sequence length". The `GenerationRequest`, which stores the cache
block bookkeeping structure, now keeps enough blocks to cover
"max sequence length" tokens.

Given the above, the main changes are (more context in the MR):
- Remove the "cyclic" concept from the KV cache manager; this concept
  originally guarded block reuse in the KV cache manager.
- Add a detach mechanism under `KVCacheManager::addToken`. Note that
  detach is still disabled for SWA when reuse is enabled; a follow-up
  merge request will improve this.
- Make "max sequence length" a non-optional parameter of
  `KVCacheManager`/`BlockManager`.
- Give every window-size resource pool an identical proportion of memory.
- Fix the free memory calculation in `resource_manager.py`.

Signed-off-by: eopXD <yuehtingc@nvidia.com>
Co-authored-by: Tomer Asida <tasida@nvidia.com>
2025-09-24 14:28:24 +08:00
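
As context for the linear-writing and detach behavior described in the commit message above, here is a minimal, hypothetical Python sketch of the per-sequence bookkeeping. `SlidingWindowSequence`, `BLOCK_SIZE`, and `allocate_block` are illustrative names only and do not correspond to the actual `KVCacheManager`/`BlockManager`/`GenerationRequest` implementation.

```python
# Conceptual sketch (not the TensorRT-LLM implementation): blocks are written
# linearly, and blocks that fall fully out of the attention window are
# detached from the primary (GPU) pool and offloaded to a secondary (host) pool.
from collections import deque

BLOCK_SIZE = 16  # tokens per KV cache block (illustrative value)


class SlidingWindowSequence:
    """Bookkeeping for one sequence; keeps blocks for up to max_seq_len tokens."""

    def __init__(self, window_size: int, max_seq_len: int):
        self.window_size = window_size
        self.max_seq_len = max_seq_len
        self.num_tokens = 0
        self.primary_blocks = deque()  # block ids resident in GPU memory
        self.secondary_blocks = []     # block ids offloaded to host memory

    def add_token(self, allocate_block) -> None:
        """Append one token: allocate a new block on block boundaries and
        detach blocks that have fallen fully out of the attention window."""
        if self.num_tokens >= self.max_seq_len:
            raise RuntimeError("sequence exceeded max sequence length")
        if self.num_tokens % BLOCK_SIZE == 0:
            # Linear writing: always append a fresh block instead of cycling
            # back over the oldest in-window block.
            self.primary_blocks.append(allocate_block())
        self.num_tokens += 1
        self._detach_out_of_window()

    def _detach_out_of_window(self) -> None:
        # A block is out of window once all of its tokens are older than the
        # window start; here it is simply offloaded to the secondary pool
        # rather than handed to an eviction policy.
        window_start = max(0, self.num_tokens - self.window_size)
        while self.primary_blocks and self._first_block_end() <= window_start:
            self.secondary_blocks.append(self.primary_blocks.popleft())

    def _first_block_end(self) -> int:
        # Exclusive token index covered by the oldest block still in the
        # primary pool, counted from the start of the sequence.
        return (len(self.secondary_blocks) + 1) * BLOCK_SIZE


seq = SlidingWindowSequence(window_size=32, max_seq_len=128)
block_ids = iter(range(1_000))
for _ in range(64):
    seq.add_token(lambda: next(block_ids))
print(seq.secondary_blocks, list(seq.primary_blocks))  # [0, 1] [2, 3]
```

Running the example for 64 tokens with a 32-token window and 16-token blocks leaves the two oldest blocks in the secondary (host) pool and the two in-window blocks in the primary (GPU) pool, illustrating the detach-and-offload flow under the stated assumptions.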
Lizhi Zhou
e4f1f90202
[https://nvbugs/5477404][chore] unwaive test_disaggregated_single_gpu.py::test_disaggregated_llama_context_capacity (#7857)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-24 10:31:35 +08:00
Lizhi Zhou
7550251988
[TRTLLM-7182][test] add multi-nodes test for disagg-serving (#7470)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-24 08:31:56 +08:00
Zheng Duan
e3c1a9409f
[TRTLLM-6549][fix] add kv cache time output back (#7798)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-09-23 14:12:42 -04:00
Yanchao Lu
6a36349964
[None][test] Waive another intermittent OOM test (#7930)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-23 22:34:09 +08:00
ruodil
05bec3bf0f
[None][test] rename llm_perf_full to llm_perf_core and add missing cases (#7899)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-09-22 23:04:34 -07:00