Wanli Jiang
6640aed0c2
[None][fix] Bypass key-word matching for multimodal tests ( #9170 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-18 10:33:07 +08:00
brb-nv
6d28e6c3a6
[ https://nvbugs/5568836 ][fix] Skip keyword matching for Gemma3 e2e test ( #9158 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-14 02:18:24 -08:00
peaceh-nv
f1d02b5664
[ https://nvbugs/5570575 ][fix] : Use less kv cache memory on SM120 ( #9054 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-11-11 15:42:08 +08:00
Lizhi Zhou
0649b77d16
[ https://nvbugs/5608743 ][chore] unwaive test ( #8994 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-10 05:59:29 -08:00
dominicshanshan
def2ad5107
[ https://nvbugs/5575920 ][fix] Fix cublas/cublasLt handle creation memory not sufficient error ( #8900 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-11-07 10:14:00 -08:00
Ivy Zhang
5cf3f0c981
[ https://nvbugs/5636946 ][fix] Update test model ( #8993 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-07 15:13:29 +08:00
Bo Deng
43843778a7
[ https://nvbugs/5601682 ][fix] unwaive test_disaggregated_deepseek_v3_… ( #8888 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-11-05 09:33:57 +08:00
xiweny
7d8a913406
[ https://nvbugs/5596343 ] [test] Update accuracy baseline for GPT-OSS-20B ( #8842 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-04 16:04:11 +08:00
brb-nv
095b7a3ad5
[ https://nvbugs/5521253 ][fix] Enable Gemma3 12B & 27B on SM100 ( #8666 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-03 14:49:36 -08:00
Barry Kang
f22a87f296
[ https://nvbugs/5325296 ][fix] Enable relaxed acceptance test on Blackwell ( #8709 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-10-31 15:02:06 -07:00
Jin Li
28673f3e9c
[ https://nvbugs/5488118 ][fix] Unwaive passed tests ( #8758 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-31 10:46:44 +08:00
xiweny
f49f42db59
[ https://nvbugs/5601203 ] [fix] Restrict fp8 blockscale moe case ( #8583 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-29 10:47:32 +08:00
Yukun He
e04354bc09
[ https://nvbugs/5608489 ][fix] Fix output unpack issues for Llama3/4 NVFP4 models. ( #8679 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-28 14:21:47 +08:00
Ivy Zhang
1859b55d22
[None][test] Clean cache for certain easily hang cases ( #8619 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-10-24 08:17:32 -04:00
Jie Li
4b52054bdd
[ https://nvbugs/5541145 ][fix] Remove DeepSeekR1 test case from H20 to prevent OOM ( #8610 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2025-10-24 05:20:40 -04:00
Lizhi Zhou
686298d2d5
[ https://nvbugs/5575902 ][fix] set max_batch_size=1 to stabilize accuracy test result ( #8609 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-23 07:28:29 -07:00
Ivy Zhang
5d27034295
[TRTLLM-8785][fix] create output_dir before test begin (cherry-pick #8518 ) ( #8575 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-23 04:41:54 -04:00
Chang Liu
e5b6d335eb
[ https://nvbugs/5568961 ][fix] Fix a merge conflict (cherrypick from PR 8365) ( #8553 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-23 14:05:16 +08:00
Lizhi Zhou
3f82cdbdad
[ https://nvbugs/5582277 ][fix] rework DisaggPPTerminationHandler to fix hang issue ( #8519 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-23 09:43:59 +08:00
Bo Deng
9e30f14da8
[ https://nvbugs/5565549 ][fix] unwaive test_disaggregated_spec_dec_bat… ( #8500 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-22 14:59:59 +08:00
Ivy Zhang
f904348cd6
[TRTLLM-8580][test] save runtime report periodically ( #8312 ) ( #8455 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-20 10:54:24 +08:00
Yukun He
437a3fc642
[None][chore] Remove duplicate log outputs in test_perf.py ( #8418 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-17 14:11:32 +08:00
ruodil
20c2de4924
[None][test] cherry-pick: add test-model-suites in integration conftest.py ( #8388 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-10-15 23:26:32 -07:00
Patrice Castonguay
7862372ee2
[ https://nvbugs/5552889 ][fix] fix: Prevent empty batch when using attention DP with disagg ( #8372 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-16 09:11:04 +08:00
Stanley Sun
cce97e6e15
[TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled ( #8357 )
...
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-10-15 15:09:21 +08:00
xiweny
d5b79268e7
[ https://nvbugs/5565565 ] [fix] fp8 wideep support sm103 ( #8228 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-15 10:17:08 +08:00
bhsueh_NV
66aa88739b
[ https://nvbugs/5574556 ][fix] fix bug of Qwen3_235B_A22B::test_fp8 CI ( #8351 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-10-14 15:26:15 +08:00
Lizhi Zhou
553ff3402a
[ https://nvbugs/5550671 ][fix] fix disagg-serving multinodes test failure ( #8307 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 08:01:00 +02:00
Chuang Zhu
6a73f079fe
[ https://nvbugs/5465642 ][fix] Increase server timeout to wait weight loading ( #8297 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-14 07:55:31 +02:00
Enwei Zhu
598e88594c
[ https://nvbugs/5568951 ][fix] Fix guided decoding disagg tests ( #8311 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-10-13 18:55:28 +08:00
Chuang Zhu
ad0e91a174
[ https://nvbugs/5546202 ][fix] Fix concurrent bug for NIXL cache transceiver ( #8147 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-13 09:40:56 +02:00
Ivy Zhang
6a42a9649b
[None][chore] Update test configs for release ( #8224 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-13 14:07:33 +08:00
Ivy Zhang
bca5e29387
[None][chore] Update constraint for release ( #8211 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-13 11:14:24 +08:00
Yukun He
1ca84e1a25
[ https://nvbugs/5536131 ][fix] Fix illegal access issue when scale is not provided in Llama3/4. ( #7960 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-07 23:47:00 -07:00
Enwei Zhu
d650320de4
[None][infra] Improve the failure message for accuracy test suite ( #7994 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-26 10:04:47 +08:00
Guoming Zhang
202bed4574
[None][chore] Rename TensorRT-LLM to TensorRT LLM for source code. ( #7851 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
9f0f52249e
[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … ( #7850 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Chuang Zhu
791e73edf6
[ https://nvbugs/5536141 ][fix] fix_disagg_single_gpu_test ( #7990 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 02:07:22 -07:00
Mike Iovine
42c2ec3239
[ https://nvbugs/5473781 ][fix] Fix llama 4 FP8 for PP>1 ( #7220 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-24 12:16:27 -04:00
Pamela Peng
b1dc84b4a3
[TRTLLM-7399][test] Add DS-R1/Qwen3 test cases for RTX 6000 ( #7662 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-24 11:40:26 -04:00
Enwei Zhu
a1a57e83b8
[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve ( #7925 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-24 18:30:23 +08:00
xinhe-nv
b8bfa63197
[None][chore] add test_w4_1gpu[True-True-cutlass-fp8] & TestKimiK2::test_fp8_blocks… ( #7944 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-24 03:25:17 -07:00
yufeiwu-nv
f323b74d42
[None][test] Update llm_models_root to improve path handling on BareMetal environment ( #7876 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-09-24 17:35:57 +08:00
xinhe-nv
62563760fb
[None][chore] update chunked prefill cases ( #7921 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-24 15:14:49 +08:00
Yueh-Ting (eop) Chen
cf100933cc
[TRTLLM-6341][feature] Support SWA KV cache reuse ( #6768 )
...
This merge request adds more SWA KV cache functionality to the KV cache
manager. Previously, the KV cache for sliding window attention (SWA) held
only "window size" blocks and reused them cyclically. That design cannot
utilize additional GPU memory, limiting the maximum batch size and
throughput, and it cannot support KV cache reuse.
In this MR, we change the manager to write blocks linearly. As the
attention window advances, out-of-window (OOW) blocks are detached. For
now, for the sake of a correct feature first, we directly offload OOW
blocks from the primary block pool (GPU memory) to the secondary block
pool (host memory). We will improve this in the future by delegating the
block movement to the eviction policy.
KV cache reuse for SWA is not developed in this merge request and will
be added in a follow-up merge request.
With linear block writing, the maximum number of blocks allocated for a
sequence (`GenerationRequest`) is determined by the specified "max
sequence length": the `GenerationRequest` that stores the cache block
bookkeeping structure now keeps "max sequence length" tokens of blocks.
Given the above, the main changes are (more context in the MR):
- Remove the "cyclic" concept from the KV cache manager; this concept
  originally guarded block reuse in the manager.
- Add a detach mechanism under `KVCacheManager::addToken`. Note that
  detach is still guarded off for SWA when reuse is enabled; a follow-up
  merge request will improve this.
- Make "max sequence length" a non-optional parameter to
  `KVCacheManager`/`BlockManager`.
- Give every window-size resource pool an identical proportion of memory.
- Fix the free-memory calculation in `resource_manager.py`.
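The linear-write-plus-detach behavior described above can be sketched as
follows. This is a hypothetical illustration, not the real TensorRT-LLM
implementation: the class name `SlidingWindowBlockManager`, the
`primary`/`secondary` pool lists, and the method names are all invented
for this sketch; window size and block size are measured in tokens.

```python
class SlidingWindowBlockManager:
    """Toy model of linear SWA block writing with out-of-window detach."""

    def __init__(self, window_size: int, tokens_per_block: int):
        self.window_size = window_size
        self.tokens_per_block = tokens_per_block
        self.primary = []    # block ids resident in GPU memory (in order)
        self.secondary = []  # block ids offloaded to host memory
        self._next_block_id = 0
        self._num_tokens = 0

    def add_token(self) -> None:
        # Blocks are allocated linearly: a fresh block whenever the
        # current one fills up, never cycling back over old blocks.
        if self._num_tokens % self.tokens_per_block == 0:
            self.primary.append(self._next_block_id)
            self._next_block_id += 1
        self._num_tokens += 1
        self._detach_out_of_window()

    def _detach_out_of_window(self) -> None:
        # A block is out of window once every token it holds is older
        # than the attention window; such blocks are offloaded from the
        # primary (GPU) pool to the secondary (host) pool.
        window_start = max(0, self._num_tokens - self.window_size)
        first_needed_block = window_start // self.tokens_per_block
        while self.primary and self.primary[0] < first_needed_block:
            self.secondary.append(self.primary.pop(0))


mgr = SlidingWindowBlockManager(window_size=4, tokens_per_block=2)
for _ in range(8):
    mgr.add_token()
# With 8 tokens and a 4-token window, blocks 0 and 1 fall out of window:
# mgr.primary == [2, 3], mgr.secondary == [0, 1]
```

Under the old cyclic scheme, the same manager would hold only
"window size" worth of blocks forever; here the sequence can keep
allocating up to "max sequence length" blocks, with cold blocks pushed
to host memory instead of overwritten.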
Signed-off-by: eopXD <yuehtingc@nvidia.com>
Co-authored-by: Tomer Asida <tasida@nvidia.com>
2025-09-24 14:28:24 +08:00
Lizhi Zhou
7550251988
[TRTLLM-7182][test] add multi-nodes test for disagg-serving ( #7470 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-24 08:31:56 +08:00
Zheng Duan
e3c1a9409f
[TRTLLM-6549][fix] add kv cache time output back ( #7798 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-09-23 14:12:42 -04:00
Pengbo Wang
a4b4ed4535
[None][fix] Fix and add test for TRTLLM MoE backend ( #7755 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 11:26:25 +08:00
yunruis
126cd707e3
[None][opt] Add batch waiting when scheduling ( #7416 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-23 10:27:37 +08:00
xinhe-nv
9c1b75e978
[TRTLLM-7070][feat] add gpt-oss chunked prefill tests ( #7779 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-22 00:12:43 -07:00