TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
xinhe-nv	1dba9fa89e	[TRTLLM-6239][feat] add test cases into QA test list (#8081 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-30 00:23:45 -04:00
Cheng Hang	cdce68c3e0	[TRTLLM-6741][fix] Add heuristics for lm head tp size when `enable_lm_head_tp_in_adp=True` (#7891 ) Signed-off-by: Cheng Hang <chang@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-30 09:24:35 +08:00
Iman Tabrizian	33282351a2	[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path (#6348 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-09-27 19:29:30 -04:00
xinhe-nv	e30d9aced9	[https://nvbugs/4955671 ][fix] update test list (#7980 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-25 02:58:09 -07:00
fredricz-20070104	0945403174	[TRTLLM-6541][test] Add NIM perf test cases (#7924 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-25 13:15:26 +08:00
Enwei Zhu	a1a57e83b8	[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve (#7925 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-24 18:30:23 +08:00
xinhe-nv	b8bfa63197	[None][chore] add test_w4_1gpu[True-True-cutlass-fp8] & TestKimiK2::test_fp8_blocks… (#7944 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-24 03:25:17 -07:00
Lizhi Zhou	7550251988	[TRTLLM-7182][test] add multi-nodes test for disagg-serving (#7470 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-24 08:31:56 +08:00
ruodil	05bec3bf0f	[None][test] rename llm_perf_full to llm_perf_core and add missing cases (#7899 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-09-22 23:04:34 -07:00
yunruis	126cd707e3	[None][opt] Add batch waiting when scheduling (#7416 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-09-23 10:27:37 +08:00
xinhe-nv	9c1b75e978	[TRTLLM-7070][feat] add gpt-oss chunked prefill tests (#7779 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-22 00:12:43 -07:00
Yi Zhang	f9c9c3f50a	[https://nvbugs/5355219 ][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Ivy Zhang	0ac51487f4	[None][chore] remove cli cases for rtx6k (#7833 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:33:59 +08:00
Ivy Zhang	6b33bcced2	[None][test] Add accuracy benchmark in stress test (#7561 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:09:46 +08:00
ruodil	c5453103d6	[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-09-19 11:12:53 +08:00
fredricz-20070104	fc4e6d3702	[TRTLLM-7183][test] Feature fix model issue for disagg serving (#7785 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-19 10:12:55 +08:00
Wanli Jiang	fe104dc20d	[TRTLLM-7918][feat] Support kvcache reuse and chunk prefill for phi4mm (#7723 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 17:37:16 +08:00
Wanli Jiang	a7ca0fff54	[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 16:26:20 +08:00
Ivy Zhang	26d50eb539	[TRTLLM-8070][test] add generation logits case for llama3 (#7759 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-18 13:33:16 +08:00
William Zhang	2614d71994	[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 (#7628 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-09-17 08:11:16 -07:00
ruodil	e6073b3911	[None][test] add gpt oss model for trtllm perf test (#7328 ) Signed-off-by: Ruodi Lu <ruodil@nvidia.com> Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-09-17 15:23:21 +08:00
HuiGao-NV	a49cfb3e68	[https://nvbugs/5516666 ][fix] cherrypick fix to the CUDA graph warmup issue when using speculative decoding (#7737 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com> Co-authored-by: Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-17 06:24:20 +08:00
xinhe-nv	1fbea497ff	[TRTLLM-7070][feat] add gpt-oss serve benchmark tests (#7638 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 16:39:31 +08:00
Ivy Zhang	ddfe0320b3	[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill (#7365 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-15 13:38:52 +08:00
Chang Liu	47e37755a3	[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-14 20:10:10 -07:00
Perkz Zheng	1b29c2e731	[None][feat] support gpt-oss with fp8 kv cache (#7612 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-15 02:17:37 +08:00
xinhe-nv	207c5258c4	[https://nvbugs/5494698 ][fix] skip gemma3 27b on blackwell (#7505 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-10 21:09:27 +08:00
Bo Deng	bf57829acf	[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-09-10 17:35:51 +08:00
fredricz-20070104	ef620f3579	[https://nvbugs/5410687 ][test] Add deepseek r1-w4afp8 quickstart (#7645 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-10 10:21:01 +08:00
Chang Liu	faa2f46554	[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-09 14:51:36 -04:00
xinhe-nv	8e3962d278	[TRTLLM-6642][feat] add gptoss 20g tests (#7361 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-05 02:20:28 -04:00
Ivy Zhang	b46e0ae5d4	[None][test] update nim and full test list (#7468 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-04 09:06:01 -04:00
Stanley Sun	db8eb0a447	[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-04 10:34:38 +08:00
Enwei Zhu	5ff3a65b23	[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-03 15:16:11 -07:00
Stanley Sun	cebbf48b74	[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 (#7083 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-03 08:36:52 -04:00
Wanli Jiang	4223a9aada	[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-03 10:27:42 +08:00
aalanwyr	c3c95736a1	[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test (#7413 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-09-02 17:21:27 +08:00
nvamyt	efaefca2c8	[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302 ) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-01 12:59:21 +08:00
Ivy Zhang	29cdcdb56a	[None][fix] update skip config (#6891 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
2ez4bz	2480aedb73	[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731 ) This commit adds some level of FP8 support to Mistral Small 3.1 by: * disabling quantization for the vision sub-model since `modelopt` does not support quantizing it (yet). * extending existing accuracy tests to use a modelopt produced FP8 checkpoint. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	deba2885c1	[None][fix] fix Llama3 eagle3 test case OOM (#6832 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	c7147d25dc	[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
xinhe-nv	5f939b9121	[None][chore] Add failed cases into waives.txt (#7342 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-30 00:49:14 -04:00
Emma Qiao	15ec2b855d	[None][infra] Waive failed tests on main branch 08/29 (#7370 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 10:28:20 -04:00
fredricz-20070104	091b67ad2f	[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-29 02:16:22 -04:00
aalanwyr	085dc19bfa	[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-08-28 23:09:11 -04:00
xinhe-nv	f082e4857c	[TRTLLM-7250][fix] waive failed cases (#7292 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-27 18:04:46 +08:00
nvamyt	dbd4f21687	[None][fix] Update maxnt of llama_v3.2_1b bench (#7279 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-27 16:56:28 +08:00
nvamyt	d8bd8843fc	[None][test] Update qwen3 timeout to 60 minutes (#7200 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 14:18:42 +08:00
ruodil	b845eb7a3a	[None][test] add kv cache size in bench metric and fix failed cases (#7160 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 10:10:02 +08:00


249 Commits