TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-25 21:22:57 +08:00

Author	SHA1	Message	Date
Emma Qiao	7409d56053	[None][infra] Waive failed cases for release/1.0 (#7258) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-26 19:50:28 +08:00
Ivy Zhang	1f7a1645d6	[None][fix] update skip case (#7193) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 12:31:48 +08:00
ruodil	ebbbacf81c	[None][test] add kv cache size in bench metric and fix failed cases (#7211) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-26 10:09:22 +08:00
Dom Brown	3f2eb4d2e8	[https://nvbugs/5461712][fix] Disable deep_gemm for Qwen3 due to accuracy issues (#7170) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-08-23 05:26:12 -04:00
HuiGao-NV	253af9f9af	[https://nvbugs/5410391][bug] Support to share device buffers in attention meta (#6557) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-08-22 13:19:27 +08:00
Pamela Peng	1e5a6be55d	[https://nvbugs/5448442][fix] Skip trtllm moe backend for sm120 (#7010) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>	2025-08-21 13:34:07 -04:00
Yan Chunwei	e77ec061db	[https://nvbugs/5451296][fix] zmq nonblock bug with retry (#7019) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-21 08:34:46 +08:00
yifeizhang-c	5959d72d74	[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6975) Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>	2025-08-20 16:32:27 +08:00
Jin Li	69846c6586	[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6978) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-20 15:06:56 +08:00
Bo Deng	df00c81aea	[https://nvbugs/5448437][fix] fix some nixl tests (#6940) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-20 14:19:48 +08:00
Emma Qiao	c4535e6c3a	[None][infra] Waive failed tests for release branch (#7036) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-19 20:42:47 +08:00
William Zhang	790a105563	[https://nvbugs/5462007][ci] Unwaive Mistral Small 3.1 FP8 test (#7008) The error was fixed by #6909. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-18 19:50:03 -04:00
Yiqing Yan	28c30e1bf8	[None][chore] Remove duplicate test waives (#6999) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-18 22:04:43 +08:00
Emma Qiao	2992e9cd58	[None][infra] Waive failed tests for release branch 0818 (#6993) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-18 20:31:50 +08:00
Ivy Zhang	055fdd9e31	[None][fix] update skip config (#6891) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-18 13:50:46 +08:00
Guoming Zhang	96bda14fbd	[https://nvbugs/5375646][fix] update waives.txt for nvbug 5375646 (#6847) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-17 23:22:01 -04:00
Yan Chunwei	6d65b63b8d	[None][ci] unwaive test_ptp_star_attention_example (#6943) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-15 05:33:25 -04:00
Yan Chunwei	54ffc6a250	[None][doc] add legacy section for tensorrt engine (#6724) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-15 11:08:38 +08:00
2ez4bz	ccb62ef97e	[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731) This commit adds some level of FP8 support to Mistral Small 3.1 by: * disabling quantization for the vision sub-model since `modelopt` does not support quantizing it (yet). * extending existing accuracy tests to use a modelopt produced FP8 checkpoint. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-13 21:25:55 -04:00
brb-nv	3d95742d97	[https://nvbugs/5401114][fix] Unwaive Gemma3 tests (#6870) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-13 20:05:35 -04:00
Guoming Zhang	3e46624f09	[https://nvbugs/5375594][fix] fix oom issue on structural_tag test case (#6838) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-08-13 10:09:35 -04:00
Ivy Zhang	fd8f417bf2	[None][fix] fix Llama3 eagle3 test case OOM (#6832) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-13 02:21:05 -04:00
xinhe-nv	0958efdcff	[None][chore] waive GB300 known issues (#6812) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-13 13:13:36 +08:00
Ivy Zhang	15bcf80596	[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-13 13:10:13 +08:00
brb-nv	3d169bfdad	[https://nvbugs/5445774][fix] Unwaive Gemma3 27B fp8 test (#6799) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-12 08:54:15 -07:00
Yanchao Lu	c39454c617	[None][infra] Avoid intermittent access broken to nvcr.io (#6715) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-12 11:48:59 +08:00
Raayan Dhar	ddf8e8d1a0	[None][feat] adding support for disaggregated multi-instance tests (#6674) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-08-11 13:00:57 -07:00
2ez4bz	efd0a51508	[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611) (#6765) This commit propagates the mapping to intermediate layers to enable tensor parallelism (amongst other things) in them. It also fixes issues with a unit test for TP for pixtral, and adds it to a test list. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-11 10:13:10 -07:00
Yechan Kim	e6642eb68c	[https://nvbugs/5444095][infra] waive test_ptp_quickstart_multimodal llava test (#6795) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-11 11:58:37 -04:00
Emma Qiao	824feb8653	[None][infra] Waive failed tests on release branch (#6782) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-11 03:14:47 -04:00
Bo Deng	a4f9e637ae	[https://nvbugs/5431127][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6737) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-11 13:29:11 +08:00
Yan Chunwei	21e4f51139	[TRTLLM-4721][test] Add qa test for llm-api (#6727) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-11 08:03:16 +08:00
ruodil	28b762a2a2	[None][test] fix yml condition error under qa folder (#6733) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-08 15:59:09 +10:00
Bo Deng	d289d85bff	[TRTLLM-6675][infra] Nixl test completion (#6623) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-08 10:15:54 +08:00
Ivy Zhang	232a39de1f	[TRTLLM-5574][test] Add NIM required VLM models multi-gpu test (#6687) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-08 11:58:58 +10:00
brb-nv	4adde41632	[TRTLLM-6656][chore] Validate FP8 support for Gemma3 (#6678) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-07 13:14:04 -04:00
ruodil	0f8242aed9	[None][test] cherry-pick: correct test-db context for perf yaml file and add mistral cases (#6688) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-07 06:16:42 -04:00
Stanley Sun	53f94a4a0e	[None][test] Add Mistral Small 3.1 24B accuracy test to QA test list (#6682) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-08-07 03:24:35 -04:00
YueWeng	157ea77549	[https://nvbugs/5375966][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one (#6658) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-08-07 10:25:17 +08:00
ruodil	780d7507f9	[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml (#6662) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 10:02:13 +10:00
Yan Chunwei	5eae3184fa	[None][chore] add missing tests to test list (#6590) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-06 22:12:27 +08:00
Iman Tabrizian	13ecb4aced	[https://nvbugs/5328160][fix] Unwaive disaggregated serving tests (#6644) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-06 09:08:29 -04:00
ruodil	907c180eb2	[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 02:25:57 -04:00
ruodil	0bd99b5d6d	[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 01:45:13 -04:00
yunruis	3ff4f503ad	[None][opt] ADP schedule balance optimization (#6061) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-08-06 09:38:02 +08:00
ixlmar	1ebceb790d	[TRTLLM-5508][feat] check input tokens + improve error handling (#5170) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-08-05 18:27:43 +01:00
Venky	61da2daeb4	[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-05 07:14:24 -07:00
Emma Qiao	78a75c2990	[None][Infra] - Split gb200 stages for each test (#6594) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-05 07:10:00 -04:00
xinhe-nv	c32584125e	[TRTQA-2920][fix] Add failed cases into waives.txt (#6600) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-05 20:12:55 +10:00
Pengbo Wang @ NVIDIA	c289880afb	[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-08-05 18:05:33 +08:00

599 Commits