TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Stanley Sun	a23cdc4c1b	test: fix potential teardown error (#4908 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-06-05 10:39:57 +08:00
Daniel Cámpora	64d5eba9c7	Fix: max_num_sequences calculation with overlap scheduling into release/0.20 (#4889 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-06-04 22:33:12 +08:00
Yuxian Qiu	3af8159133	fix: [nvbugs/5312750] Keep embed_tokens for last pp rank if tie_word_embeddings. (#4902 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-06-04 19:49:08 +08:00
Stanley Sun	33cd27f114	test: fix rss increasement test case issue (#4868 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-06-04 10:35:06 +08:00
Yiqing Yan	b1ce7f0765	Waive L0 test (#4862 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-03 18:37:21 +08:00
Yiqing Yan	95e6ad579d	Waive L0 test (#4857 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-03 15:58:26 +08:00
Fanrong Li	6e46e13523	Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4379 (#4833 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-03 12:30:01 +08:00
Fanrong Li	82d918b93e	Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4536 (#4834 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-03 12:29:54 +08:00
Yanchao Lu	36116f09f6	[Infra] - Better utilize multi-GPU CI resources (#4850 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-03 12:25:20 +08:00
ruodil	7c47714a39	test: shorten reqs in con:1 cases and add streaming cases, add l2 perf test (#4796 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 10:20:55 +08:00
Stanley Sun	b58556e2d9	test: remove invalid triton integration test cases (#4801 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 09:39:23 +08:00
Michal Guzek	4e68be2da7	[TRTLLM-4932] Remove moe- related arguments from Llama-3_1-Nemotron-Ultra-253B-v1 CLI accuracy test (#4808 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-06-02 12:16:28 -07:00
pcastonguay	ddd704f39c	fix: Fix queued req stats for release/0.20 (#4806 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-06-02 08:32:24 -04:00
brb-nv	7a2cd255bc	fix: Skip dummy medusa/eagle tests when WORLD_SIZE env variable is missing (#4786 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-06-02 02:21:24 -07:00
Yan Chunwei	55170ec83a	fix: llmapi-launch add add trtllm-bench test with engine building (#4… (#4550 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-01 08:38:01 +08:00
Iman Tabrizian	00e0837e5c	Remove disaggregated cuda graph waived test (#4707 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-31 07:24:00 +08:00
Yiqing Yan	830d68d101	Waive l0 tests (#4795 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-30 15:56:58 +08:00
Ivy Zhang	9980e73afa	tests: waive failed case (#4785 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-30 11:24:25 +08:00
xinhe-nv	1bc3dfa490	tests: fix 5250460 (#4751 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-30 10:13:45 +08:00
Iman Tabrizian	de0613bd83	[nvbugs/5297821] Fix llama4 disaggregated serving accuracy tests (#4743 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-29 12:55:17 -07:00
Pamela Peng	52465216f4	[https://nvbugs/5295389 ][fix]fix moe fp4 on sm120 (#4624 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>	2025-05-29 09:50:47 -07:00
Stanley Sun	040fef709a	test: remove large bs as it will oom (#4726 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-29 14:31:57 +08:00
ruodil	5c235de80d	test: remove perf test l40s/l20 oom test cases and unwaive tests (#4720 ) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-05-29 12:47:52 +08:00
nv-guomingz	bc7e53c9ef	fix:https://nvbugs/5214239 (#4718 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-05-29 09:36:31 +08:00
Iman Tabrizian	f57cd1b1a9	Remove V1 batching tests (#4703 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-29 05:57:57 +08:00
Bo Li	6567453d3e	fix: [https://nvbugspro.nvidia.com/bug/5286795 ] Unwaive tests for bug-5286795. (#4724 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-05-29 00:51:23 +08:00
Venky	1a989a8189	[cherry-pick] test(perf): Pt.2 Add Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp) (#4499 ) (#4588 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-28 15:48:01 +08:00
Venky	b4e598da27	[cherry-pick] test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) (#4446 ) (#4590 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-28 14:17:24 +08:00
Venky	42e622a3b9	[cherry-pick] test(perf): Add remaining `Phi-4-mini-instruct` perf tests (#4443 ) (#4589 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-28 14:17:18 +08:00
brb-nv	fc3c2f7f7c	fix: Mistral Small vision encoder with BS>1 (#4713 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-28 12:49:28 +08:00
HuiGao-NV	1bfc7d4c29	fix: [nvbug5300494] Use runtime total gpu memory to calculate kv cache memory and log more memory information (#4660 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-05-28 10:00:19 +08:00
Yuxian Qiu	87b50a5736	fix: [nvbugs/5289912][nvbugs/5232406] use thread pool for multi-thread weight loading in fused moe. (#4699 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-28 08:13:06 +08:00
Ivy Zhang	fbe48df361	tests: waive and unwaive QA test cases (#4644 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-27 15:19:45 +08:00
Yan Chunwei	10119412ef	fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4529 ) fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4428)	2025-05-27 15:19:04 +08:00
Michal Guzek	24153c068e	[TRTLLM-4932] Add QA accuracy tests for NIM-prioritized models (#4242 ) * Add tests Signed-off-by: moraxu <mguzek@nvidia.com> * Add tests v2 Signed-off-by: moraxu <mguzek@nvidia.com> * Add fixes Signed-off-by: moraxu <mguzek@nvidia.com> * Skip fp8 test for Ultra Signed-off-by: moraxu <mguzek@nvidia.com> * Add tests for Phi Signed-off-by: moraxu <mguzek@nvidia.com> * Skip tests for Phi Signed-off-by: moraxu <mguzek@nvidia.com> * Skip tests for Phi - fix Signed-off-by: moraxu <mguzek@nvidia.com> * Skip tests for Phi - comment out acc refs Signed-off-by: moraxu <mguzek@nvidia.com> * Add more test granularity Signed-off-by: moraxu <mguzek@nvidia.com> * Fix examples_test_list.txt Signed-off-by: moraxu <mguzek@nvidia.com> * Update test list file Signed-off-by: moraxu <mguzek@nvidia.com> * Update yaml files Signed-off-by: moraxu <mguzek@nvidia.com> * Address review comments Signed-off-by: moraxu <mguzek@nvidia.com> * Remove MMLU tests Signed-off-by: moraxu <mguzek@nvidia.com> * Add remaining models Signed-off-by: moraxu <mguzek@nvidia.com> --------- Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-24 19:17:21 +08:00
Jinyang Yuan	f9a9a1af2e	[fix] Fix Llama4 allgather error due to None tensor (#4511 ) * [fix] Fix Llama4 allgather error due to None tensor Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> * Refactor modifications Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> * Minor modification Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> * Minor fix Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> --------- Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-05-24 19:12:12 +08:00
Iman Tabrizian	ad4d947b24	Add missing rcca folder (#4591 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-24 03:28:10 +08:00
Michal Guzek	2a2d7ebf2e	[fix] Incorrect mocker argument for a CLI accuracy test in Llama-3.3-70B-Instruct (#4604 ) Fix mocker argument Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-23 12:18:37 -07:00
Michal Guzek	d2e6af2fe4	[TRTLLM-4932] Add CLI accuracy tests for Llama-3_3-Nemotron-Super-49B-v1 and LLM API FP8 variant (#4375 ) * Add CLI TestNemotronSuper acc tests Signed-off-by: moraxu <mguzek@nvidia.com> * Update mmlu.yaml Signed-off-by: moraxu <mguzek@nvidia.com> * Update yaml files Signed-off-by: moraxu <mguzek@nvidia.com> * Skip FP8 test in CLI Signed-off-by: moraxu <mguzek@nvidia.com> * Address reviews Signed-off-by: moraxu <mguzek@nvidia.com> * Address review comments Signed-off-by: moraxu <mguzek@nvidia.com> --------- Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-23 12:17:23 -07:00
Faraz	53008d3ee8	[TR[TLLM-4618][feat] Add remaining NVFP4 Nemotron Super 49B test on RTX6000 Pro (SM120) (#4548 ) added nvfp4 nemotron for qa testing on RTX 6000 Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-23 10:42:32 -07:00
Simeng Liu	630b7907a0	[CI] Waive known errors with test TestDeepSeekV3Lite::test_fp8_block_scales_4gpus (#4627 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-05-23 10:33:44 -07:00
stnie	21af6f77dc	ci: waive testcase [NVBUG 5297821] (#4616 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-05-23 20:54:42 +08:00
Barry Kang	26793e3569	[https://nvbugs/5289907 ][fix] Restore per-channel pre-quant (#4545 ) * Restore per-channel pre-quant Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> * Update TRT test script Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> * Fix pre-commit Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> --------- Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-05-23 19:46:53 +08:00
Yukun He	d7701ea6d8	[5180961] chore: Unwaive test for Qwen model. (#4524 ) * Unwaive test for Qwen model. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * update. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> --------- Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-05-23 13:28:08 +08:00
ruodil	2ce14357ff	test: fix for perf sanity test and skip fp8 deepseek blackwell cases (#4598 ) fix for sanity test and skip fp8 deepseek blackwell cases Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-23 11:13:14 +08:00
Venky	d15ceae62e	test(perf): Extend the Llama-Nemotron-Nano-8B perf-integration-tests (pyt) (#4407 ) * extend pyt nano tests perf coverage Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> * explicitly set maxnt for some cases This is because the test harness default to no prefill chunking, that means the isl specified is the true context. When explicitly unspecified in the test harness, the `maxnt` passed down to `trtllm-bench` is 2048. This means trtllm-bench gets conflicting inputs when isl>2048 but maxnt=2048; hence overriding maxnt to be consistent with isl for such cases. Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-23 08:44:37 +08:00
Yukun He	dd79631b77	[5234029][5226211] chore: Unwaive multimodal tests for Qwen model. (#4519 ) Unwaive multimodal tests for Qwen models. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-05-23 08:04:56 +08:00
ruodil	3d083b69be	test: waive hanging cases for perf test (#4563 ) waive hanging cases Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-22 21:09:12 +08:00
Yukun He	21ada0a961	[5141290][5273694][5260696] fix: Fix mrope argument missing issue in the summary tasks for Qwen model. (#4432 ) Fixed the mrope argument missing issue in the summary tasks for Qwen models. And re-enabled the fixed tests. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-05-22 17:45:59 +08:00
ruodil	ce6a32997b	test: add failed case in waive list and fix some test script issue for perf test (#4528 ) add failed case in waive list and fix some test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-21 16:36:32 +08:00


547 Commits