TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
ruodil	e05b3ff427	test: add deepseek_v3_lite rcca cases (#5225 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-16 13:39:26 +08:00
ruodil	3f284f1a3a	test: add deepseek rcca cases (#5195 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-15 16:20:15 +08:00
Fanrong Li	bfa3b59bb6	[https://nvbugs/5277592 ][fix] fix cuda graph padding for spec decoding (only for 0.20) (#5058 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-11 02:14:14 +08:00
Ivy Zhang	b626186241	tests: fix some typo and limitation on test cases (#4989 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-10 10:47:50 +08:00
Yechan Kim	9f5b23ae77	fix: [nvbugs/5324954, nvbugs/5304229] fix Qwen2-VL video and Qwen2.5-VL image test case (#4976 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-06-09 15:25:26 +08:00
Yukun He	0b4f7182fb	[5289904] chore: Unwaive test for Qwen model. (#4657 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-06-09 14:06:59 +08:00
Yukun He	5ee14657b4	[5310329] chore: Unwaive test_e2e.py::test_openai_reasoning. (#4981 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-06-09 14:05:21 +08:00
Stefan Niebler	a6e53bf4e0	ci: waive testcase [NVBUG 5247271] (#4992 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-06-08 16:47:06 +08:00
liji-nv	ff4212377c	[fix] Fix illegal mem access and possible accuracy lose (#4943 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-08 11:19:42 +08:00
Yukun He	fa20ffc5d4	[5310329] fix: Fix warmup phase batch size out of range. (#4912 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-06-06 12:26:05 +08:00
Zheng Duan	c4c7dd3517	fix: cache-aware router related test fix (#4911 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-06-05 13:07:24 +08:00
Stanley Sun	a23cdc4c1b	test: fix potential teardown error (#4908 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-06-05 10:39:57 +08:00
Daniel Cámpora	64d5eba9c7	Fix: max_num_sequences calculation with overlap scheduling into release/0.20 (#4889 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-06-04 22:33:12 +08:00
Yuxian Qiu	3af8159133	fix: [nvbugs/5312750] Keep embed_tokens for last pp rank if tie_word_embeddings. (#4902 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-06-04 19:49:08 +08:00
Stanley Sun	33cd27f114	test: fix rss increasement test case issue (#4868 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-06-04 10:35:06 +08:00
Yiqing Yan	b1ce7f0765	Waive L0 test (#4862 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-03 18:37:21 +08:00
Yiqing Yan	95e6ad579d	Waive L0 test (#4857 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-03 15:58:26 +08:00
Fanrong Li	6e46e13523	Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4379 (#4833 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-03 12:30:01 +08:00
Fanrong Li	82d918b93e	Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4536 (#4834 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-03 12:29:54 +08:00
Yanchao Lu	36116f09f6	[Infra] - Better utilize multi-GPU CI resources (#4850 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-03 12:25:20 +08:00
ruodil	7c47714a39	test: shorten reqs in con:1 cases and add streaming cases, add l2 perf test (#4796 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 10:20:55 +08:00
Stanley Sun	b58556e2d9	test: remove invalid triton integration test cases (#4801 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-03 09:39:23 +08:00
Michal Guzek	4e68be2da7	[TRTLLM-4932] Remove moe- related arguments from Llama-3_1-Nemotron-Ultra-253B-v1 CLI accuracy test (#4808 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-06-02 12:16:28 -07:00
pcastonguay	ddd704f39c	fix: Fix queued req stats for release/0.20 (#4806 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-06-02 08:32:24 -04:00
brb-nv	7a2cd255bc	fix: Skip dummy medusa/eagle tests when WORLD_SIZE env variable is missing (#4786 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-06-02 02:21:24 -07:00
Yan Chunwei	55170ec83a	fix: llmapi-launch add add trtllm-bench test with engine building (#4… (#4550 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-01 08:38:01 +08:00
Iman Tabrizian	00e0837e5c	Remove disaggregated cuda graph waived test (#4707 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-31 07:24:00 +08:00
Yiqing Yan	830d68d101	Waive l0 tests (#4795 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-30 15:56:58 +08:00
Ivy Zhang	9980e73afa	tests: waive failed case (#4785 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-30 11:24:25 +08:00
xinhe-nv	1bc3dfa490	tests: fix 5250460 (#4751 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-30 10:13:45 +08:00
Iman Tabrizian	de0613bd83	[nvbugs/5297821] Fix llama4 disaggregated serving accuracy tests (#4743 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-29 12:55:17 -07:00
Pamela Peng	52465216f4	[https://nvbugs/5295389 ][fix]fix moe fp4 on sm120 (#4624 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>	2025-05-29 09:50:47 -07:00
Stanley Sun	040fef709a	test: remove large bs as it will oom (#4726 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-29 14:31:57 +08:00
ruodil	5c235de80d	test: remove perf test l40s/l20 oom test cases and unwaive tests (#4720 ) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-05-29 12:47:52 +08:00
nv-guomingz	bc7e53c9ef	fix:https://nvbugs/5214239 (#4718 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-05-29 09:36:31 +08:00
Iman Tabrizian	f57cd1b1a9	Remove V1 batching tests (#4703 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-29 05:57:57 +08:00
Bo Li	6567453d3e	fix: [https://nvbugspro.nvidia.com/bug/5286795 ] Unwaive tests for bug-5286795. (#4724 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-05-29 00:51:23 +08:00
Venky	1a989a8189	[cherry-pick] test(perf): Pt.2 Add Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp) (#4499 ) (#4588 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-28 15:48:01 +08:00
Venky	b4e598da27	[cherry-pick] test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) (#4446 ) (#4590 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-28 14:17:24 +08:00
Venky	42e622a3b9	[cherry-pick] test(perf): Add remaining `Phi-4-mini-instruct` perf tests (#4443 ) (#4589 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-28 14:17:18 +08:00
brb-nv	fc3c2f7f7c	fix: Mistral Small vision encoder with BS>1 (#4713 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-28 12:49:28 +08:00
HuiGao-NV	1bfc7d4c29	fix: [nvbug5300494] Use runtime total gpu memory to calculate kv cache memory and log more memory information (#4660 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-05-28 10:00:19 +08:00
Yuxian Qiu	87b50a5736	fix: [nvbugs/5289912][nvbugs/5232406] use thread pool for multi-thread weight loading in fused moe. (#4699 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-28 08:13:06 +08:00
Ivy Zhang	fbe48df361	tests: waive and unwaive QA test cases (#4644 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-27 15:19:45 +08:00
Yan Chunwei	10119412ef	fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4529 ) fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4428)	2025-05-27 15:19:04 +08:00
Michal Guzek	24153c068e	[TRTLLM-4932] Add QA accuracy tests for NIM-prioritized models (#4242 ) * Add tests Signed-off-by: moraxu <mguzek@nvidia.com> * Add tests v2 Signed-off-by: moraxu <mguzek@nvidia.com> * Add fixes Signed-off-by: moraxu <mguzek@nvidia.com> * Skip fp8 test for Ultra Signed-off-by: moraxu <mguzek@nvidia.com> * Add tests for Phi Signed-off-by: moraxu <mguzek@nvidia.com> * Skip tests for Phi Signed-off-by: moraxu <mguzek@nvidia.com> * Skip tests for Phi - fix Signed-off-by: moraxu <mguzek@nvidia.com> * Skip tests for Phi - comment out acc refs Signed-off-by: moraxu <mguzek@nvidia.com> * Add more test granularity Signed-off-by: moraxu <mguzek@nvidia.com> * Fix examples_test_list.txt Signed-off-by: moraxu <mguzek@nvidia.com> * Update test list file Signed-off-by: moraxu <mguzek@nvidia.com> * Update yaml files Signed-off-by: moraxu <mguzek@nvidia.com> * Address review comments Signed-off-by: moraxu <mguzek@nvidia.com> * Remove MMLU tests Signed-off-by: moraxu <mguzek@nvidia.com> * Add remaining models Signed-off-by: moraxu <mguzek@nvidia.com> --------- Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-24 19:17:21 +08:00
Jinyang Yuan	f9a9a1af2e	[fix] Fix Llama4 allgather error due to None tensor (#4511 ) * [fix] Fix Llama4 allgather error due to None tensor Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> * Refactor modifications Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> * Minor modification Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> * Minor fix Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> --------- Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-05-24 19:12:12 +08:00
Iman Tabrizian	ad4d947b24	Add missing rcca folder (#4591 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-24 03:28:10 +08:00
Michal Guzek	2a2d7ebf2e	[fix] Incorrect mocker argument for a CLI accuracy test in Llama-3.3-70B-Instruct (#4604 ) Fix mocker argument Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-23 12:18:37 -07:00
Michal Guzek	d2e6af2fe4	[TRTLLM-4932] Add CLI accuracy tests for Llama-3_3-Nemotron-Super-49B-v1 and LLM API FP8 variant (#4375 ) * Add CLI TestNemotronSuper acc tests Signed-off-by: moraxu <mguzek@nvidia.com> * Update mmlu.yaml Signed-off-by: moraxu <mguzek@nvidia.com> * Update yaml files Signed-off-by: moraxu <mguzek@nvidia.com> * Skip FP8 test in CLI Signed-off-by: moraxu <mguzek@nvidia.com> * Address reviews Signed-off-by: moraxu <mguzek@nvidia.com> * Address review comments Signed-off-by: moraxu <mguzek@nvidia.com> --------- Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-23 12:17:23 -07:00

558 Commits