TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-23 12:12:39 +08:00

Author	SHA1	Message	Date
Stanley Sun	040fef709a	test: remove large bs as it will oom (#4726 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-29 14:31:57 +08:00
ruodil	5c235de80d	test: remove perf test l40s/l20 oom test cases and unwaive tests (#4720 ) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-05-29 12:47:52 +08:00
Venky	1a989a8189	[cherry-pick] test(perf): Pt.2 Add Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp) (#4499 ) (#4588 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-28 15:48:01 +08:00
Venky	b4e598da27	[cherry-pick] test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) (#4446 ) (#4590 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-28 14:17:24 +08:00
Venky	42e622a3b9	[cherry-pick] test(perf): Add remaining `Phi-4-mini-instruct` perf tests (#4443 ) (#4589 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-28 14:17:18 +08:00
brb-nv	fc3c2f7f7c	fix: Mistral Small vision encoder with BS>1 (#4713 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-28 12:49:28 +08:00
Michal Guzek	24153c068e	[TRTLLM-4932] Add QA accuracy tests for NIM-prioritized models (#4242 ) * Add tests Signed-off-by: moraxu <mguzek@nvidia.com> * Add tests v2 Signed-off-by: moraxu <mguzek@nvidia.com> * Add fixes Signed-off-by: moraxu <mguzek@nvidia.com> * Skip fp8 test for Ultra Signed-off-by: moraxu <mguzek@nvidia.com> * Add tests for Phi Signed-off-by: moraxu <mguzek@nvidia.com> * Skip tests for Phi Signed-off-by: moraxu <mguzek@nvidia.com> * Skip tests for Phi - fix Signed-off-by: moraxu <mguzek@nvidia.com> * Skip tests for Phi - comment out acc refs Signed-off-by: moraxu <mguzek@nvidia.com> * Add more test granularity Signed-off-by: moraxu <mguzek@nvidia.com> * Fix examples_test_list.txt Signed-off-by: moraxu <mguzek@nvidia.com> * Update test list file Signed-off-by: moraxu <mguzek@nvidia.com> * Update yaml files Signed-off-by: moraxu <mguzek@nvidia.com> * Address review comments Signed-off-by: moraxu <mguzek@nvidia.com> * Remove MMLU tests Signed-off-by: moraxu <mguzek@nvidia.com> * Add remaining models Signed-off-by: moraxu <mguzek@nvidia.com> --------- Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-24 19:17:21 +08:00
Michal Guzek	d2e6af2fe4	[TRTLLM-4932] Add CLI accuracy tests for Llama-3_3-Nemotron-Super-49B-v1 and LLM API FP8 variant (#4375 ) * Add CLI TestNemotronSuper acc tests Signed-off-by: moraxu <mguzek@nvidia.com> * Update mmlu.yaml Signed-off-by: moraxu <mguzek@nvidia.com> * Update yaml files Signed-off-by: moraxu <mguzek@nvidia.com> * Skip FP8 test in CLI Signed-off-by: moraxu <mguzek@nvidia.com> * Address reviews Signed-off-by: moraxu <mguzek@nvidia.com> * Address review comments Signed-off-by: moraxu <mguzek@nvidia.com> --------- Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-23 12:17:23 -07:00
Faraz	53008d3ee8	[TRTLLM-4618][feat] Add remaining NVFP4 Nemotron Super 49B test on RTX6000 Pro (SM120) (#4548 ) added nvfp4 nemotron for qa testing on RTX 6000 Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-23 10:42:32 -07:00
ruodil	2ce14357ff	test: fix for perf sanity test and skip fp8 deepseek blackwell cases (#4598 ) fix for sanity test and skip fp8 deepseek blackwell cases Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-23 11:13:14 +08:00
Venky	d15ceae62e	test(perf): Extend the Llama-Nemotron-Nano-8B perf-integration-tests (pyt) (#4407 ) * extend pyt nano tests perf coverage Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> * explicitly set maxnt for some cases This is because the test harness default to no prefill chunking, that means the isl specified is the true context. When explicitly unspecified in the test harness, the `maxnt` passed down to `trtllm-bench` is 2048. This means trtllm-bench gets conflicting inputs when isl>2048 but maxnt=2048; hence overriding maxnt to be consistent with isl for such cases. Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-23 08:44:37 +08:00
ruodil	ce6a32997b	test: add failed case in waive list and fix some test script issue for perf test (#4528 ) add failed case in waive list and fix some test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-21 16:36:32 +08:00
Ivy Zhang	e977c75300	tests: update api change from decoder to sampler in test (#4479 ) update Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-21 14:22:18 +08:00
ruodil	b5edf13b33	test: update test filter in perf test yml file to select cases by gpu name and add cases for RTX 6000 pro (#4282 ) * add cases for rtx_pro_6000 and update test filter Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * amend a typo in model llama_v3.1_405b_instruct fp4 and add more cases for rtx pro 6000 and waive_list Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-20 10:58:05 +08:00
Michal Guzek	0a342a42f7	[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant (#4362 ) * Add CLI TestLlama3_3_70BInstruct acc tests Signed-off-by: moraxu <mguzek@nvidia.com> * Add tests to qa lists Signed-off-by: moraxu <mguzek@nvidia.com> * Add comment Signed-off-by: moraxu <mguzek@nvidia.com> * Fix test names Signed-off-by: moraxu <mguzek@nvidia.com> * Update yaml files Signed-off-by: moraxu <mguzek@nvidia.com> * Update cli file Signed-off-by: moraxu <mguzek@nvidia.com> --------- Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-20 09:48:14 +08:00
Venky	bb02d86b54	test(perf): Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (TRT flow, trtllm-bench) (#4128 ) * changes to run llama-v3.3-nemotron-super-49b Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * yapf Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * address review comments pt 1 Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * re-add cpp super tests Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-19 12:00:48 -07:00
Faraz	7656af1b57	[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) (#4335 ) * add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * update cutlass versions Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * added internal cutlass with fix and docker update Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * added mixtral to pro 6000 Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> --------- Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-19 08:56:21 -07:00
liji-nv	58e405624a	[https://nvbugs/5123103 ][fix] Fix torch compile for DeepSeekV3 (#3952 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-05-19 22:12:25 +08:00
Iman Tabrizian	c6074c47da	Add llama4 disagg accuracy tests (#4336 ) * Add llama4 disagg accuracy tests Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Make it async and add GSM8K benchmark Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-19 21:55:08 +08:00
Ivy Zhang	58d2508b89	tests: Add test cases for rcca cases (#4347 ) * add qwen2_0_5_instruct cp4 test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen2.5 fp8 kvcache test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add ds distill qwen cpp runner test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-19 12:06:43 +08:00
Ivy Zhang	c4a0d768b5	tests: add qa test mentioned in docs (#4357 ) * add nemotron-h and llama_70b cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add llm decoder quick_start case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen3 quickstart test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add trtllm_decoder accuracy test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove quickstart test for llm_decoder Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix import error Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * nemotronh fp8 trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove nemotronh-fp8 Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-19 10:06:51 +08:00
Faraz	791c209006	[TRTLLM-4618][feat] Add Nemotron Super 49B FP8 test on RTX6000 Pro (SM120) (#4363 ) * added nemotron 49b fp8 for B40 release Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * add tests to QA list Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * pre-commit changes Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> --------- Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-19 09:30:24 +08:00
Venky	fb663b637a	Extend the Llama-Nemotron-Nano-8B perf-integration-tests (cpp) (#4195 ) * add ll-nm-nano tests that map to nim requirements Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> * prune some pytorch cases (fp8) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> * removing pyt backend test changes - When validating the pytorch tests with the isl/osl/conc/quant settings (that is done for cpp backend too), seeing hangs that need further debugging. - Therefore don't want to block this PR, hence removing them. - Seeing Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-17 22:46:21 +08:00
Stanley Sun	11aa50d1ea	test: add kv cache aware test cases to qa test list (#4257 ) add kv cache_aware test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-16 12:47:01 +08:00
Venky	adb0839a33	test(perf): Add `Phi-4-mini-instruct` to perf tests (#4267 ) * add phi-4-mini-instruct Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * trim tests Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-15 21:27:03 +08:00
Yanchao Lu	5ce1102a02	Revert "[test] add qa test mentioned in docs" (#4355 ) Revert "[test] add qa test mentioned in docs (#4248)" This reverts commit `b0ce1371ee`.	2025-05-15 18:47:30 +08:00
Stanley Sun	9d3e05486b	test: add qa test list for rtx5090 and rtx_pro_6000 (#4254 ) * add test list for rtx5090 and rtx_pro_6000 Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * add 2gpu llama70b test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * remove duplicate and invalid test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * add 2gpus test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> --------- Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-15 17:57:31 +08:00
xinhe-nv	14bfb5e0d6	test: FIX test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus (#4283 ) * update test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * skip llava-v1.6-mistral-7b-hf-vision-trtllm on L40S Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-15 15:57:44 +08:00
Ivy Zhang	b0ce1371ee	[test] add qa test mentioned in docs (#4248 ) * add nemotron-h and llama_70b cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add llm decoder quick_start case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen3 quickstart test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add trtllm_decoder accuracy test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove quickstart test for llm_decoder Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-15 13:37:11 +08:00
hlu1	3ea42e7519	[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus (#4346 ) Reorganize TestDeepSeekR1::test_nvfp4_8gpus Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com> Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>	2025-05-15 13:09:13 +08:00
Robin Kobus	d31fefde2c	[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow (#4092 ) * chore: Remove GptSession/V1 from TRT workflow Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove stateful decoders Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession buffers Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession utils Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession kernels Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove V1 GPT models from tests Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove gptSessionBenchmark from scripts and docs Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove gptSession IO classes Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession from test lists Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession from docs Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove useless encoder test Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove mActualBatchSize from DecoderState Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove static batching from ExecutorTest - Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter. - Adjusted related test functions to reflect the changes in parameter lists. - Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-14 23:10:04 +02:00
Kaiyu Xie	6c45586c51	chore: Remove deprecated Python runtime benchmark (#4171 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-05-14 18:41:05 +08:00
brb-nv	8280c3d4f2	feat: Support Gemma3-1b-it in Pytorch workflow (#3999 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 14:02:44 +08:00
brb-nv	1ef117688c	test: Validate FP8 and LoRA for Gemma3 (#3670 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-13 17:28:02 -07:00
brb-nv	cd5b3d21a0	feat: Support Mistral Small 3.1 24B VLM in TRT workflow (#4183 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 03:47:22 +08:00
ruodil	d555fe2530	test: fix for perf test script issue (#4230 ) fix for perf test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-13 10:29:20 +08:00
Enwei Zhu	035d915fea	[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior (#4090 ) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * normalize mtp_nextn Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update test_durations Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-13 07:41:51 +08:00
wili	eba3623a54	Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979 ) * feat/vbws-part4-v1.8: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * feat/vbws-part4-v1.9: fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.1: remove useless variables Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.2:fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.3: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.4: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.5: remove API change Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-05-12 22:32:29 +02:00
ruodil	9c03a7ab74	test: add llama_3.2_1B model and fix for test lora script issue (#4139 ) * test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * add llama_3.2_1B model and fix for lora script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-12 14:51:59 +08:00
xinhe-nv	849d9c343c	tests: https://nvbugs/5219534 remove failed tests from test list (#4113 ) remove unsupported tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-12 14:13:40 +08:00
ruodil	bf5b2a2e0a	test: amend regex match for perf throughput (#4186 ) amend regex match for perf throughput Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-09 17:33:25 +08:00
ruodil	5ce5b81281	test: amend default pytorch extra-llm-api-config.yml in perf test (#4176 ) * amend default pytorch extra-llm-api-config.yml Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * add print info to separate cases in output log Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-09 16:46:48 +08:00
Bo Li	e3cf3fd15f	test: Add fp8kv to DS-v3-lite integration tests. (#3950 ) * Add fp8 kv cache tests to DSV3-Lite integration tests. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update gsm8k. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update CI list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update TestDeepSeekR1. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Fix test list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Need quant_config besides pytorch_config. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list (bug 5239087). Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Correct test name. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> --------- Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Bo Li <bobboli0202@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-09 13:35:04 +08:00
Ivy Zhang	c91d03fa0a	test: move mistral / mixtral test cases in QA test list into the new accuracy test suite (#3440 ) * add mistral-7b-v0.1 torch flow test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mistral Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mixtral case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove api function test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mistral nemo cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mixtral cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update list Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove awq llmapi test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * adjust threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix partial comments Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix path Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update thres Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove duplicate test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-09 13:32:02 +08:00
Stanley Sun	fb31f91e15	test: add qwen3 and disaggregated serving accuracy tests to qa test list (#4083 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-09 11:03:02 +08:00
Ivy Zhang	7666bec7c4	[TRTQA-2861][test]: add nemotron and llama4 cases into qa test (#4053 ) * add MMLU, GPQADiamond check for llama-4 models Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add nomotron cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add online quant test cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove trt flow cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * adjust parallelism strategy Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix fail Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update sanity list Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix comment Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * skip nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-08 18:10:41 +08:00
ruodil	4d0e462723	tests: skip writing prepare_dataset output to logs, and add llama_v3.1_8b_fp8, llama_v3.3_70b_fp8, llama_v3.1_405b_fp4 models (#3864 ) * tests: skip writing prepare_dataset output to logs Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-07 13:56:35 +08:00
Venky	62fea1e885	test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822 ) * Model: Llama-3.1-Nemotron-Nano-8B-v1 * Precision: float16 * Environment: * GPUs: 1 H100 PCIe * Driver: 570.86.15 * Test String: `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128` * Request Throughput: 81.86 req/sec * Total Token Throughput: 20956.44 tokens/sec * Average Request Latency: 5895.24 ms * Test String: `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000` * Request Throughput: 1.45 req/sec * Total Token Throughput: 5783.92 tokens/sec * Average Request Latency: 211541.08 ms * Test String: `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128` * Request Throughput: 52.75 req/sec * Total Token Throughput: 13505.00 tokens/sec * Average Request Latency: 5705.50 ms * Test String: `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000` * Request Throughput: 1.41 req/sec * Total Token Throughput: 5630.76 tokens/sec * Average Request Latency: 217139.59 ms Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>	2025-05-06 17:17:55 -07:00
Yan Chunwei	bc0cf41592	chore: refactor llmapi e2e tests (#3803 ) * refactor llmapi e2e tests Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-05 07:37:24 +08:00
Emma Qiao	2692daad2e	infra: Remove the WAR for test items incompletely (#3313 ) * Remove the WAR for test items incompleted Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Complete test item manually Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix another test definition file Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Complete test name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix some other test names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix another test name after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update name for waived case name, too Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix name for multi-gpu tests Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix test name after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix another test name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix test name after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix other qa tests Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix tests name after rebase Signed-off-by: qqiao <qqiao@nvidia.com> * Fix name after rebase Signed-off-by: qqiao <qqiao@nvidia.com> * Correct test names in waive.txt Signed-off-by: qqiao <qqiao@nvidia.com> * Add new test_durations file Signed-off-by: qqiao <qqiao@nvidia.com> * Fix names after rebase Signed-off-by: qqiao <qqiao@nvidia.com> * Update test duration to latest Signed-off-by: qqiao <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com>	2025-05-04 11:31:59 +08:00

89 Commits