Commit Graph

54 Commits

Author SHA1 Message Date
ruodil
d555fe2530
test: fix for perf test script issue (#4230)
fix for perf test script issue

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-13 10:29:20 +08:00
Enwei Zhu
035d915fea
[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior (#4090)
* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* normalize mtp_nextn

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update test_durations

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-13 07:41:51 +08:00
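For context on the alignment above, here is a minimal, hypothetical sketch of a `parametrize_with_ids`-style helper that emits pytest-like `name=value` ids; it is not the repository's actual implementation:

```python
# Hypothetical sketch: build explicit "name=value" ids so generated test
# ids match what pytest's own id generation (and hence -k filters and
# curated test lists) expect.
import pytest

def parametrize_with_ids(argnames, argvalues):
    names = [n.strip() for n in argnames.split(",")]
    ids = []
    for values in argvalues:
        if not isinstance(values, (tuple, list)):
            values = (values,)
        ids.append("-".join(f"{n}={v}" for n, v in zip(names, values)))
    return pytest.mark.parametrize(argnames, argvalues, ids=ids)

@parametrize_with_ids("mtp_nextn,fp8kv", [(0, False), (2, True)])
def test_example(mtp_nextn, fp8kv):
    assert mtp_nextn in (0, 2)
```

The "normalize mtp_nextn" bullet suggests parameter values are canonicalized before being embedded in ids, so equivalent settings produce identical test names.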
wili
eba3623a54
Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979)
* feat/vbws-part4-v1.8: rebase

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* feat/vbws-part4-v1.9: fix incorrect output when using short output length

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.1: remove useless variables

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.2: fix incorrect output when using short output length

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.3: rebase

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.4: rebase

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.5: remove API change

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

---------

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-12 22:32:29 +02:00
ruodil
9c03a7ab74
test: add llama_3.2_1B model and fix for test lora script issue (#4139)
* test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* add llama_3.2_1B model and fix for lora script issue

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-12 14:51:59 +08:00
xinhe-nv
849d9c343c
tests: https://nvbugs/5219534 remove failed tests from test list (#4113)
remove unsupported tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-12 14:13:40 +08:00
ruodil
bf5b2a2e0a
test: amend regex match for perf throughput (#4186)
amend regex match for perf throughput

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 17:33:25 +08:00
ruodil
5ce5b81281
test: amend default pytorch extra-llm-api-config.yml in perf test (#4176)
* amend default pytorch extra-llm-api-config.yml

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* add print info to separate cases in output log

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 16:46:48 +08:00
Bo Li
e3cf3fd15f
test: Add fp8kv to DS-v3-lite integration tests. (#3950)
* Add fp8 kv cache tests to DSV3-Lite integration tests.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update gsm8k.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update CI list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update TestDeepSeekR1.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Fix test list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Need quant_config besides pytorch_config.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update waive list (bug 5239087).

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update waive list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Correct test name.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update waive list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

---------

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Bo Li <bobboli0202@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-09 13:35:04 +08:00
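A rough sketch of what making `fp8kv` an independent test dimension can look like through the LLM API; the exact import paths and `QuantConfig` fields here are assumptions, not the test suite's code:

```python
# Illustrative only: toggle FP8 KV cache independently of the other test
# dimensions (attention_dp, overlap_scheduler, cuda_graph). Import paths
# and field names are assumptions based on TensorRT-LLM's LLM API.
from tensorrt_llm.llmapi import LLM
from tensorrt_llm.models.modeling_utils import QuantConfig
from tensorrt_llm.quantization import QuantAlgo

def build_llm(model_dir: str, fp8kv: bool) -> LLM:
    quant_config = QuantConfig(
        kv_cache_quant_algo=QuantAlgo.FP8 if fp8kv else None)
    # "Need quant_config besides pytorch_config": KV-cache quantization
    # rides on quant_config, not on backend-level options.
    return LLM(model=model_dir, quant_config=quant_config)
```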
Ivy Zhang
c91d03fa0a
test: move mistral / mixtral test cases in QA test list into the new accuracy test suite (#3440)
* add mistral-7b-v0.1 torch flow test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* rearrange mistral

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* rearrange mixtral case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove api function test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* move mistral nemo cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* move mixtral cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix failure

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix name

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix failure cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update list

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove awq llmapi test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* adjust threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix ci

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix partial comments

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix path

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update thres

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove duplicate test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix ci

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-09 13:32:02 +08:00
Stanley Sun
fb31f91e15
test: add qwen3 and disaggregated serving accuracy tests to qa test list (#4083)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-09 11:03:02 +08:00
Ivy Zhang
7666bec7c4
[TRTQA-2861][test]: add nemotron and llama4 cases into qa test (#4053)
* add MMLU, GPQADiamond check for llama-4 models

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add nemotron cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add online quant test cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove trt flow cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* adjust parallelism strategy

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix fail

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update sanity list

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix comment

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* skip nemotron-h test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-08 18:10:41 +08:00
ruodil
4d0e462723
tests: skip writing prepare_dataset output to logs, and add llama_v3.1_8b_fp8, llama_v3.3_70b_fp8, llama_v3.1_405b_fp4 models (#3864)
* tests: skip writing prepare_dataset output to logs

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-07 13:56:35 +08:00
Venky
62fea1e885
test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822)
*   **Model:** Llama-3.1-Nemotron-Nano-8B-v1
*   **Precision:** float16
*   **Environment:**
    *   GPUs: 1 H100 PCIe
    *   Driver: 570.86.15

| Test String | Request Throughput (req/s) | Total Token Throughput (tokens/s) | Avg. Request Latency (ms) |
|---|---:|---:|---:|
| `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128` | 81.86 | 20956.44 | 5895.24 |
| `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000` | 1.45 | 5783.92 | 211541.08 |
| `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128` | 52.75 | 13505.00 | 5705.50 |
| `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000` | 1.41 | 5630.76 | 217139.59 |

Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>
2025-05-06 17:17:55 -07:00
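A quick arithmetic sanity check on the numbers above: for fixed-length runs, total token throughput should be close to request throughput times (input_len + output_len):

```python
# Check: token throughput ~= request throughput * (input_len + output_len).
runs = [  # (req/s, input_len, output_len, reported tokens/s)
    (81.86, 128, 128, 20956.44),
    (1.45, 2000, 2000, 5783.92),
    (52.75, 128, 128, 13505.00),
    (1.41, 2000, 2000, 5630.76),
]
for req_s, isl, osl, tok_s in runs:
    print(f"expected ~{req_s * (isl + osl):9.2f} tok/s, reported {tok_s:9.2f}")
```

All four rows agree to within a fraction of a percent.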
Yan Chunwei
bc0cf41592
chore: refactor llmapi e2e tests (#3803)
* refactor llmapi e2e tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-05 07:37:24 +08:00
Emma Qiao
2692daad2e
infra: Remove the WAR for incomplete test items (#3313)
* Remove the WAR for incomplete test items

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Complete test item manually

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test definition file

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Complete test name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix some other test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update name for waived case name, too

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix name for multi-gpu tests

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix other qa tests

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix tests name after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix name after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Correct test names in waive.txt

Signed-off-by: qqiao <qqiao@nvidia.com>

* Add new test_durations file

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix names after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Update test duration to latest

Signed-off-by: qqiao <qqiao@nvidia.com>

---------

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-04 11:31:59 +08:00
YueWeng
b1621e8d4e
feat: add relaxed acceptance for DS (#3865)
* add relaxed acceptance for DS R1

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* clean and update docs

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* fix

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* Modified based on review

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* fix mtp manager issue

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

---------

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-05-01 21:50:36 +08:00
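For readers new to the term, a hedged sketch of the relaxed-acceptance idea in speculative decoding: rather than requiring the draft token to equal the target model's argmax, accept it if it falls in the target's top-k within a logprob margin. Illustrative logic only; the function and thresholds are not the PR's code:

```python
# Illustrative relaxed-acceptance check of one draft token against the
# target model's logits; topk and delta are made-up defaults.
import torch

def relaxed_accept(draft_token: int, target_logits: torch.Tensor,
                   topk: int = 10, delta: float = 0.5) -> bool:
    logprobs = target_logits.log_softmax(dim=-1)
    top = logprobs.topk(topk)
    in_topk = bool((top.indices == draft_token).any())
    within_delta = bool(logprobs[draft_token] >= top.values[0] - delta)
    return in_topk and within_delta
```

Relaxing acceptance trades a little output fidelity for a higher acceptance rate, which is attractive for long reasoning outputs such as DeepSeek-R1's.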
Dom Brown
8709fe8b53
chore: bump version to 0.19.0 (#3598) (#3841)
test: add test cases for 0.19 release (#3608)

* fix test name
* add quickstart test for nemotron-ultra
* add rcca multi-node test case for deepseek-v3
* add rcca info

squash (#3642)

fix: nvbugs/5187237: fix deterministic mode crash (#3448)

* nvbugs/5187237 nvbugs/5112075: fix deterministic mode error
* remove waive
* Revert "remove waive" (this reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac)
* revert ar fusion

update fp8 doc (#3647)

tests: change qa perf test to trtllm-bench (#3619)

fix: FP8 quantized lm_head (NvBug 5214229) (#3567)

infra: Add PR approval protection for the release branch (#3634)

fix: nvbugs/5231298: pytorch allreduce issue (#3673)

Fix: nvbugs/5222698 variable not defined (#3630)

* Fix: nvbugs/5222698 variable not defined
* Tidy code

test: sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685)

test: restore fp8 kv cache testing for L0 (#3671)

doc: Update DeepSeek perf docs (#3693)

* Update DeepSeek perf docs
* update
* Apply suggestions from code review

tests: waive test_llm_multi_node (#3664)

fix: update test_user_buffers_mm_add_prologue atol (#3711)

Fix: cherry-pick hmac encryption from main branch (#3635)

* security fix cherry-pick changes from main
* fix hmac in remote mpi session (#3649)

Un-waive DS-V3-Lite tests. (#3621)

fix: FP8 kv accuracy (#3675)

* fix FP8 kv accuracy
* update doc

Fix script options for engines. (#3622)

unwaive multi-node test (#3721)

chore: Split more tests out of gpt tests (#3524) (#3674)

doc: add torch examples link into torch backend documentation (#3749)

test: Get Eagle tests working (#3593) (#3722)

Waive L0 test (#3756)

waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656)

Update ds v3 parameters in stress test. (#3676)

waive gemma on L20 (#3766)

https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758)
Include Qwen2VLDecoderLayer in the smooth_qwen2_model function.

fix: PP4 fixes and cleanup (#3688)

remove benchmark test list (#3643)

skip disagg deepseek test if sm!=90 (#3720)

test: skip failed cases on B200 (#3710)

* add skip condition to tests
* fix error

test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718)

* skip_pre_ada for fp8 cases
* update
* update after rebase

add known issue to deepseek doc. (#3800)

Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761)

Waive L0 tests (#3826)

fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793)

* Reduce memory usage in fused moe op associated with AutoTuning.
* Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens.
* Add free_memory logic of workspace in min_latency_mode fused moe path.
* Fix fused_moe fallback issue. (#3652) min_latency_mode is only set to False during the warmup phase, so when it becomes True during inference, all tactics fall back to the default one, causing a perf regression. (See the sketch after this entry.)

[doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797)

Fix pre-commit

Fix again

Address some review comments for the MI

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-29 16:57:22 +08:00
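To make the fused-MoE autotuning items from #3793/#3652 above concrete, a loose Python sketch; all names and the cache layout are assumptions, not TensorRT-LLM's actual code:

```python
# Two ideas in miniature: (1) derive token buckets from
# tune_max_num_tokens instead of a fixed pre-defined list, and (2) the
# fallback pitfall when warmup never profiles min_latency_mode=True.
def generate_buckets(tune_max_num_tokens: int) -> list[int]:
    buckets, b = [], 1
    while b < tune_max_num_tokens:
        buckets.append(b)
        b *= 2
    buckets.append(tune_max_num_tokens)
    return buckets

class TacticCache:
    def __init__(self):
        self._cache = {}  # (min_latency_mode, bucket) -> tactic name

    def warmup(self, tune_max_num_tokens: int) -> None:
        for bucket in generate_buckets(tune_max_num_tokens):
            self._cache[(False, bucket)] = f"profiled-tactic-{bucket}"

    def lookup(self, min_latency_mode: bool, bucket: int) -> str:
        # If warmup only ever ran with min_latency_mode=False, every
        # True lookup misses and silently falls back to the default
        # tactic -- the perf regression described in #3652.
        return self._cache.get((min_latency_mode, bucket), "default-tactic")
```

The point of the generating function is that the bucket list scales with `tune_max_num_tokens` instead of being hard-coded; the `lookup` comment shows how a mode never profiled during warmup degenerates to the default tactic.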
ruodil
9223000765
waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3657)
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-22 14:51:45 +08:00
Enwei Zhu
3fa19ffa4e
test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483)
* add gsm8k

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix gsm8k

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add gpqa

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* conditional import lm_eval

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* gpqa in lm_eval

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* system prompt

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* shuffle

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update AA prompt and regex

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* revert AA prompt and regex

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* integration to tests

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add DS-R1

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix and clean

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update tests

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* clean up

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* free_gpu_memory_fraction=0.8

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-22 07:38:16 +08:00
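The "conditional import lm_eval" bullet above is the usual optional-dependency pattern; a minimal sketch (not the suite's actual wrapper):

```python
# Keep lm-eval optional: fail with an actionable message only when a
# GSM8K/GPQA evaluation is actually requested.
try:
    import lm_eval
except ImportError:
    lm_eval = None

def evaluate(task_name: str, model, **kwargs):
    if lm_eval is None:
        raise ImportError(
            f"task {task_name!r} requires the optional 'lm-eval' package; "
            "install it with: pip install lm-eval")
    # hand off to the lm-eval harness here
    ...
```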
Barry Kang
d87b009d8d
Fix ModelOpt Mixtral AWQ OOM (#3714)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-04-21 19:14:14 +08:00
Stanley Sun
852dd0c1be
test: add llama3.2 ptp test case (#3363)
* add llama3.2 ptp test case

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

* update test list

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

---------

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-04-21 15:15:45 +08:00
Emma Qiao
48db263d9a
infra: Add test list name check (#3097)
* Add steps to check test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct test-db command

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Switch to use a trt-llm image

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update go path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct go path

Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Move the test list check to test ci

Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct file path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix path again

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix get path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Skip test list check for ARM

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix expression

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Change back unrelated file

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct qa test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Remove a stage

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update jenkins/L0_Test.groovy

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Move some steps to a python script

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix script path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Split commands and debug

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Also correct case name in waives list

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Move check script to another folder

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update qa list after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Remove the perf tests under QA

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Some tests already fixed after rebase to TOT

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

---------

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-04-20 23:02:16 +08:00
brb-nv
c35d2a7532
test: Get Eagle tests working (#3593)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-20 00:50:57 +08:00
Ivy Zhang
ad19ca3cbf
remove benchmark test list (#3644)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-17 16:23:41 +08:00
Netanel Haber
3c52ac098f
feat: allocate minimal blocks per window size (#3028)
* implement variable window attention by breaking the block manager into window block managers per window size

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* revert isCyclic to be true if the min attention window is reached, not per window size

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* add explanatory comment to mCyclicThreshold

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* load correct gemma config

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* don't shadow inputLength in addSequence - it should remain the function scope input length between window size loop iterations

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix KVCacheManagerVariableWindowAttentionWithReuseTest for multiple window block managers

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* if TYPE_CHECKING

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* set temp_attention_window_inputs to None explicitly

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* set temp_attention_window_inputs to None explicitly

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* pass dtype as well

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* test_gemma variable sliding window attention

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* allot a fraction of primary/secondaryBlocks to different window size heaps, depending on the window size's total contribution to the kvcache size (i.e., including all layers)

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* remove || mEnableBlockReuse which erroneously triggers beamsearch code for cyclic variable attention window code

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* turn off request delaying for MaxUtil

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* make comments better

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* windowSizesTotalSum using std::accumulate

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix error handling of forwardAsync: the cleanup code in its catch-all block runs terminateRequest, which can itself fail and must also be caught

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix comments

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* remove assert that kills disagg tests, since it isn't necessary

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix corrupted expression: 'isNewTask && (peftCacheManager ?' -> '(isNewTask && peftCacheManager) ?', which corrupted the boolean logic. Main is correct

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* add Gemma3 to SUPPORTED_HF_ARCHITECTURES

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* support Gemma3

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix kvfactor field for deepseek

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix comment

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix gemma-3 entries in testlist to include vswa

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* only quantize gemma2 VSWA

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

remove misleading comment

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

fix test_gemma

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix test_gemma

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix test_gemma

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* in sendRequestInfo, fromOldAllocatedBlockIds->fromOldAllocatedBlockIds, like in main

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix: disable KV cache reuse if using attention sink (#3021)

* fix: disable KV cache reuse if using attention sink

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: disable KV cache reuse if sink bubble

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* add comment

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-04-17 16:04:57 +08:00
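A small sketch of the block-allotment idea described in the entry above: each window size receives primary blocks in proportion to its total contribution to the KV-cache size across all layers. Illustrative; the function and its weighting are assumptions:

```python
# Split a block budget across window sizes, weighting each window size
# by (window size * number of layers that use it), i.e. its share of
# the total KV-cache footprint.
from collections import Counter

def allot_blocks(total_blocks: int,
                 window_size_per_layer: list[int]) -> dict[int, int]:
    layers_per_window = Counter(window_size_per_layer)
    weights = {w: w * n for w, n in layers_per_window.items()}
    total = sum(weights.values())
    return {w: max(1, total_blocks * wt // total)
            for w, wt in weights.items()}

# e.g. a Gemma-like stack interleaving 512-token sliding-window layers
# with 4096-token global-attention layers:
print(allot_blocks(1024, [512, 4096] * 13))  # global layers get ~8x more
```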
Ivy Zhang
b2fb0fe843
test: add quickstart test for nemotron-ultra (#3596)
* add quickstart test for nemotron-ultra

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix test name

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-17 11:16:41 +08:00
ruodil
5e2ebebe76
tests: change qa perf test to trtllm-bench (#3189)
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-17 09:53:32 +08:00
Daniel Cámpora
41ce5440fe
chore: Mass integration of release/0.18 (#3421)
* [Infra][TRTLLM-4063] - Branch out for the TRT-LLM v0.18.0 release

Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
(cherry picked from commit de90312020e51c22ba5e75b3502c7ee90c059265)

* [Infra][TRTLLM-3652] - Update dependencies to TRT 10.9 / CUDA 12.8.1 / DLFW 25.03(Internal)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
(cherry picked from commit 58db1340ef7db22f1910f878d220a92be5b830d1)

* [None][Doc] - Update docs for v0.18.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit d23e75bc95619ce3b116213d55319272888e0c88)

* [Infra] - Fix or WAR issues in the package sanity check stages

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit e874e2b127515c52ba10c8df1cc2631627f74ffe)

* [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path

Signed-off-by: Yuki Huang <yukih@nvidia.com>
(cherry picked from commit 731811d4e182d70a66193d646152cb71dfafe83a)

* cherry-pick 'test: Update cluster and multi node test lists and trtllm-bench' test to fix perf drop issue

Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
(cherry picked from commit 5214616283fbc15ae98871a1d84c78d8e1f2e6e8)

* Revert "Merge branch 'user/yukih/fix_5173454_5173432' into 'release/0.18'"

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit 8d34831cb2b81ee2dfa8021b68e7158b33789a5f)

* [Infra] Restrict setuptools version to avoid sasb pip install issue

Signed-off-by: Emma Qiao <qqiao@nvidia.com>
(cherry picked from commit 1e60ad29e0dafec0e295bedb5d89b716a02a707c)

* [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path

Signed-off-by: Yuki Huang <yukih@nvidia.com>
(cherry picked from commit 3ed8164e5bfea1d5aa2039b5408439fd6cf59dac)

* WAR for bug 5173448

Signed-off-by: Thor Johnsen <tjohnsen@nvidia.com>
(cherry picked from commit b6528b2ba15322b6c6a4c81a8b74c04d4973de4f)

* [Infra][TRTLLM-3652] - Update dependencies to CUDA 12.8.1 / DLFW 25.03

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
(cherry picked from commit 6560983d132d9d257ee15849664eb055e94adaa9)

* [Docs] - Doc changes for v0.18.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit 26769b61218a947c8f9d070f73b63d576fcc20c4)

* [Doc] - Doc change for v0.18.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit 4b3b5ed6bfbc2300e3775fe75456083faad7b235)

* [Infra] update version to 0.18.1

Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
(cherry picked from commit 59e8326c75639275837d34de8e140358737a3365)

* Add back nemotron file.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Fix recurrentgemma reqs.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Adding WAR for bug 5173448.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Formatting.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Remove duplicated file.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Update examples/prompt_lookup/requirements.txt

Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Remove glm-4-9b from model dir in chatglm test.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Remove indent change.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Revert changes on l0_test.groovy.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Update dev images

Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

* Remove duplicated import.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Fix custom op

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

* Fix flashinfer & vanilla backend

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

* Skip problematic case.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Skip problematic test_moe_w4a8_1_14336_4096_8_bfloat16_True_False case.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

---------

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Ruodi Lu <ruodil@nvidia.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Thor Johnsen <tjohnsen@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-04-16 10:03:29 +08:00
xinhe-nv
5cfa927132
update waive list (#3503)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-15 16:53:53 +08:00
Ivy Zhang
170bc22139
fix test name (#3534)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-04-14 17:09:50 +08:00
brb-nv
44090a5388
Add support for Phi-4-MM (#3296)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-14 14:24:10 +08:00
Ivy Zhang
d998832b33
test: add torch flow test case in qa test list (#3404)
Signed-off-by: Ivy Zhang <yanzh@nvidia.com>
2025-04-11 16:57:41 +08:00
brb-nv
c59abae436
feat: Add Gemma3 text-only model support (#3247)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-10 12:34:58 +08:00
Yechan Kim
943218b54a
feat: Add Qwen2.5-VL and refactor Qwen2-VL (#3156)
* feat: Add Qwen2.5-VL and refactor Qwen2-VL

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix yapf and codespell

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix test_e2e

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* generalize get_rope_index

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix qwen2.5-vl in README

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix image test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-10 04:09:03 +08:00
Enwei Zhu
ba019a43d6
test: Accuracy test improvement (Part 3.3): Move DeepSeek tests (#3260)
add skip

fix

fix

update

update test list

fix qa list

move bf16 to postmerge

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 07:19:04 +08:00
YueWeng
aab6214801
test: fix conflicting test names (#3316)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-04-07 20:10:01 +08:00
qixiang-99
0d4d50a745
feat: no-cache attention in PyTorch workflow (#3085)
* init trtllm attn no cache

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* fix: fix the seq_len issue and attn metadata prepare for qwen reward model test

fix: fix minor bugs after rebase
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: remove unnecessary debug logs and clean up commented code

refactor: update max_seq_len documentation and remove max_seq_len from the decoder model constructor in PyTorchModelEngine
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: update calculate_ref_result function to accept tensor inputs and mask type, enhance test_attention_no_cache to support FULL and CAUSAL masks

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: remove unused BERT attention metadata conversion method and add type assertion for no cache attention in PyTorchModelEngine

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: remove use_kv_cache parameter from attention function and related classes, update documentation for KV cache handling

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: implement setAttentionMaskType method for better mask type handling and remove unused conversion function

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: streamline KV cache handling by replacing direct member access with useKVCache method and simplify token per block assignment

remove Debug code.

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: Resolve comments for Python code

Simplify no cache attention metadata preparation and streamline related attributes in TrtllmAttentionMetadata

Removed the private method for converting to no cache attention metadata and integrated its logic into the prepare method. Updated the test for BERT sequence classification to reflect these changes and ensure proper handling of attention metadata.

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* docs: Add is_dummy_attention field to attention metadata for simulation operations

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* refactor: add KVCacheParams to attention backend interface and import relevant metadata classes

Updated the attention backend interface to include KVCacheParams and imported TrtllmAttentionMetadata and VanillaAttentionMetadata in model_engine.py for enhanced functionality.

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* fix: fix rebase format issue

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* fix: extend attention mask type handling in MHARunnerFixedParams

Added support for additional attention mask types (BIDIRECTIONAL, BIDIRECTIONALGLM, BLOCKSPARSE) in the MHARunnerFixedParams structure to fix the mapping issue between ContextAttentionMaskType and AttentionMaskType

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

* fix: enhance attention mask type handling in TllmGenFmhaRunnerParams

Updated the setAttentionMaskType method to include a switch-case structure for better handling of attention mask types, ensuring proper mapping and error handling for invalid types.

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>

---------

Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
2025-04-05 01:54:32 +08:00
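A minimal sketch of the no-cache attention idea above (using stock PyTorch SDPA, not the TRT-LLM backend): with no KV cache, the whole sequence is attended in one call under a FULL or CAUSAL mask:

```python
# No-cache attention for encoder-style / reward models: no paged KV
# cache, just one full-sequence attention call with the requested mask.
import torch
import torch.nn.functional as F

def attention_no_cache(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                       causal: bool) -> torch.Tensor:
    # q, k, v: [batch, num_heads, seq_len, head_dim]
    return F.scaled_dot_product_attention(q, k, v, is_causal=causal)

q = k = v = torch.randn(1, 8, 16, 64)
out_full = attention_no_cache(q, k, v, causal=False)   # FULL mask
out_causal = attention_no_cache(q, k, v, causal=True)  # CAUSAL mask
```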
Tracin
bb6c338730
AWQ: support ModelOpt ckpts. (#3258)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-04 08:10:35 +08:00
xinhe-nv
2005e5aaaf
remove tests from qa test lists (#3256)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-03 16:06:39 +08:00
Enwei Zhu
3cf7066350
test: Accuracy test improvement (Part 3.2): Move Qwen tests (NvBug 5135332) (#3219)
* remove test_llm_models_multi_gpu.py

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* qwen 2.5

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* upgrade

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-02 17:29:57 +08:00
Chuang Zhu
bc5811da65
chore: Ucx ip port remove mpi depend (#3101)
* initial ucx support

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* fixes to support dynamic loading and ucx connection establishment - not stable yet

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* update

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* more connection bringup fixes - failing on connection vector build

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* executor test pass

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* update

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* passed full benchmark

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* changing to TLLM_THROW and removing cout

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* stopping progress thread at ucxComm destructor

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* fixing build with ENABLE_UCX=0 to not build the ucx target at all, removing includes of ucxConnection for the cache transceiver, and deleting commented-out dead code

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* fix copyrights

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* adding ucx flavor to cache transceiver test and inserting it into the CI pipeline

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* allowing sending of non-IB interface IPs

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* setting UCX port reuse for the tests in pipeline

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* code review fixes

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* querying ep after GID message is sent to avoid UCX Errors

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* fixing more CR issues

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* querying ep so it does not fail if the ep is not connected yet

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>

* remove mpi dependency and debug

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* debug to info

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* mpirun n 2

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* remove mpi comm split when disaggOrchestrator mode

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* waive disagg_mtp test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* use future instead of thread

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* use future_promise instead of cv wait

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* connectionId type

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* improve test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* improve test 2

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* gtest_skip

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

---------

Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-04-02 09:42:29 +08:00
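The "use future_promise instead of cv wait" item above replaces a lock-plus-condition-variable handshake with a promise/future; a Python analogue of the pattern:

```python
# The connector thread fulfills the Future; the waiter just blocks on
# .result() instead of looping on a condition variable.
import threading
from concurrent.futures import Future

connection_ready: Future = Future()

def establish_connection() -> None:
    # ... UCX endpoint setup would happen here ...
    connection_ready.set_result("connection-id")

threading.Thread(target=establish_connection).start()
conn_id = connection_ready.result(timeout=10)  # blocks until fulfilled
print(conn_id)
```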
brb-nv
1fe3e30356
Add support for Phi-4-mini (#2990)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-02 08:34:39 +08:00
Enwei Zhu
b2f69db507
test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of trtllm-eval (#3167)
* add eval_llmapi

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

tmp commit

port to CLI tool

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

move

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

setup llmapi

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix spec_dec_algo

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

_update_from_hf_quant_config

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

migrate test_pytorch.py

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix fp8 block scales

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix fp8 rowwise

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

adj alpha

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

move test_pytorch.py cases

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

move

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

rename test_accuracy.py to test_cli.py

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

clean

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix cnn_dailymail

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* renaming to cli flow

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* rename MMLU

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* rename

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add error

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-01 22:20:29 +08:00
brb-nv
727d78e785
Support prequantized fp8 ckpt for nemotron-mini-4b-instruct (#3046)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-01 14:52:09 +08:00
brb-nv
1901bfcf76
test: Add Eagle tests with untrained heads (#2991)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-01 11:41:59 +08:00
bhsueh_NV
322ac565fc
chore: clean some ci of qa test (#3083)
* move some models to examples/models/contrib

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* update the document

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* remove arctic, blip2, cogvlm, dbrx from qa test list

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* remove tests of dit, mmdit and stdit from qa test

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* remove grok, jais, sdxl, skywork, smaug from qa test list

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* re-organize the glm examples

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix issues after running pre-commit

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix some typo in glm_4_9b readme

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix bug

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-31 14:30:41 +08:00
Fanrong Li
644a01cbbe
test: Add gpqa tests for DeepSeek models (#3063)
* Add gpqa accuracy test script
* Add gpqa accuracy tests
* Update DeepSeek-v3 doc
* Update qa test list

---------

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-03-27 19:47:06 +08:00
Yan Chunwei
82edd90350
fix gpus_per_node in trtllm-bench when world_size < device_count (#3007)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-27 09:31:40 +08:00
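The likely shape of the fix above, as a hypothetical helper (not trtllm-bench's actual code): clamp gpus_per_node to the world size when a run uses fewer ranks than the node has GPUs:

```python
# Hypothetical: if world_size < torch.cuda.device_count(), using the raw
# device count as gpus_per_node over-counts; clamp it to world_size.
import torch

def effective_gpus_per_node(world_size: int) -> int:
    return min(world_size, torch.cuda.device_count())
```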
Yechan Kim
3c7cb6629c
Add EXAONE-Deep (#3054)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-03-26 14:24:04 +08:00
kxdc
e6cb34d921
test: fix QA TRT integration testlist mismatch issue (#3090)
Incorrect test-db context caused empty test list output.
Fix by typo correction: `llm_trt_*` -> `trt_llm_*`.

Signed-off-by: kxdc <xink@nvidia.com>
2025-03-26 14:03:21 +08:00
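In miniature, why the typo produced an empty list (illustrative, with Python's fnmatch standing in for the test-db matcher and made-up list names):

```python
# Nothing is named "llm_trt_*", so the filter silently matches nothing.
import fnmatch

test_lists = ["trt_llm_integration_x86", "trt_llm_integration_sanity"]
print(fnmatch.filter(test_lists, "llm_trt_*"))  # []
print(fnmatch.filter(test_lists, "trt_llm_*"))  # both lists
```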