TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-22 19:52:38 +08:00

Author	SHA1	Message	Date
bhsueh_NV	ec4190fb71	infra: Add qwen3 235B tests into QA (#4483 ) * add qwen3 qa test Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * add qwen3 test into qa list Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-20 17:37:09 +08:00
ruodil	b5edf13b33	test: update test filter in perf test yml file to select cases by gpu name and add cases for RTX 6000 pro (#4282 ) * add cases for rtx_pro_6000 and update test filter Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * amend a typo in model llama_v3.1_405b_instruct fp4 and add more cases for rtx pro 6000 and waive_list Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-20 10:58:05 +08:00
Michal Guzek	0a342a42f7	[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant (#4362 ) * Add CLI TestLlama3_3_70BInstruct acc tests Signed-off-by: moraxu <mguzek@nvidia.com> * Add tests to qa lists Signed-off-by: moraxu <mguzek@nvidia.com> * Add comment Signed-off-by: moraxu <mguzek@nvidia.com> * Fix test names Signed-off-by: moraxu <mguzek@nvidia.com> * Update yaml files Signed-off-by: moraxu <mguzek@nvidia.com> * Update cli file Signed-off-by: moraxu <mguzek@nvidia.com> --------- Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-20 09:48:14 +08:00
Venky	bb02d86b54	test(perf): Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (TRT flow, trtllm-bench) (#4128 ) * changes to run llama-v3.3-nemotron-super-49b Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * yapf Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * address review comments pt 1 Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * re-add cpp super tests Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-19 12:00:48 -07:00
Faraz	7656af1b57	[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) (#4335 ) * add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * update cutlass versions Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * added internal cutlass with fix and docker update Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * added mixtral to pro 6000 Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> --------- Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-19 08:56:21 -07:00
liji-nv	58e405624a	[https://nvbugs/5123103 ][fix] Fix torch compile for DeepSeekV3 (#3952 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-05-19 22:12:25 +08:00
Iman Tabrizian	c6074c47da	Add llama4 disagg accuracy tests (#4336 ) * Add llama4 disagg accuracy tests Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Make it async and add GSM8K benchmark Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-19 21:55:08 +08:00
Shi Xiaowei	001704cc6a	fix: temp disable the problem test (#4445 ) Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-05-19 21:54:32 +08:00
Dom Brown	c45f414bbf	Test: Improve model re-use in C++ DGX tests for CI stability (#4263 ) * Fix padded vocab size for Llama Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Refactor multi GPU llama executor tests, and reuse the built model engines Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Fix test list typo Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Further WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Update test lists and readme Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Try parametrize for asymmetric Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Parametrize + skip unsupported combinations Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com> * Update test list Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com> * Reduce environment duplicated code Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com> --------- Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>	2025-05-19 14:20:21 +01:00
Shi Xiaowei	df2798e0c3	feat: NIXL interface integration (#3934 ) NIXL interfaces Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-05-19 18:18:22 +08:00
Kaiyu Xie	a43914619f	fix: wrong argument name `enable_overlap_scheduler` (#4433 ) Fix wrong argument Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-05-19 15:02:22 +08:00
Yan Chunwei	5b1c88de8d	chore: cleanup perf_evaluator code (#3833 ) * chore: cleanup perf_evaluator code Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * up Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-19 13:21:36 +08:00
Ivy Zhang	58d2508b89	tests: Add test cases for rcca cases (#4347 ) * add qwen2_0_5_instruct cp4 test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen2.5 fp8 kvcache test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add ds distill qwen cpp runner test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-19 12:06:43 +08:00
Ivy Zhang	c4a0d768b5	tests: add qa test mentioned in docs (#4357 ) * add nemotron-h and llama_70b cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add llm decoder quick_start case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen3 quickstart test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add trtllm_decoder accuracy test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove quickstart test for llm_decoder Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix import error Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * nemotronh fp8 trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove nemotronh-fp8 Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-19 10:06:51 +08:00
Faraz	791c209006	[TRTLLM-4618][feat] Add Nemotron Super 49B FP8 test on RTX6000 Pro (SM120) (#4363 ) * added nemotron 49b fp8 for B40 release Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * add tests to QA list Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * pre-commit changes Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> --------- Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-19 09:30:24 +08:00
Iman Tabrizian	7de90a66bc	Remove vila test (#4376 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-19 09:02:39 +08:00
hlu1	befb93cbff	[Deepseek] Add accuracy test references for fp8 kvcache (#4374 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com> Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>	2025-05-17 11:23:00 +08:00
Emma Qiao	27bdd0c82d	[TRTLLM-4886][infra]Try another timeout opt to exit test thread directly instead of gracefully (#4341 ) * Try another timeout opt to kill test thread Signed-off-by: qqiao <qqiao@nvidia.com> * Return true when try to delete non-existing result file Signed-off-by: qqiao <qqiao@nvidia.com> * quick test for the result file Signed-off-by: qqiao <qqiao@nvidia.com> * Change back the global timeout setting Signed-off-by: qqiao <qqiao@nvidia.com> * Try to kill test in internal pytest Signed-off-by: qqiao <qqiao@nvidia.com> --------- Signed-off-by: qqiao <qqiao@nvidia.com>	2025-05-16 17:56:40 +08:00
Daniel Cámpora	df19430629	chore: Mass Integration 0.19 (#4255 ) * fix: Fix/fused moe 0.19 (#3799) * fix bug of stream init Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix bug Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix: Add pre-download of checkpoint before benchmark. (#3772) * Add pre-download of checkpoint before benchmark. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Add missing remote code flag. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Move from_pretrained to throughput benchmark. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Move download and use snapshot_download. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Removed trusted flag. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Fix benchmark command in iteration log test. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> --------- Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * [https://nvbugspro.nvidia.com/bug/5241495][fix] CUDA Graph padding with overlap scheduler (#3839) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fuse Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * TRTLLM-4875 feat: Add version switcher to doc (#3871) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * waive a test (#3897) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * docs:fix https://nvbugs/5244616 by removing new invalid links. 
(#3939) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> * fix: remote mpi session abort (#3884) * fix remote mpi session Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * skip fp8 gemm for pre-hopper (#3931) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * [https://nvbugspro.nvidia.com/bug/5247148][fix] Attention DP with overlap scheduler (#3975) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update multigpu list Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix namings Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * Doc: Fix H200 DeepSeek R1 perf doc (#4006) * fix doc Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> * update perf number Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> --------- Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> * Fix the perf regression caused by insufficient cache warmup. (#4042) Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * doc: Update 0.19.0 release notes (#3976) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * Optimize the AutoTuner cache access code to reduce host code overhead. (#4060) The NVFP4 Linear op is very sensitive to the host overhead. This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key. 
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Update switcher (#4098) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * doc: update release notes (#4108) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * docs:update 0.19 doc. (#4120) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> * docs:add torch flow supported model list. (#4129) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> * doc: Release V0.19 Perf Overview Update (#4166) Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com> * Fix readme of autodeploy. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update tensorrt_llm/_torch/pyexecutor/llm_request.py Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Revert mgmn worker node. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Change to disable_overlap_scheduler. 
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com> Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com> Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com> Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>	2025-05-16 10:53:25 +02:00
HuiGao-NV	d5578b37fc	Change the method to calculate kv memory size in tests (#4332 ) * Change the method to calculate kv memory size in tests * Set larger peak memory size to llama case Signed-off-by: Hui Gao <huig@nvidia.com>	2025-05-16 15:35:40 +08:00
QI JUN	c4cd403af9	[CI] waive test_chunked_prefill test cases (#4380 ) waive test_chunked_prefill Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-16 10:27:20 +08:00
Iman Tabrizian	4c7191af67	Move Triton backend to TRT-LLM main (#3549 ) * Move TRT-LLM backend repo to TRT-LLM repo Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Address review comments Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * debug ci Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Update triton backend Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Fixes after update Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-16 07:15:23 +08:00
yuxianq	4f8afe4cc6	feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism (#4034 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-16 04:16:53 +08:00
Venky	adb0839a33	test(perf): Add `Phi-4-mini-instruct` to perf tests (#4267 ) * add phi-4-mini-instruct Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * trim tests Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-15 21:27:03 +08:00
yuxianq	0e87fcc228	refactor: use x is None instead of x == None. (#4244 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-15 20:00:04 +08:00
Yanchao Lu	5ce1102a02	Revert "[test] add qa test mentioned in docs" (#4355 ) Revert "[test] add qa test mentioned in docs (#4248)" This reverts commit `b0ce1371ee`.	2025-05-15 18:47:30 +08:00
zhhuang-nv	d6b741ddfe	[fix] test_no_kv_cache_reuse for overlap_scheduler (#4350 ) fix test_no_kv_cache_reuse for overlap_scheduler Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-05-15 16:43:53 +08:00
xinhe-nv	14bfb5e0d6	test: FIX test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus (#4283 ) * update test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * skip llava-v1.6-mistral-7b-hf-vision-trtllm on L40S Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-15 15:57:44 +08:00
zhhuang-nv	97bc680cd8	feat: support kv cache reuse for MLA (#3571 ) * support kv cache reuse for MLA load compressed_kv and k_pe and do up-projection use 192/128 head size MLA context kernel support Blackwell and Hopper now Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * add CI test Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2 Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com> * resolve comments Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * use GPTJ style RoPE for MLA Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix rebase error and some docs Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix kv_lens Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * tiny fix Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix torch compile Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix: use normal device memory instead of pinned memory for unit test Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com> * fix L0 tests Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix torch compile after rebase Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * resolve comments Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * resolve comments again Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> --------- Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com> Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com> Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-05-15 15:22:21 +08:00
Kaiyu Xie	b4e5df0ee0	Breaking change: perf: Enable scheduling overlap by default (#4174 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-05-15 14:27:36 +08:00
dominicshanshan	404fbe9b32	[https://nvbugs/5277113 ][fix]genai-perf API change stress test (#4300 ) * fix bug 5277113. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * fix bug 5277113 and 5278517. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> --------- Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-05-15 14:12:34 +08:00
Ivy Zhang	b0ce1371ee	[test] add qa test mentioned in docs (#4248 ) * add nemotron-h and llama_70b cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add llm decoder quick_start case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen3 quickstart test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add trtllm_decoder accuracy test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove quickstart test for llm_decoder Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-15 13:37:11 +08:00
hlu1	3ea42e7519	[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus (#4346 ) Reorganize TestDeepSeekR1::test_nvfp4_8gpus Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com> Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>	2025-05-15 13:09:13 +08:00
Mike Iovine	f9adac3dea	[feat] Enable chunked context for flashinfer (#4132 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-15 10:59:38 +08:00
Robin Kobus	d31fefde2c	[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow (#4092 ) * chore: Remove GptSession/V1 from TRT workflow Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove stateful decoders Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession buffers Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession utils Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession kernels Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove V1 GPT models from tests Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove gptSessionBenchmark from scripts and docs Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove gptSession IO classes Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession from test lists Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession from docs Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove useless encoder test Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove mActualBatchSize from DecoderState Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove static batching from ExecutorTest - Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter. - Adjusted related test functions to reflect the changes in parameter lists. - Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-14 23:10:04 +02:00
Faraz	42de79d49e	test: Added tests for Llama3.1-70B-BF16 on SM120 (#4198 ) * Added tests for Llama3.1-70B-BF16 on SM120 Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * solve conflicts add more tests Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> --------- Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-14 11:57:49 -04:00
Kaiyu Xie	6c45586c51	chore: Remove deprecated Python runtime benchmark (#4171 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-05-14 18:41:05 +08:00
HuiGao-NV	f4059c6e2e	Add test case for kv memory estimation (#4158 ) * Add test case for kv memory estimation * Dump running log into file and parse kv cache memory size from file * Set bigger peak memory size for mixed precision case and test_ptp_quickstart_advanced_eagle3 case * Revert change to usage of fraction * use context manager to guard temp files Signed-off-by: Hui Gao <huig@nvidia.com>	2025-05-14 18:39:25 +08:00
DylanChen-NV	206f82115d	[bug/5247505] fix: CP accuracy on Blackwell (#4188 ) * fix xqa params for cp Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * add test Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * add test Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * try adding B200 multi gpu test Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * add accuracy tests for cp Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> --------- Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-05-14 17:40:50 +08:00
Anurag Mukkara	b15f57763d	tests: PyTorch multimodal using keyword match (#4215 ) * keyword accuracy check for pytorch multimodal Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * Change keywords for some prompts Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * Delete full text answers Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * Cleanup debug code Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> --------- Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-05-14 17:18:43 +08:00
bhsueh_NV	1a9298bc66	CI: add fp8/fp4 ci on Qwen3-30B-A3B (#4266 ) add fp8/fp4 ci on Qwen3-30B-A3B Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-14 14:38:04 +08:00
brb-nv	8280c3d4f2	feat: Support Gemma3-1b-it in Pytorch workflow (#3999 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 14:02:44 +08:00
Yi Zhang	86ae506b9d	[fix] Enable pp tests (#3978 ) Fix misrebase issue Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-05-14 10:51:20 +08:00
brb-nv	1ef117688c	test: Validate FP8 and LoRA for Gemma3 (#3670 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-13 17:28:02 -07:00
brb-nv	cd5b3d21a0	feat: Support Mistral Small 3.1 24B VLM in TRT workflow (#4183 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 03:47:22 +08:00
ruodil	d555fe2530	test: fix for perf test script issue (#4230 ) fix for perf test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-13 10:29:20 +08:00
xinhe-nv	0cebc16139	test: [CI] Add failed cases into waives.txt (#4205 ) waive tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-13 10:22:42 +08:00
Enwei Zhu	035d915fea	[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior (#4090 ) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * normalize mtp_nextn Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update test_durations Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-13 07:41:51 +08:00
wili	eba3623a54	Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979 ) * feat/vbws-part4-v1.8: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * feat/vbws-part4-v1.9: fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.1: remove useless variables Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.2:fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.3: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.4: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.5: remove API change Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-05-12 22:32:29 +02:00
Zheng Duan	c9e2a963e0	feat: add kv cache aware router (#3831 ) * kv cache aware router Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * add tests Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * router config Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> add test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction detect in worker test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * move worker tests to single gpu Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * reduce memory fraction Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * fix partial block Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> --------- Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-05-12 07:23:57 -04:00

197 Commits