TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-10 04:53:38 +08:00

Author	SHA1	Message	Date
yuxianq	4f8afe4cc6	feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism (#4034 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-16 04:16:53 +08:00
Venky	adb0839a33	test(perf): Add `Phi-4-mini-instruct` to perf tests (#4267 ) * add phi-4-mini-instruct Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> * trim tests Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> --------- Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-15 21:27:03 +08:00
yuxianq	0e87fcc228	refactor: use x is None instead of x == None. (#4244 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-15 20:00:04 +08:00
Yanchao Lu	5ce1102a02	Revert "[test] add qa test mentioned in docs" (#4355 ) Revert "[test] add qa test mentioned in docs (#4248)" This reverts commit `b0ce1371ee`.	2025-05-15 18:47:30 +08:00
Stanley Sun	9d3e05486b	test: add qa test list for rtx5090 and rtx_pro_6000 (#4254 ) * add test list for rtx5090 and rtx_pro_6000 Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * add 2gpu llama70b test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * remove duplicate and invalid test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * add 2gpus test cases Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> --------- Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-15 17:57:31 +08:00
zhhuang-nv	d6b741ddfe	[fix] test_no_kv_cache_reuse for overlap_scheduler (#4350 ) fix test_no_kv_cache_reuse for overlap_scheduler Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-05-15 16:43:53 +08:00
xinhe-nv	14bfb5e0d6	test: FIX test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus (#4283 ) * update test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * skip llava-v1.6-mistral-7b-hf-vision-trtllm on L40S Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-15 15:57:44 +08:00
zhhuang-nv	97bc680cd8	feat: support kv cache reuse for MLA (#3571 ) * support kv cache reuse for MLA load compressed_kv and k_pe and do up-projection use 192/128 head size MLA context kernel support Blackwell and Hopper now Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * add CI test Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2 Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com> * resolve comments Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * use GPTJ style RoPE for MLA Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix rebase error and some docs Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix kv_lens Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * tiny fix Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix torch compile Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix: use normal device memory instead of pinned memory for unit test Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com> * fix L0 tests Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * fix torch compile after rebase Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * resolve comments Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * resolve comments again Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> --------- Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com> Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com> Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-05-15 15:22:21 +08:00
Kaiyu Xie	b4e5df0ee0	Breaking change: perf: Enable scheduling overlap by default (#4174 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-05-15 14:27:36 +08:00
dominicshanshan	404fbe9b32	[https://nvbugs/5277113 ][fix]genai-perf API change stress test (#4300 ) * fix bug 5277113. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * fix bug 5277113 and 5278517. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> --------- Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-05-15 14:12:34 +08:00
Ivy Zhang	b0ce1371ee	[test] add qa test mentioned in docs (#4248 ) * add nemotron-h and llama_70b cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * trial Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add llm decoder quick_start case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add qwen3 quickstart test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add trtllm_decoder accuracy test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove quickstart test for llm_decoder Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-15 13:37:11 +08:00
hlu1	3ea42e7519	[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus (#4346 ) Reorganize TestDeepSeekR1::test_nvfp4_8gpus Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com> Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>	2025-05-15 13:09:13 +08:00
Zeyu WANG	2681b26e48	[TRTLLM-2795] feat: Add yarn support for other models in trt-flow (#3840 ) Add yarn support for general models(e.g. llama, qwen) other than deepseek in trt-flow. Signed-off-by: Zeyu Wang <zeyuw@nvidia.com>	2025-05-15 11:03:57 +08:00
Mike Iovine	f9adac3dea	[feat] Enable chunked context for flashinfer (#4132 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-15 10:59:38 +08:00
QI JUN	498ce8a056	Revert "feat: Low Precision Allreduce for PCIe based GPU" (#4340 ) Revert "feat: Low Precision Allreduce for PCIe based GPU (#3851)" This reverts commit `5e634dd1bd`.	2025-05-15 09:52:39 +08:00
hlu1	7fb0af9320	[fix] Remove stale cublas heuristics (#4326 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com> Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>	2025-05-14 17:35:51 -07:00
Robin Kobus	d31fefde2c	[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow (#4092 ) * chore: Remove GptSession/V1 from TRT workflow Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove stateful decoders Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession buffers Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession utils Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession kernels Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove V1 GPT models from tests Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove gptSessionBenchmark from scripts and docs Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove gptSession IO classes Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession from test lists Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove GptSession from docs Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove useless encoder test Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove mActualBatchSize from DecoderState Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Remove static batching from ExecutorTest - Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter. - Adjusted related test functions to reflect the changes in parameter lists. - Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-14 23:10:04 +02:00
sugunav14	7c828d767f	feat: [AutoDeploy] DSV3 mla attn ref op (#4272 ) * raw ref op + new patch untested Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com> * Added mla attn ref op and unit tests for attn + module patches Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com> * update stray changes in deepseek.py Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com> * Updated stale documentation Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com> * removed stray update in sdpa return shapes Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com> --------- Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>	2025-05-15 01:58:20 +08:00
Faraz	42de79d49e	test: Added tests for Llama3.1-70B-BF16 on SM120 (#4198 ) * Added tests for Llama3.1-70B-BF16 on SM120 Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> * solve conflicts add more tests Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> --------- Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-05-14 11:57:49 -04:00
Yanchao Lu	504f4bf779	[Infra] - Update the upstream PyTorch dependency to 2.7.0 (#4235 ) [Infra][TRTLLM-4941] - Update the upstream PyTorch dependency to 2.7.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-14 22:28:13 +08:00
Kaiyu Xie	6c45586c51	chore: Remove deprecated Python runtime benchmark (#4171 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-05-14 18:41:05 +08:00
HuiGao-NV	f4059c6e2e	Add test case for kv memory estimation (#4158 ) * Add test case for kv memory estimation * Dump running log into file and parse kv cache memory size from file * Set bigger peak memory size for mixed percision case and test_ptp_quickstart_advanced_eagle3 case * Revert change to usage of fraction * use context manager to guard temp files Signed-off-by: Hui Gao <huig@nvidia.com>	2025-05-14 18:39:25 +08:00
xinhe-nv	f2bfe2f84f	test: [CI] remove closed bugs (#4207 ) update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-14 17:59:05 +08:00
DylanChen-NV	206f82115d	[bug/5247505] fix: CP accuracy on Blackwell (#4188 ) * fix xqa params for cp Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * add test Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * add test Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * try adding B200 multi gpu test Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> * add accuracy tests for cp Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> --------- Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-05-14 17:40:50 +08:00
Anurag Mukkara	b15f57763d	tests: PyTorch multimodal using keyword match (#4215 ) * keyword accuracy check for pytorch multimodal Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * Change keywords for some prompts Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * Delete full text answers Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * Cleanup debug code Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> --------- Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-05-14 17:18:43 +08:00
kanghui0204	5e634dd1bd	feat: Low Precision Allreduce for PCIe based GPU (#3851 ) This PR adds a customized allreduce to TensorRT-LLM. The new allreduce is used for communication on PCIe-based GPUs via low-precision quantization, which can accelerate the PCIe allreduce process. Signed-off-by: Hui Kang <hkang@nvidia.com> Co-authored-by: Hui Kang <hkang@nvidia.com>	2025-05-14 16:45:43 +08:00
Yiqing Yan	a66a02a75a	[Infra] Waive L0 test (#4295 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-14 16:38:33 +08:00
Barry Kang	20b42912ce	[TRTLLM-3330][feat] Support DeepSeek-R1 W4A8 on Hopper (#4123 ) Support DeepSeek-R1 W4A8 on Hopper Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Co-authored-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-05-14 15:48:07 +08:00
Zongfei Jing	bb17649517	test: Add UT for moe trtllmgen (#4258 ) * Add ut for moe trtllmgen Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Update tests/unittest/_torch/modeling/test_modeling_deepseek.py Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com> Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> --------- Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>	2025-05-14 15:22:58 +08:00
bhsueh_NV	1a9298bc66	CI: add fp8/fp4 ci on Qwen3-30B-A3B (#4266 ) add fp8/fp4 ci on Qwen3-30B-A3B Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-14 14:38:04 +08:00
brb-nv	8280c3d4f2	feat: Support Gemma3-1b-it in Pytorch workflow (#3999 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 14:02:44 +08:00
Yi Zhang	86ae506b9d	[fix] Enable pp tests (#3978 ) Fix misrebase issue Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-05-14 10:51:20 +08:00
Fridah-nv	21dbd163a7	[TRTLLM-5188] fix: [AutoDeploy] unwaive AD build test (#4273 ) * unwaive small build test Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com> * unwaive mutigpu/integration tests Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com> * fix for torch.compile+flashinfer attention Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com> --------- Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com>	2025-05-14 10:40:12 +08:00
brb-nv	1ef117688c	test: Validate FP8 and LoRA for Gemma3 (#3670 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-13 17:28:02 -07:00
Iman Tabrizian	f408de2d99	Waive disagg kv cache load balancer test (#4276 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-14 06:03:24 +08:00
brb-nv	cd5b3d21a0	feat: Support Mistral Small 3.1 24B VLM in TRT workflow (#4183 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 03:47:22 +08:00
Yiqing Yan	290649b6aa	[Infra] Waive L0 test (#4269 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-13 23:06:13 +08:00
Yiqing Yan	bfa16a63d4	[Infra] Waive L0 test (#4268 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-13 22:43:17 +08:00
dominicshanshan	44d6adfb68	Waive stress test. (#4262 ) * Waive stress test. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com> --------- Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-13 21:01:57 +08:00
Enwei Zhu	8f68d56cc1	[https://nvbugs/5220763 ] [test] Unwaive Mixtral FP8 TP2 test (#4252 ) unwaive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-13 15:55:33 +08:00
Yiqing Yan	fda8b0277a	[Infra][TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04 (#4049 ) * [TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04 Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix review Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * update images Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * Update jenkins/L0_Test.groovy Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * update image name Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> --------- Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-13 14:59:12 +08:00
ruodil	d555fe2530	test: fix for perf test script issue (#4230 ) fix for perf test script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-13 10:29:20 +08:00
xinhe-nv	0cebc16139	test: [CI] Add failed cases into waives.txt (#4205 ) waive tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-13 10:22:42 +08:00
xinhe-nv	7ebae4dcaa	test: [CI] Add failed cases into waives.txt (#4203 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waives Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-13 10:08:02 +08:00
pcastonguay	9643be5f20	[TRTLLM-5050][feat] Enable per-request stats with PyT backend (#4156 ) * feat: Add per-request stats support with PyT backend Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Adding unit test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing stats unit test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing test with overlap Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-05-12 21:35:15 -04:00
Simeng Liu	286a789549	feat: Add heuristic for GroupRMSNorm kernel selection. (#4047 ) * feat: Add heuristic for GroupRMSNorm kernel selection. Implements a logistic regression model to dynamically select between: - GroupRMSNormBaseKernel: Allocates warps proportional to sum of dimensions (better SM occupancy in most cases) - GroupRMSNormLargeBatch: Allocates warps proportional to max dimension (better block scheduling in large batch scenarios) Selection heuristic considers batch size, allocated warps, and scheduling efficiency on the current GPU architecture. Models for Compute Capability 9.x and 10.x are trained base on nsys kernel runtime data. The default kernel selection is the base kernel. The python operator group_rms_norm will use the heuristic by default. User can pick to use the base or large batch kernels as well. Signed-off-by: Simeng Liu <simengl@nvidia.com> * Address the comments. Signed-off-by: Simeng Liu <simengl@nvidia.com> --------- Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-05-13 08:52:53 +08:00
Enwei Zhu	035d915fea	[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior (#4090 ) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * normalize mtp_nextn Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update test_durations Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-13 07:41:51 +08:00
wili	eba3623a54	Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979 ) * feat/vbws-part4-v1.8: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * feat/vbws-part4-v1.9: fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.1: remove useless variables Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.2:fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.3: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.4: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.5: remove API change Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-05-12 22:32:29 +02:00
Enwei Zhu	c31ca1688c	[https://nvbugs/5214229 ] [fix] Unwaive lm_head quantization case (#4222 ) unwaive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-12 20:23:06 +08:00
Zheng Duan	c9e2a963e0	feat: add kv cache aware router (#3831 ) * kv cache aware router Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * add tests Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * router config Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> add test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction detect in worker test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * move worker tests to single gpu Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * reduce memory fraction Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * fix partial block Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> --------- Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-05-12 07:23:57 -04:00
Yixin Dong	c90ebadd84	feat: Support the Structural Tag in guided decoding (#4066 ) * finish Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * exc overlap scheduler Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add test Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix api ref Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Ubospica <ubospica@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-12 17:24:50 +08:00
Yechan Kim	3e9bda3a09	[feat] Support HyperCLOVAX-SEED-Text language part (#3902 ) * feat: support HyperCLOVAX-SEED-Text language part Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add Pytorch flow and remove test file Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * revert summarize Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix summarize Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove from pytorch example Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-05-12 16:05:14 +08:00
Zhenhuan Chen	9212e9a740	[TRTLLM-4911] feat(scaffolding): make sampling_params only setable by controller (#4151 ) feat(scaffolding): make sampling_params only setable by controller Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-05-12 15:29:09 +08:00
Ivy Zhang	ee92edf2b4	[https://nvbugspro.nvidia.com/bug/5270564 ][test] skip per-hopper for llama4 (#4211 ) skip per-hopper for llama4 Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-12 15:27:15 +08:00
ruodil	9c03a7ab74	test: add llama_3.2_1B model and fix for test lora script issue (#4139 ) * test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * add llama_3.2_1B model and fix for lora script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-12 14:51:59 +08:00
xinhe-nv	849d9c343c	tests: https://nvbugs/5219534 remove failed tests from test list (#4113 ) remove unsupported tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-12 14:13:40 +08:00
Yiqing Yan	3c54e84e47	[Infra] Waive L0 test (#4212 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-12 11:37:49 +08:00
QI JUN	f021afa241	[CI] waive two multi-gpu test cases (#4206 ) waive two multi-gpu test cases Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-12 08:04:48 +08:00
Enwei Zhu	7db368c72c	test: Remove CNN Dailymail tasks in favor of GSM8K (#4187 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-10 09:02:07 +08:00
Dom Brown	2d0f93a054	Refactor: Restructure C++ tests for better modularisation of non-shared code (#4027 ) * Refactor: Restructure C++ tests for better modularisation of non-shared code Start cleanup of pytest code for C++ tests Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Clean up names and remove references to test_cpp.py Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Move multi-GPU code Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Update doc and try un-waiving Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Update multi GPU file check Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Address minor multi-GPU setup bug Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> --------- Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-09 19:16:51 +01:00
Mike Iovine	4b8ba7ad61	[fix][nvbug/5244009] Fix llama 4 test lists/scout accuracy issue (#4069 ) [fix] Fix llama 4 test lists Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-09 22:45:14 +08:00
Tracin	446f62bbab	chore: Deprecate evaltool (#4173 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-05-09 20:31:53 +08:00
ruodil	bf5b2a2e0a	test: amend regex match for perf throughput (#4186 ) amend regex match for perf throughput Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-09 17:33:25 +08:00
WeiHaocheng	0f01826dde	feat: support task collection for to collect information (#3328 ) (#3824 ) Signed-off-by: fredw (generated by with_the_same_user script) <20514172+WeiHaocheng@users.noreply.github.com>	2025-05-09 17:09:01 +08:00
xinhe-nv	9082411a50	test: [CI] Add failed cases into waives.txt (#4165 ) wavie oom tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-09 16:56:30 +08:00
ruodil	5ce5b81281	test: amend default pytorch extra-llm-api-config.yml in perf test (#4176 ) * amend default pytorch extra-llm-api-config.yml Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * add print info to separate cases in output log Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-09 16:46:48 +08:00
xinhe-nv	1d26a3fd7c	test: skip tests on b200 (#3913 ) * skip tests on b200 Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * skip phi-3-128k Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-09 14:51:55 +08:00
Fanrong Li	77f8e43592	[fix] Fix relaxed acceptance to support enabling it in context phase (#4126 ) * fix relaxed acceptance to support enable this feature in context phase. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix sample_and_accept_draft_tokens unit test. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> --------- Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-05-09 14:11:14 +08:00
Bo Li	e3cf3fd15f	test: Add fp8kv to DS-v3-lite integration tests. (#3950 ) * Add fp8 kv cache tests to DSV3-Lite integration tests. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update gsm8k. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update CI list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update TestDeepSeekR1. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Fix test list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Need quant_config besides pytorch_config. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list (bug 5239087). Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Correct test name. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> --------- Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Bo Li <bobboli0202@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-09 13:35:04 +08:00
Ivy Zhang	c91d03fa0a	test: move mistral / mixtral test cases in QA test list into the new accuracy test suite (#3440 ) * add mistral-7b-v0.1 torch flow test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mistral Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mixtral case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove api function test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mistral nemo cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mixtral cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update list Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove awq llmapi test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * adjust threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix partial comments Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix path Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update thres Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove duplicate test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-09 13:32:02 +08:00
Ivy Zhang	c2d4c2adb6	[https://nvbugspro.nvidia.com/bug/5260676 ]test: skip fp8 quantization case for pre-ada (#4095 ) skip pre ada Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-09 13:30:16 +08:00
Yi Zhang	91bf5e6a8e	[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804 ) Add Piecewise CUDA Graph Support Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-05-09 11:04:01 +08:00
Stanley Sun	fb31f91e15	test: add qwen3 and disaggregated serving accuracy tests to qa test list (#4083 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-05-09 11:03:02 +08:00
Yukun He	5b61486d87	chore: Clean up the legacy DeepseekAllreudceFusionOp. (#4081 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-05-09 10:20:41 +08:00
pcastonguay	836c142e1b	[feat] Allow overriding cli args with yaml file in trtllm-serve (#4164 ) feat: Allow overriding cli args with yaml file in trtllm-serve Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-05-08 21:19:05 -04:00
Mike Iovine	9afe510367	[fix] Fix llama4 + eagle3 (#3998 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-08 19:20:27 -04:00
chenfeiz0326	7f5716ef83	Cherry-pick trtllm-gen from feat/llama4 to main (#4086 ) * feat: TRT-LLM Gen FP8 MoE Llama4 Signed-off-by: Nikita Korobov <nkorobov@nvidia.com> * feat: TRT-LLM Gen llama4 MoE Top1 routing Signed-off-by: Jiqun Tu <jtu@nvidia.com> * feat: add per tensor FP8 TRT-LLM Gen GEMMs Signed-off-by: Nikita Korobov <nkorobov@nvidia.com> * Update Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> * Update Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> * Add license for cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/gemmCubins Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> * Add guard for routingIndicesClusterKernel Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> * Guard sm90+ for routingkernels Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> * Guard sm90+ for routingkernels Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> --------- Signed-off-by: Nikita Korobov <nkorobov@nvidia.com> Signed-off-by: Jiqun Tu <jtu@nvidia.com> Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Co-authored-by: Nikita Korobov <nkorobov@nvidia.com> Co-authored-by: Jiqun Tu <jtu@nvidia.com>	2025-05-08 14:13:01 -07:00
shaharmor98	7d94c9561f	feat: support multi lora adapters and TP (#3885 ) * support multi lora, tp Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-05-08 23:45:45 +08:00
Yuan Tong	5b93273156	feat: adopt new logprob definition in PyTorch flow (#4057 ) feat: align logprob definition of PyTorch flow Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>	2025-05-08 20:16:40 +08:00
Enwei Zhu	74df12bbaa	[TRTLLM-4480][doc] Documentation for new accuracy test suite and trtllm-eval (#3946 ) * fix formula Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update doc Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * 1st version Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * polish Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-08 19:35:23 +08:00
Ivy Zhang	7666bec7c4	[TRTQA-2861][test]: add nemotron and llama4 cases into qa test (#4053 ) * add MMLU, GPQADiamond check for llama-4 models Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add nomotron cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add online quant test cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove trt flow cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * adjust parallelism strategy Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix fail Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update sanity list Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix comment Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * skip nemotron-h test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-08 18:10:41 +08:00
xinhe-nv	4468158be4	test: [CI] remove closed bugs (#4046 ) update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-08 18:04:43 +08:00
Yiqing Yan	ce8832e80f	[Infra] Waive L0 flaky test (#4148 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-08 17:23:45 +08:00
yuanjingx87	6e1d2a1320	feat: Add Slurm support and enable RTX Pro 6000 testing pipeline in CI (#4019 ) * Add slurm support with RTXPro6000 PostMerge Tests Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> * remove H100 post merge test from testing Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> --------- Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-05-08 15:15:36 +08:00
Enwei Zhu	dae6781494	test: Waive disagg accuracy test (#4124 ) * waive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * waive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-08 13:39:07 +08:00
Enwei Zhu	19eeef1ddd	test: Waive test_llm cases (#4136 ) * waive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-08 11:53:16 +08:00
Ivy Zhang	d7c51c953b	test: add INTEGRATION_TEST env var to speed up integration test (#3618 ) add INTEGRATION_TEST env var Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-08 10:44:50 +08:00
ruodil	4d0e462723	tests: skip writing prepare_dataset output to logs, and add llama_v3.1_8b_fp8, llama_v3.3_70b_fp8, llama_v3.1_405b_fp4 models (#3864 ) * tests: skip writing prepare_dataset output to logs Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-07 13:56:35 +08:00
Yan Chunwei	0c26059703	chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732 ) * beam_width and max_new_token Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * remove beam_width Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * remove min_length Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * remove return_num_sequences Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-07 13:20:25 +08:00
Enwei Zhu	c28b90984f	[TRTLLM-3925, https://nvbugs/5245262 ] [fix] Normalize LLM.generate API (#3985 ) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-07 11:06:23 +08:00
Venky	62fea1e885	test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822 ) * Model: Llama-3.1-Nemotron-Nano-8B-v1 * Precision: float16 * Environment: * GPUs: 1 H100 PCIe * Driver: 570.86.15 * Test String: `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128` * Request Throughput: 81.86 req/sec * Total Token Throughput: 20956.44 tokens/sec * Average Request Latency: 5895.24 ms * Test String: `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000` * Request Throughput: 1.45 req/sec * Total Token Throughput: 5783.92 tokens/sec * Average Request Latency: 211541.08 ms * Test String: `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128` * Request Throughput: 52.75 req/sec * Total Token Throughput: 13505.00 tokens/sec * Average Request Latency: 5705.50 ms * Test String: `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000` * Request Throughput: 1.41 req/sec * Total Token Throughput: 5630.76 tokens/sec * Average Request Latency: 217139.59 ms Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>	2025-05-06 17:17:55 -07:00
Erin	cba1793cda	cleanup logprob params (#4039 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-05-07 00:50:16 +08:00
Robin Kobus	72057a0a64	[TRTLLM-3429] feat: Overlap scheduling in C++ runtime (#3625 ) * disable overlap in encoder Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * feat: invokeGatherBatch Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * feat: overlap same batch Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: add enableTrtOverlap to ExecutorConfig Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * disable overlap for beam search and spec decode Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * skip overlap tests with beam search or speculative decoding Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * moveFinishedContextRequestsToGeneration and skip unfinished requests in updateRequests Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * enable overlap in GptChunkedLongContextTests Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * feat: Enable overlap in gptManagerBenchmark Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * feat: Improve early exit Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Use OptionalRef for newOutputTokens tensor Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * feat: Add overlap scheduling support to TRTLLMDecoder - Updated TRTLLMDecoder to accept an `enable_overlap_scheduler` parameter. - Modified the decoder's internal logic to utilize the overlap scheduling feature. - Adjusted the sequence lengths handling to ensure compatibility with the new scheduling approach. - Enhanced unit tests to include cases for the overlap scheduler with the TRTLLMDecoder. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * fix: allNewTokens in PP Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-06 15:06:46 +02:00
dominicshanshan	3ac6637005	fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836 ) * Remove stdout pipe for genai-perf and make stress time as public parameter. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * Update llmRequest based on comment. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * launch process function refactor. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> --------- Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-05-06 16:52:30 +08:00
Jinyang Yuan	eeb6c0c83f	[fix] Loosen the thresholds of test_attention_mla (#4074 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-05-06 11:31:09 +08:00
Suyog Gupta	ac2ab9ba36	[AutoDeploy][perf] Further optimize flashinfer backend in AutoDeploy (#4024 ) * reuse batch_indices, positions across layers Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> * fix flashinfer unit tests Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> * simplify call to get_batch_indices_positions Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> * fix call to get_batch_indices_positions Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> --------- Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-05-06 10:46:36 +08:00
bhsueh_NV	5c0f554b9e	doc: update qwen3 document (#4073 ) * update qwen3 document Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * remove useless codes Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-06 08:42:51 +08:00
bhsueh_NV	e053cb651b	Fix: fix bug of qwen3 moe (#4058 ) * fix bug of qwen3 moe Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * update threshold Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-06 08:20:15 +08:00
pansicheng	e84dc6b3c7	feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354 ) * add deepseek-r1 reasoning parser Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com> * fix test Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> --------- Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com> Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-05-06 08:13:04 +08:00
Iman Tabrizian	85867d76dd	test: Add disaggregated serving accuracy tests (#4036 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-05 08:56:59 -07:00

1 2 3 4 5 ...

499 Commits