TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Zheng Duan	b0cb963199	test: torch-flow conditional disagg test (#3410 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-15 10:54:14 +08:00
nv-guomingz	b32ae7ac92	test:add fp8_kv_cache functionality test case. (#3457 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-15 09:16:46 +08:00
Iman Tabrizian	bad55e99bb	test: Add MTP + overlap + Attention DP disaggregated test (#3542 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-15 07:46:03 +08:00
Pamela Peng	6cdfc54883	feat: Add FP8 support for SM 120 (#3248 ) * Allow FP8 on SM120 Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> * fix sm121 Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> * fix Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> * fix pre-commit Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> * review update Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> --------- Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-04-14 16:05:41 -07:00
Ivy Zhang	170bc22139	fix test name (#3534 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-04-14 17:09:50 +08:00
xinhe-nv	b1d8495b3d	update waive list (#3510 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-14 15:24:48 +08:00
bhsueh_NV	9d7d48faeb	fix: disable the kv cache reuse for prompt tuning test (#3474 ) * disable the kv cache reuse for prompt tuning test Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * unwaive the wavied tests Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-04-14 14:35:47 +08:00
brb-nv	44090a5388	Add support for Phi-4-MM (#3296 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-14 14:24:10 +08:00
Yiqing Yan	19d296b4b2	chore: add dgx_h200 tests (#3451 ) * add dgx_h200 tests Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix pre-commit Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * change bsl branch Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * change multi gpu related file list Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> --------- Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-14 11:20:55 +08:00
pcastonguay	fe6f14b2b1	fix: Fixing issue with first gen token being returned twice in streaming (#3427 ) * fix: Fixing issue with first gen token being returned twice with streaming Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing not_expectring_strings in test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-13 22:45:09 -04:00
Yiqing Yan	65d1591fbf	Waive L0 test (#3508 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-14 09:32:01 +08:00
Chuang Zhu	6ee021a90d	chore: exchange connection id with tagSend/tagRecv (#3320 ) * exchange connection id with tagSend/tagRecv Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * unwaive Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * tag recv/send Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> --------- Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-04-14 09:30:34 +08:00
dominicshanshan	5d3180be82	feat: Add stress test for TRT-LLM (#3250 ) Signed-off-by: Wangshanshan <dominicw@nvidia.com>	2025-04-13 10:24:25 +08:00
pcastonguay	145a126a28	chore: Unwaive DS + overlap disagg test (#3339 ) * chore: Unwaive DS + overlap disagg test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing pre-commit Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing pre-commit Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-04-12 13:33:38 -04:00
yuxianq	29c5085400	fix: Fix PP for llama. (#3449 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-04-12 17:20:27 +08:00
Iman Tabrizian	3041bbdab3	fix: Fix disagg MTP with overlap (#3406 ) * fix: disagg overlap with MTP Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Review comment Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> --------- Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>	2025-04-12 12:27:24 +08:00
HuiGao-NV	c51e90d7d7	fix: don't perform memory estimation for start_attention (#3485 ) * fix: don't perform memory estimation for start_attention * Enable tests of unittest/_torch/multi_gpu Signed-off-by: Hui Gao <huig@nvidia.com>	2025-04-12 11:34:46 +08:00
Enwei Zhu	5e2923bb92	test: Automatically clean checkpoints and engines (#3468 ) * auto clean Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix tempdir Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-12 09:56:29 +08:00
Enwei Zhu	cf9ceea890	test: Add DeepSeek-V3-Lite PP=4 cases (#3454 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-12 00:09:12 +08:00
Shunkangz	ea050084ad	feat: Add support of chat completion in PD (#2985 ) * Add support of chat completion in PD Add support of include_usage in PD Reformat * Remove redundant code Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * Refactor code Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * Add chat completion test Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * Refactor code Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> --------- Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-04-11 17:53:28 +08:00
Ivy Zhang	20e54e5c89	test: add cuda visible device constraint for phi_1gpu test (#3364 ) Signed-off-by: Ivy Zhang <yanzh@nvidia.com>	2025-04-11 17:14:52 +08:00
Ivy Zhang	d998832b33	test: add torch flow test case in qa test list (#3404 ) Signed-off-by: Ivy Zhang <yanzh@nvidia.com>	2025-04-11 16:57:41 +08:00
Yiqing Yan	0d351317c2	Waive failure post-merge tests (#3472 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-11 16:23:07 +08:00
Enwei Zhu	410f56357e	test: Waive torch compile tests (#3471 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-11 13:38:05 +08:00
QI JUN	16ca45747b	always trigger multi gpu test to protect modeling_llama.py and modeling_deepseekv3.py (#3434 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-11 13:19:23 +08:00
QI JUN	1e2a339642	waive unittest/_torch/multi_gpu (#3464 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-11 09:59:16 +08:00
QI JUN	6cef10068a	waive a test case of llama 3.1 with torch compile (#3461 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-11 09:15:19 +08:00
Iman Tabrizian	d7f45e50c6	test: disable attention DP tests for single GPU (#3395 ) Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>	2025-04-11 01:38:17 +08:00
amitz-nv	a6a2ae6cc1	chore: Rename nvsmall to nemotron nas (#3447 ) * Rename nvsmall to nemotron NAS * Revert nvsmall to nemotron_nas rename in paths in tests that access llm_models_root/nvsmall/tests * Add NemotronNAS to pytorch supported models table Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-04-10 23:16:52 +08:00
wm2012011492	af05749e90	feat: add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa… (#3369 ) * add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa_llmapi.py Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com> * fix coding style Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com> * add unittest Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com> --------- Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com> Co-authored-by: mengw <12670782+wm2012011492@users.noreply.github.com>	2025-04-10 22:45:57 +08:00
QI JUN	f5281fffaa	waive some test cases of test_llm_multi_gpu.py (#3452 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-10 22:02:35 +08:00
Yiqing Yan	10d2d16247	Waive L0 test (#3442 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-10 17:43:45 +08:00
Emma Qiao	5023e0d0f4	infra: Update some test description which is out of date (#3437 ) * Update some description which is out of date Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Apply suggestions from code review Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-10 17:29:30 +08:00
bhsueh_NV	cec65bd09a	clean the waive.txt (#3441 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-04-10 16:20:08 +08:00
brb-nv	c59abae436	feat: Add Gemma3 text-only model support (#3247 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-10 12:34:58 +08:00
QI JUN	b5473f7eca	waive llama3.1 8B test cases with pipeline parallelism (#3433 ) * waive llama3.1 8B test cases with pipeline parallelism Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-10 11:07:58 +08:00
peaceh-nv	215fb20567	chore : split GptExecutor tests out of gpt tests to reduce single test time (#3412 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-10 09:08:15 +08:00
Yechan Kim	943218b54a	feat: Add Qwen2.5-VL and refactor Qwen2-VL (#3156 ) * feat: Add Qwen2.5-VL and refactor Qwen2-VL Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix yapf and codespell Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add test Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix test_e2e Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * generalize get_rope_index Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix qwen2.5-vl on REAME Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix test Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix image test Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-04-10 04:09:03 +08:00
Iman Tabrizian	8401722245	test: Add single gpu disaggregated tests (#3295 ) * test: Add single gpu disaggregated tests Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Add deepseek with overlap tests Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Use updated prompt Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Move test to disaggregated folder Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> --------- Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>	2025-04-09 09:34:45 +08:00
Mike Iovine	5bdf997963	Add Llama 4 (#3302 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-04-09 03:35:21 +08:00
yuxianq	7225bd8b91	chore: Refine attention backend interface. (#3271 ) Refine attention backend interface. Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-04-09 02:34:53 +08:00
wili	54ad95eaa8	Feat: Variable-Beam-Width-Search (VBWS) part3 (#3338 ) * feat/Variable-Beam-Width-Search-Part3, v1.0 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat/Variable-Beam-Width-Search-Part3, v1.1 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat/Variable-Beam-Width-Search-Part3, v1.2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>	2025-04-08 23:51:27 +08:00
pcastonguay	02f446a9ff	chore: Adding DS V3-lite tests with overlap + cuda graph (#3342 ) * chore: Adding DS V3-lite tests with overlap + cuda graph Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing pre-commit Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-04-08 09:36:09 -04:00
yuxianq	7b03350527	Add thread leak check and fix thread/memory leak issues. (#3270 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-04-08 19:03:18 +08:00
Chuang Zhu	cdb0906be4	disagg test single h100 (#3353 )	2025-04-08 17:45:35 +08:00
amirkl94	e04f6a1b9b	fix: Fix p-tuning test bug (#3326 ) * fix: Fix p-tuning test bug * A change in the vocab_size calculation for T5Tokenizer, introduced in transformers version 4.34, caused addition of incorrect vtokens for ptuning. In general, instead of adding tokens which are outside the vocabulary, tokens inside the vocabulary were added. Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2025-04-08 17:14:00 +08:00
Enwei Zhu	8ee019f8c4	test: Accuracy test improvement (Part 3.4): Move LLaMA tests (#3350 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-08 15:07:57 +08:00
MinaHuai	31422e7e46	add tp=2 ci test for vision encoder (#3319 ) Signed-off-by: mhuai <mhuai@nvidia.com>	2025-04-07 21:46:08 -07:00
Gabriel Wu	42c8574e93	fix: revert extra cmake var (#3351 ) Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-08 11:57:16 +08:00
pcastonguay	add5e5cd93	feat: Add option to run disaggregated serving without ctx servers,… (#3243 ) * feat: Add option to run disaggregated serving without ctx servers, to benchmark gen only Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing comment in sanity check Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-04-07 21:56:03 -04:00

100 Commits