TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
bhsueh_NV	d5606b062a	fix: [https://nvbugs/5355219 ] Fix bug of Qwen3 235B CI on dgx_gb200 (#5602 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-02 10:07:01 +08:00
Yi Zhang	aa0b9278d2	test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-01 01:06:47 -04:00
Zheng Duan	1824c44004	[nvbug 5300551] test: increase block count in eviction test (#5465 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-01 10:48:25 +08:00
nv-guomingz	9fe1dd6be1	fix:https://nvbugs/5362398 (#5609 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-30 13:29:40 -04:00
Yan Chunwei	d6c81bad97	fix [nvbug5351244]: test_mpi_session submit sync/async (#5608 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-01 00:48:59 +08:00
Venky	4fc0666daa	[cherry-pick] [CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False]` (#5553 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-28 01:15:04 +08:00
Yan Chunwei	b78ad754c8	ci: unwaive llmapi launch test (#5281 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-27 14:10:45 +08:00
Emma Qiao	e2054bb2aa	[Infra][release/0.21] - waive failed tests (#5537 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-27 13:58:13 +08:00
Yan Chunwei	87ead4ecbe	[nvbug 5273941] fix: broken cyclic reference detect (#5417 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-26 07:35:35 +08:00
Emma Qiao	b6d23d58c4	[Infra] - Waive failed tests on release/0.21 (#5477 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-25 19:01:55 +08:00
HuiGao-NV	5cd87bee41	tests: Set kv cache free memory fraction in test case (#5462 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-25 16:27:46 +08:00
ruodil	5e50fcc51b	test: set enable_attention_dp=True in default deepseek settings (#5461 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-25 14:21:14 +08:00
brb-nv	32f50ded17	nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-06-25 11:45:14 +08:00
Ivy Zhang	9e110b2d11	tests: fix typos in qa test (#5421 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-25 10:42:34 +08:00
Yi Zhang	2d5e202484	fix: Fix skip by mpi size fixture (#5355 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-22 02:51:01 +08:00
Emma Qiao	8686805a3b	[Infra]cherry pick sanity check yml change for 5080 and 5090 from main (#5363 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-19 15:33:57 +08:00
ruodil	e87cf62c12	tests: cherry-pick from main branch, add qwen3 test cases and amend test name in perf test (#5357 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-19 14:34:05 +08:00
Yiqing Yan	da576bcafa	Waive L0 test (#5349 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-19 12:01:11 +08:00
Fanrong Li	6c3210a8be	[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-19 09:48:22 +08:00
nv-guomingz	6a388b105a	chore: remove torch_compile prefix for TorchCompileConfig field members (#5261 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-19 09:21:51 +08:00
Yan Chunwei	3946e798db	fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-19 06:13:53 +08:00
Omer Ullman Argov	0b6d005ef6	[fix][test] clear cuda cache before unittests automatically (#5121 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-19 00:36:53 +03:00
Aurelien Chartier	d25f93c07f	chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head (#5293 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-06-18 11:13:12 -07:00
Omer Ullman Argov	5010f8719d	[fix][test] remove duplicate test runs (#5241 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-19 01:59:54 +08:00
Omer Ullman Argov	a28a152001	[fix][test] remove some cpp test cases from h100 (#5335 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-18 20:40:26 +03:00
yuanjingx87	a1c5704055	[feat] Multi-node CI testing support via Slurm (#4771 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-19 01:11:12 +08:00
Iman Tabrizian	e5ee5c5352	Unwaive disaggregated serving accuracy tests (#5095 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-06-19 00:41:15 +08:00
HuiGao-NV	d13d2f460d	Remove duplicated test cases (#5323 ) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Hui Gao <huig@nvidia.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-18 21:20:20 +08:00
Emma Qiao	b29ac5b561	[Infra] Update 5080 and 5090 case condition due to the driver update (#5317 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-18 20:01:36 +08:00
xinhe-nv	610a49f117	tests: add multi nodes tests (#5196 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-18 18:08:04 +08:00
Yi Zhang	375dd0b971	Waive L0 (#5311 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-18 16:40:41 +08:00
Yuan Tong	f599ee63c1	test: correct unittest rerun behavior (#5273 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-06-18 16:37:19 +08:00
Robin Kobus	38547b92f3	refactor: Introduce ResourceManagerType enum for resource management (#5246 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-06-18 09:55:59 +02:00
Wanli Jiang	3a02489e86	[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-06-18 15:12:49 +08:00
QI JUN	9ea7bb67a4	CI: fix TensorRT H200 tests (#5301 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-18 14:40:57 +08:00
ruodil	3b5d916250	test: cherry-pick deepseek rcca cases in main branch (#5307 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-18 14:26:26 +08:00
Yiqing Yan	8f67e3604d	Waive L0 tests (#5308 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-18 12:43:45 +08:00
Omer Ullman Argov	f501ce57b1	[fix][test] move deepseek single gpu tests to post merge (#5280 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-18 06:59:39 +03:00
dominicshanshan	3c0fecbf42	CI: extend model weights load time for dsv3 in stress test. (#5275 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-06-18 11:51:48 +08:00
Ivy Zhang	41cfcaa964	test: update qa test list (#5305 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-18 11:29:11 +08:00
Emma Qiao	ff32caf4d7	[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 (#4885 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 23:48:34 +08:00
QI JUN	f899c4d294	Re-implement LlmResponse in Python to reduce host overhead of pybind (#5224 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 21:28:09 +08:00
Yanchao Lu	f4cdbfcdf0	None - Some clean-ups for the automation pipeline (#5245 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 21:08:24 +08:00
Dom Brown	44fb3c1673	[TRTLLM-5770] feat: Integrate TRT-LLM Gen FP8 block scale MoE with Pytorch workflow kernel autotuner (#5207 ) - Adds a new Python custom op (fp8_block_scale_moe_runner) and a FP8BlockScaleMoERunner class for autotuning. - Updates C++ MoE and batched GEMM kernels to accept a configIndex for workspace sizing and execution. - Extends the unit test to run both autotuned and non-autotuned code paths. Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-06-17 21:01:56 +08:00
amirkl94	8451a87742	chore: Mass integration of release/0.20 (#5082 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-17 14:32:02 +03:00
liji-nv	13eef642e6	[feat] Piecewise cuda graph support for MLA (#4467 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-17 18:58:38 +08:00
QI JUN	ccd9adbe33	CI: move multi-gpu test cases of tensorrt backend to h200 (#5272 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 17:37:37 +08:00
Ivy Zhang	2ad8758ecc	[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520 ][test] Add QA test cases (#5073 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-17 17:14:01 +08:00
QI JUN	517c1ecf72	move some test cases of TensorRT backend back (#5232 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-17 17:03:11 +08:00
qsang-nv	134cb66a53	fix mla test (#5240 ) Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>	2025-06-17 15:26:25 +08:00

773 Commits