TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-24 20:52:48 +08:00

Author	SHA1	Message	Date
Robin Kobus	fd94d3cbf5	[nvbugs/5345391] fix: chunked prefill + overlap scheduling (#5761 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-09 17:59:45 +02:00
ruodil	cbcc55e073	test: remove duplicate cases in perf sanity test (#5870 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-09 15:36:22 +10:00
Bo Li	6d7a2cb1c5	fix: [https://nvbugs/5351130 ][https://nvbugs/5333654 ] Unwaive for bug 5351130 and 5333654. (#5821 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-08 18:12:48 +08:00
QI JUN	f8b4077654	[nvbugs/5326453] Avoid nesting NCCL grouping in allgather OP (#5789 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-08 15:39:27 +09:00
Bo Li	6062dc675f	fix: [https://nvbugspro.nvidia.com/bug/5345215 ] Unwaive for bug 5345215. (#5606 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-08 13:11:08 +09:00
QI JUN	3a58db88c8	fix _pad_attention_dp_dummy_request (#5583 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-07 14:13:54 +08:00
Pengyun Lin	7524c77e1e	[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-07 15:06:49 +09:00
ruodil	6103466de2	test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases (#5693 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-07 13:11:41 +10:00
Iman Tabrizian	518915b5c6	[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-04 12:52:35 -04:00
Yi Zhang	5ac92bb8ff	[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation (#5463 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>	2025-07-04 23:23:41 +09:00
Yi Zhang	53394e0030	test: Move some of the test from post merge to pre-merge, update dgx b200 test case (#5640 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-04 13:26:53 +09:00
brb-nv	2b66fe8fbd	[nvbug/5341178][fix] Fix OOM in Llama 4 accuracy test (#5735 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-04 10:55:34 +08:00
Faraz	8a8d2e9901	[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5651 ) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>	2025-07-03 22:08:15 +09:00
brb-nv	a3c0cf02ce	fix: Investigate Gemma3 1B decoder output discrepancy (#5564 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-03 09:55:25 +08:00
bhsueh_NV	d5606b062a	fix: [https://nvbugs/5355219 ] Fix bug of Qwen3 235B CI on dgx_gb200 (#5602 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-02 10:07:01 +08:00
Yi Zhang	aa0b9278d2	test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-01 01:06:47 -04:00
Zheng Duan	1824c44004	[nvbug 5300551] test: increase block count in eviction test (#5465 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-01 10:48:25 +08:00
Venky	4fc0666daa	[cherry-pick] [CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False]` (#5553 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-28 01:15:04 +08:00
Yan Chunwei	b78ad754c8	ci: unwaive llmapi launch test (#5281 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-27 14:10:45 +08:00
Emma Qiao	b6d23d58c4	[Infra] - Waive failed tests on release/0.21 (#5477 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-25 19:01:55 +08:00
HuiGao-NV	5cd87bee41	tests: Set kv cache free memory fraction in test case (#5462 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-25 16:27:46 +08:00
ruodil	5e50fcc51b	test: set enable_attention_dp=True in default deepseek settings (#5461 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-25 14:21:14 +08:00
brb-nv	32f50ded17	nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-06-25 11:45:14 +08:00
Ivy Zhang	9e110b2d11	tests: fix typos in qa test (#5421 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-25 10:42:34 +08:00
Yi Zhang	2d5e202484	fix: Fix skip by mpi size fixture (#5355 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-22 02:51:01 +08:00
Emma Qiao	8686805a3b	[Infra]cherry pick sanity check yml change for 5080 and 5090 from main (#5363 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-19 15:33:57 +08:00
ruodil	e87cf62c12	tests: cherry-pick from main branch, add qwen3 test cases and amend test name in perf test (#5357 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-19 14:34:05 +08:00
Yiqing Yan	da576bcafa	Waive L0 test (#5349 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-19 12:01:11 +08:00
Fanrong Li	6c3210a8be	[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-19 09:48:22 +08:00
nv-guomingz	6a388b105a	chore: remove torch_compile prefix for TorchCompileConfig field members (#5261 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-19 09:21:51 +08:00
Aurelien Chartier	d25f93c07f	chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head (#5293 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-06-18 11:13:12 -07:00
Omer Ullman Argov	5010f8719d	[fix][test] remove duplicate test runs (#5241 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-19 01:59:54 +08:00
Omer Ullman Argov	a28a152001	[fix][test] remove some cpp test cases from h100 (#5335 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-18 20:40:26 +03:00
yuanjingx87	a1c5704055	[feat] Multi-node CI testing support via Slurm (#4771 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-19 01:11:12 +08:00
Iman Tabrizian	e5ee5c5352	Unwaive disaggregated serving accuracy tests (#5095 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-06-19 00:41:15 +08:00
HuiGao-NV	d13d2f460d	Remove duplicated test cases (#5323 ) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Hui Gao <huig@nvidia.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-18 21:20:20 +08:00
Emma Qiao	b29ac5b561	[Infra] Update 5080 and 5090 case condition due to the driver update (#5317 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-18 20:01:36 +08:00
xinhe-nv	610a49f117	tests: add multi nodes tests (#5196 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-18 18:08:04 +08:00
Yi Zhang	375dd0b971	Waive L0 (#5311 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-18 16:40:41 +08:00
Yuan Tong	f599ee63c1	test: correct unittest rerun behavior (#5273 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-06-18 16:37:19 +08:00
Wanli Jiang	3a02489e86	[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-06-18 15:12:49 +08:00
QI JUN	9ea7bb67a4	CI: fix TensorRT H200 tests (#5301 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-18 14:40:57 +08:00
ruodil	3b5d916250	test: cherry-pick deepseek rcca cases in main branch (#5307 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-18 14:26:26 +08:00
Yiqing Yan	8f67e3604d	Waive L0 tests (#5308 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-18 12:43:45 +08:00
Omer Ullman Argov	f501ce57b1	[fix][test] move deepseek single gpu tests to post merge (#5280 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-18 06:59:39 +03:00
dominicshanshan	3c0fecbf42	CI: extend model weights load time for dsv3 in stress test. (#5275 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-06-18 11:51:48 +08:00
Ivy Zhang	41cfcaa964	test: update qa test list (#5305 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-18 11:29:11 +08:00
Emma Qiao	ff32caf4d7	[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 (#4885 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 23:48:34 +08:00
Yanchao Lu	f4cdbfcdf0	None - Some clean-ups for the automation pipeline (#5245 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-17 21:08:24 +08:00
amirkl94	8451a87742	chore: Mass integration of release/0.20 (#5082 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-17 14:32:02 +03:00

493 Commits