TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
bhsueh_NV	d5606b062a	fix: [https://nvbugs/5355219] Fix bug of Qwen3 235B CI on dgx_gb200 (#5602) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-02 10:07:01 +08:00
Kaiyu Xie	682b164b9b	doc: Fix outdated config in DeepSeek best perf practice doc (#5638) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-01 04:58:50 -04:00
Yi Zhang	aa0b9278d2	test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-01 01:06:47 -04:00
Zheng Duan	1824c44004	[nvbug 5300551] test: increase block count in eviction test (#5465) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-01 10:48:25 +08:00
nv-guomingz	9fe1dd6be1	fix:https://nvbugs/5362398 (#5609) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-30 13:29:40 -04:00
Yan Chunwei	d6c81bad97	fix [nvbug5351244]: test_mpi_session submit sync/async (#5608) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-01 00:48:59 +08:00
Emma Qiao	647e070ed6	[Infra][release/0.21]Update nccl to 2.27.5 (#5539) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-29 20:50:15 +08:00
Venky	4fc0666daa	[cherry-pick] [CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False]` (#5553) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-28 01:15:04 +08:00
ixlmar	abb7357f25	[TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instructions (#5490) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-06-27 07:09:41 -07:00
Yan Chunwei	b78ad754c8	ci: unwaive llmapi launch test (#5281) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-27 14:10:45 +08:00
Emma Qiao	e2054bb2aa	[Infra][release/0.21] - waive failed tests (#5537) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-27 13:58:13 +08:00
ixlmar	312fd47f84	fix: constrain grepping in docker/Makefile (#5493) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-06-26 13:44:40 +02:00
Kaiyu Xie	30a2a8b81c	doc: Fix benchmark cmd in disagg scripts (#5516) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-26 17:23:24 +08:00
ixlmar	a811077f90	fix: fix regression in LOCAL_USER (#5517) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-06-26 11:10:55 +02:00
Anurag Mukkara	c2799d0465	[nvbug/5354825] Fix nougat test image url (#5496) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-06-26 10:10:18 +08:00
Yan Chunwei	87ead4ecbe	[nvbug 5273941] fix: broken cyclic reference detect (#5417) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-26 07:35:35 +08:00
Martin Marciniszyn Mehringer	fc64f139e4	Fix permission for local user issues in NGC docker container. (#5373) Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>	2025-06-25 14:10:20 +02:00
Emma Qiao	b6d23d58c4	[Infra] - Waive failed tests on release/0.21 (#5477) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-25 19:01:55 +08:00
HuiGao-NV	5cd87bee41	tests: Set kv cache free memory fraction in test case (#5462) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-25 16:27:46 +08:00
ruodil	5e50fcc51b	test: set enable_attention_dp=True in default deepseek settings (#5461) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-25 14:21:14 +08:00
Wanli Jiang	af5839303d	feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-06-25 14:10:50 +08:00
brb-nv	32f50ded17	nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-06-25 11:45:14 +08:00
Ivy Zhang	9e110b2d11	tests: fix typos in qa test (#5421) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-25 10:42:34 +08:00
Kaiyu Xie	2b56957fb5	Fix: missing clientId when serialize and deserialize response (cherry-pick #5231) (#5378) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-24 10:00:37 +08:00
Yi Zhang	2d5e202484	fix: Fix skip by mpi size fixture (#5355) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-22 02:51:01 +08:00
Martin Marciniszyn Mehringer	ebc6dbcb0b	doc: cherry pick #5334 (#5368) Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>	2025-06-19 20:03:59 +08:00
Emma Qiao	8686805a3b	[Infra]cherry pick sanity check yml change for 5080 and 5090 from main (#5363) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-19 15:33:57 +08:00
ruodil	e87cf62c12	tests: cherry-pick from main branch, add qwen3 test cases and amend test name in perf test (#5357) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-19 14:34:05 +08:00
Yiqing Yan	decfe2fdb3	chore: bump version to 0.21.0 (#5325) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-19 12:58:44 +08:00
Yiqing Yan	da576bcafa	Waive L0 test (#5349) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-19 12:01:11 +08:00
Fanrong Li	6c3210a8be	[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-19 09:48:22 +08:00
nv-guomingz	6a388b105a	chore: remove torch_compile prefix for TorchCompileConfig field members (#5261) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-19 09:21:51 +08:00
Zongfei Jing	2b23cd56ce	[feat] Fusion finalize and allreduce for qwenmoe model (#5223) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>	2025-06-19 08:03:58 +08:00
Robin Kobus	1a7c6e7974	ci: Split long running jobs into multiple jobs (#5268) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-19 06:24:29 +08:00
Yan Chunwei	3946e798db	fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-19 06:13:53 +08:00
Omer Ullman Argov	0b6d005ef6	[fix][test] clear cuda cache before unittests automatically (#5121) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-19 00:36:53 +03:00
Aurelien Chartier	d25f93c07f	chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head (#5293) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-06-18 11:13:12 -07:00
Omer Ullman Argov	5010f8719d	[fix][test] remove duplicate test runs (#5241) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-19 01:59:54 +08:00
Omer Ullman Argov	a28a152001	[fix][test] remove some cpp test cases from h100 (#5335) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-18 20:40:26 +03:00
yuanjingx87	a1c5704055	[feat] Multi-node CI testing support via Slurm (#4771) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-19 01:11:12 +08:00
Iman Tabrizian	e5ee5c5352	Unwaive disaggregated serving accuracy tests (#5095) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-06-19 00:41:15 +08:00
Xianjie Qiao	857108aeca	Add disagg slurm scripts (#5243) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>	2025-06-18 23:17:55 +08:00
HuiGao-NV	d13d2f460d	Remove duplicated test cases (#5323) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Hui Gao <huig@nvidia.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-06-18 21:20:20 +08:00
juney-nvidia	00bdd39b96	chore: Update README.md to expose meet-up info (#5329) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-06-18 20:04:28 +08:00
Emma Qiao	b29ac5b561	[Infra] Update 5080 and 5090 case condition due to the driver update (#5317) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-18 20:01:36 +08:00
jellysnack	0623ffe3bc	feat: Add LLGuidance Support for PyTorch Backend (#5214) Signed-off-by: jellysnack <oleg.jellysnack@gmail.com> Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-06-18 19:33:34 +08:00
xinhe-nv	610a49f117	tests: add multi nodes tests (#5196) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-18 18:08:04 +08:00
Yi Zhang	375dd0b971	Waive L0 (#5311) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-06-18 16:40:41 +08:00
Yiqing Yan	a3a48410f3	Fix rerun step (#5319) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-18 16:38:45 +08:00
Yuan Tong	f599ee63c1	test: correct unittest rerun behavior (#5273) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-06-18 16:37:19 +08:00

1469 Commits