ruodil
0f4ed90560
[TRTLLM-9965][test] add long-context disagg test for GB300/GB200 and remove config_index from yaml ( #10225 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-30 02:39:50 -05:00
xinhe-nv
3e0344a53d
[None][chore] Add failed cases into waives.txt ( #10301 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-30 14:04:28 +08:00
xinhe-nv
48fee8d0f6
[None][chore] Add failed cases into waives.txt ( #10321 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-30 00:11:49 -05:00
Emma Qiao
f396ad83b0
[None][infra] Remove duplicates in waives.txt ( #10333 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-29 22:32:52 -05:00
Balaram Buddharaju
4944192eae
[None][chore] Waive tests failing in pre-merge 12/28 ( #10311 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-29 20:53:49 -05:00
Neta Zmora
966231d29c
[ #9626 ][feat] Add an auto-deploy transform for using cutlass FP4 MoE kernels ( #10304 )
...
Add a transform to replace torch.ops.auto_deploy.torch_quant_nvfp4_moe
with the optimized torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused.
The fused op currently generates wrong results when the number of rows in the MoE FC1 weights is not divisible by 128,
so torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused is not set as the default FP4 MoE implementation (i.e., the transform is disabled by default). A sketch of this kind of op replacement follows this entry.
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-29 23:18:15 +02:00
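A minimal sketch of the kind of graph rewrite this transform performs, assuming a torch.fx-traced module; the op names come from the commit message, but the FC1 weight's argument position, the meta["val"] lookup, and the placement of the divisibility guard are illustrative assumptions, not the actual auto-deploy transform API.

    import torch
    from torch.fx import GraphModule

    def replace_nvfp4_moe_with_fused(gm: GraphModule) -> GraphModule:
        """Swap the reference NVFP4 MoE op for the fused cutlass-backed op (sketch)."""
        for node in gm.graph.nodes:
            if node.op != "call_function":
                continue
            if node.target is not torch.ops.auto_deploy.torch_quant_nvfp4_moe:
                continue
            # Known limitation: the fused kernel is wrong when the FC1 weight
            # row count is not divisible by 128, so skip replacement there.
            w_fc1 = node.args[1]  # weight argument position is an assumption
            val = getattr(w_fc1, "meta", {}).get("val")  # FakeTensor from shape prop
            if val is None or val.shape[0] % 128 != 0:
                continue  # keep the reference op
            node.target = torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused
        gm.graph.lint()
        gm.recompile()
        return gm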
Yueh-Ting (eop) Chen
9cee32ab39
[ https://nvbugs/5625990 ][fix] Respect VSWA scheme when storing and loading blocks for reuse in the KV cache manager ( #10183 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-12-29 14:29:14 +08:00
Yanchao Lu
2f8d6d25a8
[None][ci] Waive an intermittent test hang case ( #10324 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-29 13:04:31 +08:00
Yanchao Lu
270be801aa
[None][ci] Move remaining DGX-B200 tests to LBD ( #9876 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-28 13:55:39 +08:00
JunyiXu-nv
55bc6a5ff8
[ https://nvbugs/5753250 ][fix] Fix undefined local variable in responses utils ( #10154 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-28 06:59:32 +08:00
shivghai
ee07a7c55e
[None][fix] [Gemma3] Fix RoPE for local attention ( #9961 )
...
Signed-off-by: Shiv Ghai <8965168+shivghai@users.noreply.github.com>
2025-12-27 11:50:59 -08:00
Guoming Zhang
93ac0bc1dc
[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… ( #10229 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-27 22:48:10 +08:00
Jin Li
c04563657e
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile ( #9740 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-27 00:07:20 +08:00
chenfeiz0326
d70aeddc7f
[TRTLLM-8952][feat] Support Multi-Node Disagg Perf Test in CI ( #9138 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-26 22:50:53 +08:00
Pengyun Lin
684b37df02
[ https://nvbugs/5747938 ][fix] Use local tokenizer ( #10230 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-26 22:08:10 +08:00
Pengyun Lin
c5b0f9e436
[ https://nvbugs/5633700 ][fix] Cache tiktoken vocab for gpt-oss ( #10219 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-26 18:39:03 +08:00
dongfengy
bfc591994c
[ https://nvbugs/5745152 ][fix] Fix some GPTOSS test setups ( #10085 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-12-26 17:52:40 +08:00
Neta Zmora
f3f02315df
[None][chore] Small refactoring of the auto-deploy MoE operator ( #10300 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-25 12:27:11 -05:00
bhsueh_NV
db3430f589
[None][feat] Support VLM part for Mistral Large 3 ( #10188 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-25 11:20:58 -05:00
Ziyi Xiong
d8b5aeb061
[ https://nvbugs/5652062 ][fix] Rewind kv_cache and reset draft tokens ( #10160 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-25 09:13:51 -05:00
ZhichenJiang
46e4af5688
[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations ( #10201 )
...
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 09:04:20 -05:00
Lizhi Zhou
fe12faef81
[ https://nvbugs/5752516 ][chore] unwaive test; fix port conflicts in CI ( #10152 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-25 08:16:09 -05:00
Emma Qiao
0ecdb69b93
[None][infra] Waive failed tests for main on 12/25 ( #10298 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-25 05:22:39 -05:00
Jie Li
83e02ee335
[None][chore] Remove NIM TRT-Backend Test Lists ( #10232 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2025-12-25 04:01:51 -05:00
Enwei Zhu
182b3eb633
[None][ci] Waive TestLlama3_1_8B::test_auto_dtype[False-2] for timeout ( #10293 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 02:35:18 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm ( #10256 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
xinhe-nv
4ae6f6a46c
[None][chore] Add failed cases into waives.txt ( #10249 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-25 01:26:21 -05:00
Venky
c059e6caa1
[TRTC-121][feat] Add recipe selector UI to complement the recipe database ( #10125 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-24 23:56:54 -05:00
gramnarayan
a9eb5afc9f
[ #9241 ][feat] AutoDeploy: Support Eagle3 Speculative Decoding ( #9869 )
...
Supports the two-model flow without the overlap scheduler or the chain drafter; the drafting model runs in the PyTorch backend. A minimal sketch of the draft-and-verify loop follows this entry.
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-24 23:30:42 -05:00
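For orientation, a minimal sketch of the two-model draft-and-verify loop that Eagle3-style speculative decoding builds on; the greedy_next model interface is hypothetical, and verification is shown token by token for clarity even though real implementations score all draft tokens in one batched target-model pass.

    def speculative_step(target_model, draft_model, tokens, num_draft=3):
        """One draft-and-verify step over a token prefix (sketch)."""
        # 1) The cheap drafting model proposes a short continuation.
        proposed, prefix = [], list(tokens)
        for _ in range(num_draft):
            tok = draft_model.greedy_next(prefix)  # hypothetical helper
            proposed.append(tok)
            prefix.append(tok)
        # 2) The target model checks each proposed token; on the first
        #    mismatch, keep the target's token and discard the rest.
        accepted = list(tokens)
        for tok in proposed:
            expected = target_model.greedy_next(accepted)
            accepted.append(expected)
            if expected != tok:
                break
        return accepted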
Emma Qiao
16fd781e42
[TRTLLM-9862][infra] Move single-gpu tests on rtxpro6000d to pre-merge ( #9897 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-24 21:45:33 -05:00
Neta Zmora
c4b36d31ff
[ #10137 ][feat] AutoDeploy FP8 MoE refactor ( #10138 )
...
The trtllm (cutlass) FP8 MoE operator performs the W3+W1 fusion (concatenation) during inference, and we want to move this fusion to model-optimization time.
The Cutlass MoE kernel is used through a trtllm torch operator.
Its implementation uses two FC operations (fc1 and fc2), while the canonical MoE API defines three GEMM operations and their associated weights (W1, W2, W3), so when we switch from the torch.moe op to the trtllm.moe op we also change terminology from w1, w2, w3 to fc1 and fc2. A sketch of the fusion follows this entry.
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-24 18:58:10 +02:00
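A minimal sketch of the W3+W1 fusion being moved to model-optimization time, using the commit's w1/w3/fc1 naming; the per-expert weight layout and the concat axis are assumptions for illustration.

    import torch

    def fuse_moe_fc1(w1: torch.Tensor, w3: torch.Tensor) -> torch.Tensor:
        """Concatenate the gate (w1) and up (w3) projections of each expert
        into a single fc1 weight, so inference runs one GEMM instead of two."""
        # w1, w3: [num_experts, intermediate_size, hidden_size] (assumed layout)
        return torch.cat([w3, w1], dim=1)

    # Done once at optimization time, not on every forward pass:
    num_experts, inter, hidden = 8, 1024, 512
    fc1 = fuse_moe_fc1(torch.randn(num_experts, inter, hidden),
                       torch.randn(num_experts, inter, hidden))
    assert fc1.shape == (num_experts, 2 * inter, hidden)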
Stanley Sun
ddac4d7379
[None][test] Add disagg-serving auto-scaling QA test ( #10262 )
...
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-12-24 08:43:47 -05:00
shuyixiong
f4f0fe85e9
[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests ( #9939 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-24 15:27:01 +08:00
xinhe-nv
534700ecd9
[None][chore] Add failed cases into waives.txt ( #10240 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-24 02:21:50 -05:00
Fanrong Li
156f6453dc
[TRTLLM-9798][feat] Change to use new DeepGEMM MQA sm100 kernel for MTP-3 ( #10226 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-24 14:39:12 +08:00
Emma Qiao
7b84e48e0f
[None][infra] Waive failed cases on 12/24 ( #10257 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-23 22:49:57 -05:00
xinhe-nv
fc1f77eafc
[None][chore] Add failed cases into waives.txt ( #10204 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2025-12-24 10:37:23 +08:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism ( #9986 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Jhao-Ting Chen
92d90fa29a
[None][feat] Expose enable_trt_overlap in Triton_backend, bringing 1.05x OTPS ( #10018 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-23 11:41:31 -06:00
Grzegorz Kwasniewski
0027a01ad5
[ https://nvbugs/5680312 ][fix] Updated test waiving ( #9630 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-23 09:38:12 -08:00
Emma Qiao
984c20e0b2
[None][infra] Waive failed cases on 12/23 ( #10236 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-23 08:48:54 -05:00
dongfengy
e284d0bf80
[None][infra] Waive flaky unittest/executor/test_rpc_proxy.py and unittest/executor/test_rpc_worker.py tests ( #10209 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-23 07:43:13 -05:00
Yukun He
522f1d2bc3
[ https://nvbugs/5764627 ][chore] waive the time-out test ( #10222 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-23 16:36:06 +08:00
Balaram Buddharaju
f2e00a75de
[None][chore] Remove helix test from rtx test list ( #10224 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 03:07:37 -05:00
Shiyu Li
3ddc9d2b48
[ https://nvbugs/5729697 ][fix] MNNVL Allreduce: use the CUDA runtime instead of a macro to get the SM version ( #10062 )
...
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-12-23 16:07:07 +08:00
chenfeiz0326
48c875f8ea
[None][fix] Add OpenSearch URL to slurm_launch.sh for Multinode Perf Sanity Test ( #9990 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-23 16:02:38 +08:00
Bo Li
cc1323be24
[None][fix] Fix a bug with top_k=10 in NVLinkOneSided AlltoAll ( #10197 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-23 02:13:37 -05:00
Chuang Zhu
53db3b2612
[ https://nvbugs/5741884 ][fix] unwaive disagg sampler ( #10189 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-23 14:38:07 +08:00
xinhe-nv
77b591f73b
[None][chore] Add failed cases into waives.txt ( #10177 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-12-23 13:43:50 +08:00
Harshini Komali
d691371eaf
[TRTLLM-9091][feat] Replace GenAI-Perf with AIPerf ( #9310 )
...
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-23 13:25:55 +08:00