TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-17 00:04:57 +08:00

Author	SHA1	Message	Date
Gal Hubara-Agam	2b60cc181c	[#10780 ][feat] AutoDeploy: Support per-expert scales in FP8 and NVFP4 MoE (#11322 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-02-09 10:07:37 -05:00
Lizhi Zhou	540fb0f29e	[https://nvbugs/5834212 ][chore] unwaive test_disaggregated_mixed (#11372 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-09 09:16:25 -05:00
Robin Kobus	b3e4ddc953	[None][test] Enhance multi-GPU tests for IFB stats (#11239 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2026-02-09 17:25:32 +08:00
Robin Kobus	31db399042	[https://nvbugs/5829097 ][fix] Disaggregated serving: Only send finished context requests to the KV cache transceiver (#11354 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2026-02-09 17:11:45 +08:00
Bo Li	ab73f6ebc6	[None][chore] Add microbench for MoE Comm methods. (#10317 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-02-09 02:57:01 -05:00
Yihan Wang	635d65f9fe	[None][chore] Move test_trtllm_flashinfer_symbol_collision.py to tests/unittest/_torch (#11168 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-02-09 13:57:57 +08:00
Emma Qiao	ad8f6748a3	[None][infra] Waive failed case for main branch on 02/09 (#11369 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-08 23:05:33 -05:00
TensorRT LLM	fe9192f120	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-09 03:16:42 +00:00
Yanchao Lu	b464c75056	[None][ci] Waive test failures on main 02/08 (#11365 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-02-08 22:50:37 +08:00
TensorRT LLM	f7cf25748b	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-08 03:10:28 +00:00
mpikulski	03b38e9fbf	[TRTLLM-10030][perf] avoid sync in PyTorchModelEngine when using beam search (#11341 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-07 12:31:11 +08:00
William Zhang	ffc0f54959	[https://nvbugs/5848756 ][fix] Re-take ownership of mrope tensors in prefill worker (#11217 ) * Why? Previously, the mrope tensors' IPC handles would just be forwarded from encode -> prefill -> decode workers. While this is fine for the prefill worker, it is not for the decode worker, since by the time it tries to rebuild those tensors, they could have been garbage collected due to their refcounts reaching zero in the producer (encode) worker. This could lead to nasty runtime errors when running E/P/D disaggregated serving. * What? This commit fixes this by having the prefill worker take ownership of those reconstructed tensors, and stand up new copies for the decode worker. Closes: NvBug 5848756 Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-02-06 22:37:42 -05:00
TensorRT LLM	408d610877	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-07 03:13:33 +00:00
Iman Tabrizian	18e611da77	[https://nvbugs/5863392 ][fix] fix partial reuse disabled for disagg (#11247 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-02-06 14:23:51 -05:00
Gal Hubara-Agam	f9eed3ecc2	[None][chore] AutoDeploy update SuperV3 checkpoints and accuracy thresholds (#11107 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-02-06 14:55:18 +02:00
Shi Xiaowei	b1268e1b37	[TRTLLM-9527][feat] Modularization of the transceiver for KV manager v2 (step 4) (#11225 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-02-06 07:15:18 -05:00
Bo Li	66caa67357	[None][doc] Add sparse attention docs to index. (#11342 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-02-06 17:53:41 +08:00
Yueh-Ting (eop) Chen	383c5921c2	[https://nvbugs/5756028 ][fix] Fix VSWA initialization with spec-dec and boundary condition in context input preparation (#10798 ) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2026-02-06 14:28:47 +08:00
Emma Qiao	09807918c7	[None][infra] Waive failed case and delete the redundant waives (#11331 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-06 13:56:51 +08:00
Zongfei Jing	df1c1a23d4	[https://nvbugs/5722629 ] [fix] Remove waive for nvbug 5722629 (#11278 ) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 00:51:30 -05:00
Chenghao Zhang	9644f024bd	[None][feat] AutoDeploy: add triton backend for causal conv (#11124 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2026-02-05 21:33:00 -08:00
Chenghao Zhang	d160439ef9	[#11148 ][feat] AutoDeploy: Better structure the custom op (#11152 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2026-02-05 21:32:22 -08:00
Bo Li	639051e98b	[TRTLLM-10021][docs] Skip Softmax Attention blog and docs. (#10592 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-02-06 12:11:21 +08:00
TensorRT LLM	2e6d9350fa	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-06 03:26:42 +00:00
Yan Chunwei	b98f3fca20	[https://nvbugs/5744432 ][fix] fix bench script test (#10483 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2026-02-06 11:02:24 +08:00
Simeng Liu	86e867297e	[https://nvbugs/5856637 ][ci] Remove the skip for fixed tests. (#11285 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-02-05 21:45:00 -05:00
yifeizhang-c	5521c7b7e7	[TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130 ) Added FP8 cute dsl gemm and batch gemm. Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>	2026-02-06 09:49:30 +08:00
Lucas Liebenwein	712dcd31a9	[https://nvbugs/5859869 ][fix] remove test waive since test is already deprecated (#11288 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-02-05 20:42:43 -05:00
Chuang Zhu	a9d4927235	[TRTLLM-10752][chore] set default val of max_num_tokens_in_buffer as max_seq_len or max_input_len (#11082 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-02-05 14:54:00 -05:00
Harris Nover	a7494a5ff4	[None][chore] Remove outdated comment in model_engine.py (#11240 ) Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>	2026-02-05 13:54:46 -05:00
jthomson04	d778b26062	[None][fix] Reduce host memory usage during model loading (#11119 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2026-02-05 08:57:40 -08:00
nvyocox	e52eb82780	[#11234 ][test] Move test_ad_export_onnx to integration examples (#11260 ) Signed-off-by: yocox <yocox@nvidia.com>	2026-02-05 11:32:57 -05:00
Yuxian Qiu	d3d951d837	[None][fix] Fix amax to avoid NaN issue in fp8_blockscale_gemm_kernel. (#11256 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-02-06 00:28:29 +08:00
mpikulski	7d235cfb23	[TRTLLM-10030][chore] promote SampleState to TypeVar + typing fixes (#11281 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-05 16:33:22 +01:00
chenfeiz0326	eae480b713	[https://nvbugs/5820874 ][fix] Adjust deepgemm tuning buckets to cover larger num_tokens's scope (#11259 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-02-05 23:12:38 +08:00
mpikulski	719e82c429	[TRTLLM-10030][perf] beam search (remove GPU sync + fix batching + refactor) (#11276 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-05 15:33:51 +01:00
Jiayu Chang	e483c7263d	[None][docs] Add CUDA Graph + LoRA in Feature Combination Matrix (#11187 ) Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>	2026-02-05 15:01:59 +01:00
Yuewei Na	0d18b2d7a4	[None][feat] Add priority-based KV cache offload filtering support (#10751 ) Signed-off-by: Yuewei Na <yna@nvidia.com> Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com> Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>	2026-02-05 05:22:56 -05:00
Chang Su	9601b17459	[#11037 ][fix] Fix proto-to-SamplingParams conversion bugs and add gRPC tests (#11292 ) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-02-05 05:00:29 -05:00
Yao Yao	d9b936be94	[None][feat] Enhance support for complex models (#11254 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2026-02-05 17:28:26 +08:00
xxi	4c1d9d0c10	[None][chore] Pass without_comm to cutlass and deepgemm (#11229 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-02-05 02:07:59 -05:00
Yechan Kim	36cb5f8c93	[https://nvbugs/5747920 ][fix] Fix multimodal serve test (#11296 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2026-02-05 15:12:53 +09:00
xinhe-nv	8447a96c29	[None][chore] Add failed cases into waives.txt (#11223 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-02-05 00:27:24 -05:00
dongfengy	ada4a3a28e	[https://nvbugs/5800679 ][fix] Re-enable test after bug fixed (#11249 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-02-04 21:08:27 -08:00
Jin Li	9091a193a8	[https://nvbugs/5837275 ][fix] Unwaive the failing case that cannot be… (#11137 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2026-02-05 12:52:10 +08:00
Yi Zhang	ada463d15d	[None][fix] Fix comments for kv cache manager v2 (#11207 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2026-02-04 23:31:29 -05:00
TensorRT LLM	4adf76d860	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-05 03:42:54 +00:00
dongfengy	0bd4630cd1	[https://nvbugs/5854860 ][fix] Fix cutedsl argmax on sm120 (#11181 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-02-04 17:15:31 -05:00
dongfengy	ad2d1df4a9	[https://nvbugs/5849697 ][fix] Refine QA Test List for SM120 (#11248 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-02-04 11:59:04 -08:00
Simeng Liu	d9fd8cc951	[https://nvbugs/5674665 ][fix] Fix accuracy drop in VSWA with KV cache block reuse (#10875 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-02-04 12:46:31 -05:00

5021 Commits