TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-16 15:55:08 +08:00

Author	SHA1	Message	Date
Bo Deng	be88fe33be	[None][fix] fix tinygemm accuracy (#11411) Signed-off-by: Bo Deng <deemod@nvidia.com>	2026-02-10 05:09:30 -05:00
Jonas Li	8b2dc57823	[None][chore] Mass merge commits from release/1.2.0rc6.post1 branch (#11384) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2026-02-10 14:00:42 +08:00
Yuxian Qiu	af68c29d3d	[None][chore] Reduce attention module repeated warnings. (#11335) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-02-10 08:58:21 +08:00
Iman Tabrizian	18e611da77	[https://nvbugs/5863392][fix] fix partial reuse disabled for disagg (#11247) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-02-06 14:23:51 -05:00
yifeizhang-c	5521c7b7e7	[TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) Added FP8 cute dsl gemm and batch gemm. Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>	2026-02-06 09:49:30 +08:00
Yuxian Qiu	d3d951d837	[None][fix] Fix amax to avoid NaN issue in fp8_blockscale_gemm_kernel. (#11256) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-02-06 00:28:29 +08:00
Yuewei Na	0d18b2d7a4	[None][feat] Add priority-based KV cache offload filtering support (#10751) Signed-off-by: Yuewei Na <yna@nvidia.com> Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com> Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>	2026-02-05 05:22:56 -05:00
Simeng Liu	d9fd8cc951	[https://nvbugs/5674665][fix] Fix accuracy drop in VSWA with KV cache block reuse (#10875) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-02-04 12:46:31 -05:00
Yukun He	de465efc5f	[https://nvbugs/5814309][fix] Use NCCL as fallback to avoid crash due to insufficient memory (#10928) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Zhenhuan Chen	6c2ecad2fe	[https://nvbugs/5769425][fix] add syncthreads for tinygemm to resolve intermittent accuracy problem (#10873) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
HuiGao-NV	8fd22ac72d	[https://nvbugs/5740377][fix] Prevent out-of-bounds read (#10868) Signed-off-by: Hui Gao <huig@nvidia.com> Co-authored-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yi Zhang	0306c0f12c	[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-02-02 14:29:02 +08:00
Kaiyu Xie	9909dca6fa	[None] [feat] Add PDL support for moeAlltoAllKernels (#10591) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com> Co-authored-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2026-02-02 13:23:37 +08:00
Guoming Zhang	6bace84167	[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-31 13:48:25 +08:00
Yibin Li	322471cdd7	[https://nvbugs/5825514][fix] Add null pointer check to parseNpyHeader (#10944) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> This PR addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information, visit https://www.nvidia.com/en-us/security/; for acknowledgement, please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com	2026-01-30 03:01:33 -05:00
Jin Li	ef268e2062	[TRTLLM-9904][feat] Changes for future KVCacheV2 MTP support (#11029) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2026-01-30 01:49:17 -05:00
Balaram Buddharaju	c7a86f89de	[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-29 02:57:13 -05:00
Yi Sun	f6dab8388d	[https://nvbugs/5813452][fix] Fix "Assertion failed: isLeaf() in kvCacheManager.cpp:465" (#10922) Signed-off-by: Yi Sun <yisun0618@gmail.com>	2026-01-29 14:38:11 +08:00
Ludwig Schneider	4e10bf8950	[None][fix] nccl symmetric with graceful fallbacks (#11042) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2026-01-28 15:43:24 -08:00
Linda	29647d9446	[None][chore] Removing cpp/tensorrt_llm/pybind (#11026) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-01-28 11:25:11 +01:00
Yuan Tong	30348b2753	[None][fix] Proper conditional compilation of sm10x cubins (#10839) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2026-01-28 10:17:51 +08:00
NVShreyas	6c1862fb33	[TRTLLM-10197][chore] Refactor to setup for RNN cache transceiver (#10957) Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>	2026-01-27 12:23:02 -08:00
Chuang Zhu	d6f76d2fae	[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-01-27 16:34:17 +08:00
sunnyqgg	ff0dd6076e	[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754) Signed-off-by: qgai <qgai@nvidia.com>	2026-01-26 11:23:26 -05:00
Linda	ce556290c9	[None][chore] Removing pybind11 bindings and references (#10550) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-01-26 08:19:12 -05:00
Bo Li	e405468230	[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-26 17:59:03 +08:00
Tian Zheng	5efee01da1	[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-01-26 16:46:33 +08:00
Patrice Castonguay	d548b29a41	[None][fix] Bugfix/mtp with async scheduler (#10941) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: rongwei <scutizhang@tencent.com>	2026-01-24 07:19:54 -05:00
Yao Yao	6f07fa81d7	[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> KVCacheManagerV2 is a new Python-based implementation of the KV cache manager, featuring a cleaner API, better abstraction, and better code quality without the accumulated legacy.	2026-01-24 04:48:39 -05:00
Leslie Fang	31d04dfa12	[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-23 10:16:58 +08:00
Yi Zhang	d43be7b65e	[None][fix] Avoid Double update for previous batch (#9888) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2026-01-22 13:15:06 -05:00
Shi Xiaowei	944c304bbb	[TRTLLM-9527][feat] Python transceiver components (step 2) (#10494) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-01-22 10:14:50 -08:00
Jiayu Chang	1dc49b266e	[https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279) Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>	2026-01-22 14:01:18 +01:00
Lizhi Zhou	f3a41c8d94	[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-21 22:52:34 -05:00
Yukun He	bf7303c7f1	[https://nvbugs/5636916][fix] Cherry-pick #10654: Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-21 17:25:40 +08:00
HuiGao-NV	1592dfab6d	[https://nvbugs/5740377][fix] Lock resource to fix potential access to released data (#10827) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-21 14:17:29 +08:00
Daniel Stokes	2f3b2a3172	[None][fix] Add a timeout in MNNVL throughput to prevent hangs if one rank crashes (#9532) Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-21 10:14:39 +08:00
Zheng Duan	26c23cf99f	[https://nvbugs/5760737][test] only skip mooncake+indexerkcache test (#10266) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2026-01-21 09:48:39 +08:00
jthomson04	2db3d7eeba	[None][chore] Async Transfer Manager (#9891) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2026-01-20 12:12:47 -05:00
Yi Zhang	58311b2345	[None][fix] Remove unused params in attn (#10652) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-01-20 03:08:59 -05:00
benzh-2025	4c8468c5d3	[None][fix] default disable gemm+allreduce fusion (#10656)	2026-01-20 12:31:17 +08:00
Bo Li	f3a985ce27	[TRTLLM-10296][fix] Fix the potential misaligned access due to vectorized ld/st instructions in NVLinkOneSided A2A. (#10539) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-20 11:08:04 +08:00
Liao Lanyu	dbb858ae0c	[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Co-authored-by: junq <22017000+QiJune@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-01-20 10:31:13 +08:00
Tian Zheng	cfebfbb505	[https://nvbugs/5783509][fix] Fix a hang issue when enabling skip softmax on Blackwell (#10490) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-01-16 18:59:54 +08:00
Yukun He	f001c4946d	[https://nvbugs/5782112][fix] Fix hanging issue for MNNVL Allreduce under PP (#10633) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-16 13:03:36 +08:00
Enwei Zhu	7b8b9ccbaf	[https://nvbugs/5669671][fix] Support GuidedDecoder with sharded logits (#10698) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-16 11:04:26 +08:00
Thor Johnsen	0998a7bf20	[https://nvbugs/5721661][fix] Prevent out-of-bounds read (#9879) Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>	2026-01-15 10:51:40 -06:00
Lizhi Zhou	93db0d5e18	[TRTLLM-9942][feat] new request states and kvcache transceiver APIs in generation-first disagg (#10406) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-15 19:18:21 +08:00
Pengbo Wang	683515b1bd	[None][feat] Use XQA JIT impl by default and mitigate perf loss with sliding window (#10335) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2026-01-15 15:47:00 +08:00
Perkz Zheng	71ccc07d2b	[None][feat] update trtllm-gen to support groupsTokensHeadsQ (#10261) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-15 02:24:25 -05:00

861 Commits