Chang Su
dbad94715b
[None][feat] Add gRPC server for high-performance external router integration ( #11037 )
...
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-01-30 07:48:27 +08:00
Chenghao Zhang
e033929221
[None][feat] AutoDeploy: Flashinfer kernels bringup ( #10867 )
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-29 14:59:29 -08:00
Harris Nover
ab7dd34bbe
[None][chore] Consolidate duplicate kv cache reuse variables. ( #10935 )
...
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-01-29 11:03:27 -08:00
Stefan Niebler
7d31532850
[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler ( #10459 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2026-01-29 11:06:09 -05:00
Balaram Buddharaju
c7a86f89de
[TRTLLM-10264][feat] Support attention DP + Helix CP ( #10477 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-29 02:57:13 -05:00
Tailing Yuan
91528365a9
[None][feat] Add performance alignment to layer-wise benchmarks ( #11018 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:01:51 +08:00
Enwei Zhu
34a730aaf7
[None][fix] Fix enable_alltoall passed to CutlassFusedMoE ( #11016 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-29 12:11:07 +08:00
Anish Shanbhag
24ac86c485
[ https://nvbugs/5761391 ][fix] Include triton-kernels as a packaged dependency ( #10471 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-28 19:56:32 -08:00
Frida Hou
f03908cf9e
[None][fix] fix Qwen2/3 export for AutoDeploy ( #11007 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-28 16:53:21 -08:00
Ludwig Schneider
4e10bf8950
[None][fix] nccl symmetric with graceful fallbacks ( #11042 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-28 15:43:24 -08:00
Bala Marimuthu
393c3d259e
[ #10245 ][feat] AutoDeploy: Add Minimax M2 support ( #10525 )
...
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-01-28 17:22:32 -05:00
gramnarayan
744a955cbb
[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint ( #10674 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-01-28 12:10:49 -08:00
Evgueni Petrov
f25a2c53bb
[ #10877 ][fix] restore ipv6 support in serve.py ( #10929 )
...
Signed-off-by: Evgueni Petrov <evgueni.s.petrov@gmail.com>
2026-01-27 11:55:59 -08:00
Lucas Liebenwein
ff3a494f5c
[ #10013 ][feat] AutoDeploy: native cache manager integration ( #10635 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-27 11:23:22 -05:00
Yukun He
b575184fca
[TRTLLM-10308][feat] AutoTuner Cache: reorganize cache file for distributed tuning ( #10956 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-27 16:39:40 +08:00
Chuang Zhu
d6f76d2fae
[TRTLLM-9527][feat] change context params and disagg params (step3) ( #10495 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-27 16:34:17 +08:00
ZhichenJiang
fae4985797
[TRTLLM-9831][perf] Use TMA.RED to improve effective memory bandwidth ( #10987 )
...
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
2026-01-27 16:15:32 +08:00
Bo Li
6b251cc7fa
[TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. ( #11002 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-27 15:55:07 +08:00
Lizhi Zhou
93ae8a14ab
[ #10889 ][fix] fix pydantic deepcopy bug ( #11004 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-27 02:40:13 -05:00
Yiqing Yan
ea5d811aec
[None][chore] Bump version to 1.3.0rc2 ( #11021 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-27 15:26:03 +08:00
Tailing Yuan
5553391c5e
[TRTLLM-10560][fix] Fix the time of pause() for overlap scheduler ( #10943 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-27 13:18:34 +08:00
Wanli Jiang
4a206351bb
[TRTLLM-10453][feat] Update mamba decode kernel to flashinfer ( #10757 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-27 13:04:40 +08:00
ameynaik-hub
df8be0c50c
[TRTLLM-10276][feat] Integrate cutedsl argmax kernel ( #10476 )
...
Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
Co-authored-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2026-01-26 22:08:47 -05:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super ( #10754 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
Lucas Liebenwein
00f341be49
[ #8982 ][feat] AutoDeploy attention dp support ( #10728 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-26 09:43:33 -05:00
Pengyun Lin
ce37e27066
[ #10614 ][fix] gpt_oss first iteration streaming in trtllm-serve ( #10808 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-26 20:53:11 +08:00
Bo Li
e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. ( #10885 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
Tian Zheng
5efee01da1
[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV ( #10813 )
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-26 16:46:33 +08:00
Enwei Zhu
72ef732bcf
[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark ( #10279 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-25 21:02:30 +08:00
Yanchao Lu
ae58a7ed20
[None][chore] Revert NVIDIA/TensorRT-LLM#10819 ( #10870 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yanchao Lu
18f63dfcec
[None][chore] Reduce tedious logs ( #10819 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
mpikulski
0f7ec033f7
[ https://nvbugs/5791242 ][fix] workaround for flashinfer.sampling.sampling_from_logits ( #10713 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yukun He
25bdc30162
[ https://nvbugs/5782112 ][fix] Cherry-pick #10633 : Fix hanging issue for MNNVL Allreduce under PP ( #10750 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yuxian Qiu
2b3bb2e9b0
[ https://nvbugs/5811697 ][fix] Fix buffer reuse. ( #10716 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Mike Iovine
f02948d956
[ https://nvbugs/5803813 ][fix] Fix llama 4 min latency ( #10724 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 ( #10736 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
KVCacheManagerV2 is a new Python-based implementation of the KV cache manager, featuring a cleaner API, better abstraction, and better code quality, without the accumulated legacy.
2026-01-24 04:48:39 -05:00
Yuxian Qiu
9fcc93ea7b
[ https://nvbugs/5829097 ][fix] Re-init TRTLLM sampler to use sample stream in multi-stream cases. ( #10918 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-24 14:04:10 +08:00
Kaiyu Xie
da967d0bd7
[TRTLLM-10334] [feat] Support overlap scheduler for disagg ctx instances ( #10755 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-23 22:29:37 -05:00
jthomson04
cf88da7eca
[None][feat] KV Connector Support for MTP ( #10932 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2026-01-23 18:58:26 -05:00
Taylor Yeonbok Lee
1fbbb1f3cd
[None][feat] AutoDeploy: Enhance memory consumption for MoE fusion transform ( #10772 )
...
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-01-23 15:22:54 -08:00
Yan Chunwei
54768f3f2c
[None][chore] refine placement group in ray executor ( #10235 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2026-01-23 19:31:20 +08:00
Leslie Fang
31d04dfa12
[TRTLLM-9108][feat] Add test configurable moe module multi gpu ( #10699 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-23 10:16:58 +08:00
William Zhang
2146c23786
[ #9306 ][refactor] Refactor AutoDeployConfig into LlmArgs ( #10613 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-22 16:02:49 -05:00
Grzegorz Kwasniewski
d8e6e22060
[ https://nvbugs/5819002 ][fix] fix sharding tests ( #10775 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-22 20:02:48 +01:00
Yi Zhang
d43be7b65e
[None][fix] Avoid Double update for previous batch ( #9888 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2026-01-22 13:15:06 -05:00
Shi Xiaowei
944c304bbb
[TRTLLM-9527][feat] Python transceiver components (step 2) ( #10494 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-01-22 10:14:50 -08:00
Venky
b3146d095d
[TRTC-122][feat] Eagle3 Specdec UX improvements ( #10124 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-22 07:24:11 -08:00
Yan Chunwei
30ffa58b54
[ https://nvbugs/5783876 ][fix] fix hmac launch ( #10434 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-22 23:20:53 +08:00
Pengyun Lin
5e34112b27
[TRTLLM-10388][feat] Support logprobs for Completions API ( #10809 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-22 21:25:24 +08:00
彭晋韬(jtao peng)
9beb971827
[None][fix] Update RMSNorm custom op plumbing ( #10843 )
...
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-22 21:03:22 +08:00