Commit Graph

1172 Commits

Author SHA1 Message Date
Enwei Zhu
5ff244ce54
[https://nvbugs/5837281][fix] Fix trtllm-serve guided decoding test (#11101)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-30 16:59:55 +08:00
Chang Su
dbad94715b
[None][feat] Add gRPC server for high-performance external router integration (#11037)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-01-30 07:48:27 +08:00
Chenghao Zhang
e033929221
[None][feat] AutoDeploy: Flashinfer kernels bringup (#10867)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-29 14:59:29 -08:00
Lucas Liebenwein
a4880ffdbb
[None][fix] AutoDeploy: remove mem check for a log unit test (#11120)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-29 15:41:51 -05:00
Stefan Niebler
7d31532850
[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2026-01-29 11:06:09 -05:00
WeiHaocheng
80dd6e70c6
[TRTLLM-10415][feat] Dump thread stacks for hanging tests before time… (#10708)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2026-01-29 20:43:34 +08:00
Tailing Yuan
91528365a9
[None][feat] Add performance alignment to layer-wise benchmarks (#11018)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:01:51 +08:00
Anish Shanbhag
24ac86c485
[https://nvbugs/5761391][fix] Include triton-kernels as a packaged dependency (#10471)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-28 19:56:32 -08:00
Bala Marimuthu
393c3d259e
[#10245][feat] AutoDeploy: Add Minimax M2 support (#10525)
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-01-28 17:22:32 -05:00
gramnarayan
744a955cbb
[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint (#10674)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-01-28 12:10:49 -08:00
Grzegorz Kwasniewski
38bcee189c
[TRTLLM-10362][feat] Added Mamba and MLA layers to the sharding tests (#10364)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-28 10:34:10 +01:00
Lucas Liebenwein
ff3a494f5c
[#10013][feat] AutoDeploy: native cache manager integration (#10635)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-27 11:23:22 -05:00
Yukun He
b575184fca
[TRTLLM-10308][feat] AutoTuner Cache: reorganize cache file for distributed tuning (#10956)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-27 16:39:40 +08:00
Chuang Zhu
d6f76d2fae
[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-27 16:34:17 +08:00
Bo Li
6b251cc7fa
[TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. (#11002)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-27 15:55:07 +08:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754)
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
Lucas Liebenwein
00f341be49
[#8982][feat] AutoDeploy attention dp support (#10728)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-26 09:43:33 -05:00
Pengyun Lin
ce37e27066
[#10614][fix] gpt_oss first iteration streaming in trtllm-serve (#10808)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-26 20:53:11 +08:00
Bo Li
e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
Enwei Zhu
ffab217974
[None][fix] Fix CuteDSL MoE unittest (#10983)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-26 08:34:17 +08:00
Enwei Zhu
72ef732bcf
[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-25 21:02:30 +08:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

KVCacheManagerV2 is a new Python-based implementation of the KV cache manager, featuring a cleaner API, better abstraction, and better code quality without the accumulated legacy.
2026-01-24 04:48:39 -05:00
Leslie Fang
31d04dfa12
[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-23 10:16:58 +08:00
William Zhang
2146c23786
[#9306][refactor] Refactor AutoDeployConfig into LlmArgs (#10613)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-22 16:02:49 -05:00
Grzegorz Kwasniewski
d8e6e22060
[https://nvbugs/5819002][fix] fix sharding tests (#10775)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-22 20:02:48 +01:00
Shi Xiaowei
944c304bbb
[TRTLLM-9527][feat] Python transceiver components (step 2) (#10494)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-01-22 10:14:50 -08:00
Venky
b3146d095d
[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-22 07:24:11 -08:00
Yan Chunwei
30ffa58b54
[https://nvbugs/5783876][fix] fix hmac launch (#10434)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-22 23:20:53 +08:00
Pengyun Lin
5e34112b27
[TRTLLM-10388][feat] Support logprobs for Completions API (#10809)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-22 21:25:24 +08:00
Jiayu Chang
1dc49b266e
[https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279)
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-01-22 14:01:18 +01:00
tcherckez-nvidia
128d4ac5be
[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… (#10803)
Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster>
Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster>
Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster>
Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster>
Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>
2026-01-22 13:08:05 +02:00
shuyixiong
fd2af8d58a
[TRTLLM-9771][feat] Support partial update weight for fp8 (#10456)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
2026-01-22 14:46:05 +08:00
Enwei Zhu
be4a431ffd
[TRTLLM-10154][feat] Enable guided decoding with reasoning parsers (#10890)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-22 14:14:28 +08:00
Taylor Yeonbok Lee
895bb94b3d
[#8241][feat] Support model_kwargs for pytorch backend (#10351)
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-01-21 20:51:38 -08:00
Lizhi Zhou
f3a41c8d94
[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-21 22:52:34 -05:00
Yukun He
bf7303c7f1
[https://nvbugs/5636916][fix] Cherry-pick #10654: Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-21 17:25:40 +08:00
Yibin Li
9116dfbacd
[https://nvbugs/5775021] [fix] Replace pickle.load with restricted Unpickler (#10622)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2026-01-21 11:42:54 +08:00
jthomson04
2db3d7eeba
[None][chore] Async Transfer Manager (#9891)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-01-20 12:12:47 -05:00
Yi Zhang
58311b2345
[None][fix] Remove unused params in attn (#10652)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-01-20 03:08:59 -05:00
benzh-2025
4c8468c5d3
[None][fix] default disable gemm+allreduce fusion (#10656)
2026-01-20 12:31:17 +08:00
Liao Lanyu
dbb858ae0c
[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-01-20 10:31:13 +08:00
Zhanrui Sun
df845a028b
[TRTLLM-9581][infra] Use /home/scratch.trt_llm_data_ci in computelab (#10616)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2026-01-19 00:40:40 -05:00
Lucas Liebenwein
9879400479
[#10642][feat] AutoDeploy: optimized canonicalize_graph utilities [1/2] (#10675)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-18 13:42:30 -05:00
Eran Geva
4d2916d683
[#10688][fix] AutoDeploy Fix CUDA graph batch sizes exceeding max_batch_size (#10687)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-18 13:31:01 -05:00
Eran Geva
a11f0dbd61
[#10696][fix] AutoDeploy prevent torch.export from specializing batch dimension when max_batch_size=1 (#10697)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-18 10:42:49 +02:00
Grzegorz Kwasniewski
7bf4dd9f63
[TRTLLM-10318][feat] Fixing Nemotron sharding: support for sharding buffers (#10319)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Lucas <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas <11156568+lucaslie@users.noreply.github.com>
2026-01-17 04:02:06 -05:00
Yukun He
3d16daf696
[None][fix] Fix tmp dir being deleted too early in unit test. (#10740)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-17 13:49:10 +08:00
Frida Hou
069ad68d3c
[None][fix] AutoDeploy: skip mxfp4_moe test unless on Hopper (#10729)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-16 16:24:37 -05:00
Chenghao Zhang
b6acd96616
[None][fix] AutoDeploy: Fix the nvfp4 fused_moe (#10727)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-16 12:04:40 -08:00
Stefan Niebler
0cfd08745c
[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2026-01-16 10:52:41 -08:00