Yi Zhang | 0306c0f12c | 2026-02-02 14:29:02 +08:00
  [TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659)
  Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>

Liao Lanyu | fef0e4b17d | 2026-02-02 10:36:08 +08:00
  [TRTLLM-10666][chore] Refactor request fetching logic for better separation of concerns (#10988)
  Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
  Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
  Signed-off-by: Liao Lanyu <108499334+lancelly@users.noreply.github.com>
  Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>

shuyixiong | 278ced972b | 2026-01-31 10:48:51 -05:00
  [TRTLLM-9771][feat] Allow overriding quantization configs (#11062)
  Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>

Frida Hou | 7910d4d2a9 | 2026-01-30 23:07:24 -08:00
  [#8242][feat] Add int4 GPTQ support for AutoDeploy (#8248)
  Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

Guoming Zhang | 6bace84167 | 2026-01-31 13:48:25 +08:00
  [TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791)
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Balaram Buddharaju | 531f85dc9b | 2026-01-30 23:46:35 -05:00
  [None][feat] Perfect routing for Deepseek models (#11127)
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

Karthik | 5a97374f3c | 2026-01-30 16:05:53 -05:00
  [#9525][feat] add L2 norm pattern matcher and fusion transform (#10767)
  Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>

nvyocox | 4af47208d8 | 2026-01-30 15:43:11 -05:00
  [None][feat] Export ONNX for DriveOS LLM (#10117)
  Signed-off-by: yocox <yocox@nvidia.com>

Liao Lanyu | f2dd0ee128 | 2026-01-30 16:06:48 +08:00
  [None][chore] Correct sorting order for attention DP scheduling to prioritize non-relaxed requests (#11106)
  Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>

dongfengy | 4f0c1b2489 | 2026-01-29 23:59:19 -08:00
  [TRTLLM-10733][feat] Make TRTLLM MOE the default one for GPTOSS on Blackwell (#11074)
  Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

Jin Li | ef268e2062 | 2026-01-30 01:49:17 -05:00
  [TRTLLM-9904][feat] Changes for future KVCacheV2 MTP support (#11029)
  Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

Necofish | 144b61715f | 2026-01-30 09:59:36 +09:00
  [None][fix] Add missing absolute pe in Qwen3-VL Vision Encoder (#11065)
  Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn>

Chenghao Zhang | e033929221 | 2026-01-29 14:59:29 -08:00
  [None][feat] AutoDeploy: Flashinfer kernels bringup (#10867)
  Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>

Harris Nover | ab7dd34bbe | 2026-01-29 11:03:27 -08:00
  [None][chore] Consolidate duplicate kv cache reuse variables. (#10935)
  Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>

Stefan Niebler | 7d31532850 | 2026-01-29 11:06:09 -05:00
  [TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459)
  Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>

Balaram Buddharaju | c7a86f89de | 2026-01-29 02:57:13 -05:00
  [TRTLLM-10264][feat] Support attention DP + Helix CP (#10477)
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

Tailing Yuan | 91528365a9 | 2026-01-29 14:01:51 +08:00
  [None][feat] Add performance alignment to layer-wise benchmarks (#11018)
  Signed-off-by: Tailing Yuan <yuantailing@gmail.com>

Enwei Zhu | 34a730aaf7 | 2026-01-29 12:11:07 +08:00
  [None][fix] Fix enable_alltoall passed to CutlassFusedMoE (#11016)
  Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

Anish Shanbhag | 24ac86c485 | 2026-01-28 19:56:32 -08:00
  [https://nvbugs/5761391][fix] Include triton-kernels as a packaged dependency (#10471)
  Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>

Frida Hou | f03908cf9e | 2026-01-28 16:53:21 -08:00
  [None][fix] fix Qwen2/3 export for AutoDeploy (#11007)
  Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

Ludwig Schneider | 4e10bf8950 | 2026-01-28 15:43:24 -08:00
  [None][fix] nccl symmetric with graceful fallbacks (#11042)
  Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>

Bala Marimuthu | 393c3d259e | 2026-01-28 17:22:32 -05:00
  [#10245][feat] AutoDeploy: Add Minimax M2 support (#10525)
  Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>

gramnarayan | 744a955cbb | 2026-01-28 12:10:49 -08:00
  [None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint (#10674)
  Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>

Lucas Liebenwein | ff3a494f5c | 2026-01-27 11:23:22 -05:00
  [#10013][feat] AutoDeploy: native cache manager integration (#10635)
  Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

Yukun He | b575184fca | 2026-01-27 16:39:40 +08:00
  [TRTLLM-10308][feat] AutoTuner Cache: reorganize cache file for distributed tuning (#10956)
  Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

Chuang Zhu | d6f76d2fae | 2026-01-27 16:34:17 +08:00
  [TRTLLM-9527][feat] change context params and disagg params (step3) (#10495)
  Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

ZhichenJiang | fae4985797 | 2026-01-27 16:15:32 +08:00
  [TRTLLM-9831][perf] Use TMA.RED to improve effective memory bandwidth (#10987)
  Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>

Bo Li | 6b251cc7fa | 2026-01-27 15:55:07 +08:00
  [TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. (#11002)
  Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

Tailing Yuan | 5553391c5e | 2026-01-27 13:18:34 +08:00
  [TRTLLM-10560][fix] Fix the time of pause() for overlap scheduler (#10943)
  Signed-off-by: Tailing Yuan <yuantailing@gmail.com>

Wanli Jiang | 4a206351bb | 2026-01-27 13:04:40 +08:00
  [TRTLLM-10453][feat] Update mamba decode kernel to flashinfer (#10757)
  Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

ameynaik-hub | df8be0c50c | 2026-01-26 22:08:47 -05:00
  [TRTLLM-10276][feat] Integrate cutedsl argmax kernel (#10476)
  Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
  Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
  Co-authored-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>

sunnyqgg | ff0dd6076e | 2026-01-26 11:23:26 -05:00
  [TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754)
  Signed-off-by: qgai <qgai@nvidia.com>

Lucas Liebenwein | 00f341be49 | 2026-01-26 09:43:33 -05:00
  [#8982][feat] AutoDeploy attention dp support (#10728)
  Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

Bo Li | e405468230 | 2026-01-26 17:59:03 +08:00
  [TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885)
  Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

Tian Zheng | 5efee01da1 | 2026-01-26 16:46:33 +08:00
  [None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813)
  Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>

Enwei Zhu | 72ef732bcf | 2026-01-25 21:02:30 +08:00
  [TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279)
  Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

Yanchao Lu | ae58a7ed20 | 2026-01-25 18:12:21 +08:00
  [None][chore] Revert NVIDIA/TensorRT-LLM#10819 (#10870)
  Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
  Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

Yanchao Lu | 18f63dfcec | 2026-01-25 18:12:21 +08:00
  [None][chore] Reduce tedious logs (#10819)
  Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
  Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

mpikulski | 0f7ec033f7 | 2026-01-25 18:12:21 +08:00
  [https://nvbugs/5791242][fix] workaround for flashinfer.sampling.sampling_from_logits (#10713)
  Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
  Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

Yukun He | 25bdc30162 | 2026-01-25 18:12:21 +08:00
  [https://nvbugs/5782112][fix] Cherry-pick #10633: Fix hanging issue for MNNVL Allreduce under PP (#10750)
  Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
  Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

Yuxian Qiu | 2b3bb2e9b0 | 2026-01-25 18:12:21 +08:00
  [https://nvbugs/5811697][fix] Fix buffer reuse. (#10716)
  Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
  Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

Mike Iovine | f02948d956 | 2026-01-25 18:12:21 +08:00
  [https://nvbugs/5803813][fix] Fix llama 4 min latency (#10724)
  Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
  Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

Yuxian Qiu | 9fcc93ea7b | 2026-01-24 14:04:10 +08:00
  [https://nvbugs/5829097][fix] Re-init TRTLLM sampler to use sample stream in multi-stream cases. (#10918)
  Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

Kaiyu Xie | da967d0bd7 | 2026-01-23 22:29:37 -05:00
  [TRTLLM-10334] [feat] Support overlap scheduler for disagg ctx instances (#10755)
  Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

jthomson04 | cf88da7eca | 2026-01-23 18:58:26 -05:00
  [None][feat] KV Connector Support for MTP (#10932)
  Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
  Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

Taylor Yeonbok Lee | 1fbbb1f3cd | 2026-01-23 15:22:54 -08:00
  [None][feat] AutoDeploy: Enhance memory consumption for MoE fusion transform (#10772)
  Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>

Leslie Fang | 31d04dfa12 | 2026-01-23 10:16:58 +08:00
  [TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699)
  Signed-off-by: leslie-fang25 <leslief@nvidia.com>

William Zhang | 2146c23786 | 2026-01-22 16:02:49 -05:00
  [#9306][refactor] Refactor AutoDeployConfig into LlmArgs (#10613)
  Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

Grzegorz Kwasniewski | d8e6e22060 | 2026-01-22 20:02:48 +01:00
  [https://nvbugs/5819002][fix] fix sharding tests (#10775)
  Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>

Yi Zhang | d43be7b65e | 2026-01-22 13:15:06 -05:00
  [None][fix] Avoid Double update for previous batch (#9888)
  Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>