Liao Lanyu | f2dd0ee128 | 2026-01-30 16:06:48 +08:00
[None][chore] Correct sorting order for attention DP scheduling to prioritize non-relaxed requests (#11106)
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>

Jin Li | ef268e2062 | 2026-01-30 01:49:17 -05:00
[TRTLLM-9904][feat] Changes for future KVCacheV2 MTP support (#11029)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

Harris Nover | ab7dd34bbe | 2026-01-29 11:03:27 -08:00
[None][chore] Consolidate duplicate kv cache reuse variables. (#10935)
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>

Stefan Niebler | 7d31532850 | 2026-01-29 11:06:09 -05:00
[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>

Balaram Buddharaju | c7a86f89de | 2026-01-29 02:57:13 -05:00
[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

Tailing Yuan | 91528365a9 | 2026-01-29 14:01:51 +08:00
[None][feat] Add performance alignment to layer-wise benchmarks (#11018)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>

Lucas Liebenwein | ff3a494f5c | 2026-01-27 11:23:22 -05:00
[#10013][feat] AutoDeploy: native cache manager integration (#10635)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

Chuang Zhu | d6f76d2fae | 2026-01-27 16:34:17 +08:00
[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

Tailing Yuan | 5553391c5e | 2026-01-27 13:18:34 +08:00
[TRTLLM-10560][fix] Fix the time of pause() for overlap scheduler (#10943)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>

sunnyqgg | ff0dd6076e | 2026-01-26 11:23:26 -05:00
[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754)
Signed-off-by: qgai <qgai@nvidia.com>

mpikulski | 0f7ec033f7 | 2026-01-25 18:12:21 +08:00
[https://nvbugs/5791242][fix] workaround for flashinfer.sampling.sampling_from_logits (#10713)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

Yuxian Qiu | 9fcc93ea7b | 2026-01-24 14:04:10 +08:00
[https://nvbugs/5829097][fix] Re-init TRTLLM sampler to use sample stream in multi-stream cases. (#10918)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

Kaiyu Xie | da967d0bd7 | 2026-01-23 22:29:37 -05:00
[TRTLLM-10334][feat] Support overlap scheduler for disagg ctx instances (#10755)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

jthomson04 | cf88da7eca | 2026-01-23 18:58:26 -05:00
[None][feat] KV Connector Support for MTP (#10932)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

Yi Zhang | d43be7b65e | 2026-01-22 13:15:06 -05:00
[None][fix] Avoid double update for previous batch (#9888)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

Jiayu Chang | 1dc49b266e | 2026-01-22 14:01:18 +01:00
[https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279)
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>

Pengbo Wang | 9462d90ec7 | 2026-01-22 15:14:17 +08:00
[None][feat] Add KV cache cleanup (#7439)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

Taylor Yeonbok Lee | 895bb94b3d | 2026-01-21 20:51:38 -08:00
[#8241][feat] Support model_kwargs for pytorch backend (#10351)
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>

Yechan Kim | 70caa779a4 | 2026-01-22 13:43:00 +09:00
[None][feat] K-EXAONE MTP support (#10796)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

Lizhi Zhou | f3a41c8d94 | 2026-01-21 22:52:34 -05:00
[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

dongxuy04 | 635cbf01ba | 2026-01-21 16:42:52 -08:00
[https://nvbugs/5816267][fix] Remove weight tensor holder to release memory earlier (#10876)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

Simeng Liu | 3c8ed19440 | 2026-01-20 10:56:56 -08:00
[https://nvbugs/5670108][fix] Fix overlap scheduler race condition in… (#10610)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>

jthomson04 | 2db3d7eeba | 2026-01-20 12:12:47 -05:00
[None][chore] Async Transfer Manager (#9891)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

Yi Zhang | 58311b2345 | 2026-01-20 03:08:59 -05:00
[None][fix] Remove unused params in attn (#10652)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>

Liao Lanyu | dbb858ae0c | 2026-01-20 10:31:13 +08:00
[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>

Stefan Niebler | 0cfd08745c | 2026-01-16 10:52:41 -08:00
[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>

Wanli Jiang | 722978b837 | 2026-01-16 16:07:36 +08:00
[TRTLLM-10305][feat] Support customized seq len larger than model config (#10600)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Enwei Zhu | 7b8b9ccbaf | 2026-01-16 11:04:26 +08:00
[https://nvbugs/5669671][fix] Support GuidedDecoder with sharded logits (#10698)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

Lizhi Zhou | 93db0d5e18 | 2026-01-15 19:18:21 +08:00
[TRTLLM-9942][feat] New request states and kvcache transceiver APIs in generation-first disagg (#10406)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

Lizhi Zhou | ff277b591e | 2026-01-15 16:33:55 +08:00
[https://nvbugs/5791830][fix] Fix pp loop hang caused by i-sending new requests (#10665)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

Anish Shanbhag | faa80e73fd | 2026-01-14 21:06:07 -08:00
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>

HuiGao-NV | b10704428d | 2026-01-14 07:53:01 -05:00
[https://nvbugs/5787566][fix] Only keep a limited number of performance statistic data (#10569)
Signed-off-by: Hui Gao <huig@nvidia.com>

Yuxian Qiu | 39cefd6125 | 2026-01-14 14:05:47 +08:00
[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

Guoming Zhang | bdaee87895 | 2026-01-13 17:13:55 +08:00
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Yuxian Qiu | 04b112651b | 2026-01-13 02:34:32 -05:00
[None][feat] Hang detection for executor loop and worker. (#10480)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

Iman Tabrizian | 48b09e5a25 | 2026-01-12 18:23:26 -05:00
[https://nvbugs/5689235][fix] Fix cancellation + chunked prefill + disagg (#10111)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

Yuxian Qiu | 80f261ea36 | 2026-01-09 18:15:18 +08:00
[https://nvbugs/5622938][feat] Run sample_async on extra stream. (#10215)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

JadoTu | 4c498bfe58 | 2026-01-09 14:50:16 +08:00
[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873)
Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>

Mike Iovine | db2614ef10 | 2026-01-05 17:20:14 +01:00
[https://nvbugs/5772414][fix] Fix draft token tree depth=1 corner case (#10385)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

Mike Iovine | bedfff4f00 | 2026-01-05 17:18:44 +01:00
[https://nvbugs/5772521][fix] Fix draft token tree chain crash (#10386)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

HuiGao-NV | 2f768b76f8 | 2026-01-05 15:30:18 +08:00
[https://nvbugs/5715568][fix] Force release torch memory when LLM is destroyed (#10314)
Signed-off-by: Hui Gao <huig@nvidia.com>

Faraz | 8e2065b4d9 | 2026-01-05 08:42:53 +08:00
[https://nvbugs/5670469][fix] Filter 0s and choose min of kv_head for Nemotron model (#10206)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

Jaedeok Kim | a4dcc6a711 | 2026-01-04 06:07:30 -05:00
[TRTLLM-10171][fix] Correct attention handling in ModelConfig and KVCacheManager (#10330)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>

Izzy Putterman | bdf6953ddc | 2026-01-02 13:45:07 -05:00
[None][feat] Eagle: MLA-based Eagle (#9677)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>

Balaram Buddharaju | 4a1b742aa0 | 2026-01-01 13:42:53 -05:00
[TRTLLM-9467][fix] Fix PP+CP combination with helix parallelism (#10312)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

Simeng Liu | 84d107b2f0 | 2025-12-31 09:22:54 -08:00
[https://nvbugs/5717993][fix] Add execution_stream across PyExecutor, KVCacheManager, and PeftCacheManager to ensure proper CUDA stream synchronization between KV cache transfer operations and model forward kernels. (#10060)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>

Jin Li | 34c2fd50a9 | 2025-12-31 10:41:39 +08:00
[https://nvbugs/5707359][fix] Unwaive OOM case that should be fixed by #9446 (#10334)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

Yuxian Qiu | 1f3afb8e6f | 2025-12-31 10:40:52 +08:00
[None][feat] Implement send_object for TorchDist. (#10213)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

Ziyi Xiong | c59aa8bec5 | 2025-12-28 12:52:04 +08:00
[TRTLLM-9962][feat] Some optimizations for two-model spec dec (#10208)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

Olya Kozlova | 55f3cda66d | 2025-12-26 22:20:24 +01:00
[None][fix] Fix request_id for best_of/n case (#8368)
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>