shuyixiong
278ced972b
[TRTLLM-9771][feat] Allow overriding quantization configs ( #11062 )
...
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
2026-01-31 10:48:51 -05:00
dongfengy
4f0c1b2489
[TRTLLM-10733][feat] Make TRTLLM MOE the default one for GPTOSS on Blackwell ( #11074 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-29 23:59:19 -08:00
Tailing Yuan
91528365a9
[None][feat] Add performance alignment to layer-wise benchmarks ( #11018 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:01:51 +08:00
Yan Chunwei
54768f3f2c
[None][chore] refine placement group in ray executor ( #10235 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2026-01-23 19:31:20 +08:00
Venky
b3146d095d
[TRTC-122][feat] Eagle3 Specdec UX improvements ( #10124 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-22 07:24:11 -08:00
Yan Chunwei
30ffa58b54
[ https://nvbugs/5783876 ][fix] fix hmac launch ( #10434 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-22 23:20:53 +08:00
shuyixiong
fd2af8d58a
[TRTLLM-9771][feat] Support partial update weight for fp8 ( #10456 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
2026-01-22 14:46:05 +08:00
Taylor Yeonbok Lee
895bb94b3d
[ #8241 ][feat] Support model_kwargs for pytorch backend ( #10351 )
...
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-01-21 20:51:38 -08:00
Yechan Kim
70caa779a4
[None][feat] K-EXAONE MTP support ( #10796 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-22 13:43:00 +09:00
Lizhi Zhou
f3a41c8d94
[TRTLLM-10059][feat] Use global unique id as disagg request id ( #10187 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-21 22:52:34 -05:00
Yibin Li
9116dfbacd
[ https://nvbugs/5775021 ] [fix] Replace pickle.load with restricted Unpickler ( #10622 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2026-01-21 11:42:54 +08:00
Stefan Niebler
0cfd08745c
[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler ( #9675 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2026-01-16 10:52:41 -08:00
Anish Shanbhag
faa80e73fd
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias ( #10099 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-14 21:06:07 -08:00
Tzu-Ling Kan
c99faaed06
[ #9760 ][fix] Use RequestError for validation errors to prevent engine shutdown ( #9761 )
...
Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>
2026-01-14 10:22:36 -05:00
HuiGao-NV
b10704428d
[ https://nvbugs/5787566 ][fix] Only keep a limited number of performance statistic data ( #10569 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-14 07:53:01 -05:00
mpikulski
e0331297a6
[TRTLLM-9522][fix] broken cast ( #9975 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-08 06:47:39 -05:00
Fanrong Li
b5a1e10bc0
[ https://nvbugs/5779534 ][fix] fix buffer reuse for CUDA graph attention metadata ( #10393 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 09:43:44 +08:00
William Zhang
a6a88985cf
[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg ( #9758 )
...
* Why?
Certain VLMs, such as the Qwen family, need more than the multimodal
embeddings in the language model: they also need MRoPE position IDs and
deltas. Prior to this commit, only the embeddings could be communicated
from the encoder worker to the prefill worker.
* What?
This commit extends `DisaggregatedParams` to include the MRoPE
information. It also adjusts the code needed to communicate that
information between the E, P and D workers.
Closes TRTLLM-9409.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-22 06:32:49 -05:00
bhsueh_NV
cd4b4f43fa
[None][feat] Support Eagle3 on Mistral Large3 ( #9971 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-21 10:25:45 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. ( #9821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Chang Liu
31bc14b350
[TRTLLM-9654][feat] Support DeepSeek-V32 chat template ( #9814 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 17:05:38 +08:00
Wangjue Yao
9f283f330b
[None][feat] Support Mooncake transfer engine as a cache transceiver backend ( #8309 )
...
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-12-19 10:09:51 +08:00
Wanli Jiang
3230fbe79a
[None][feat] Update reasoning parser for nano-v3 ( #9944 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-15 05:39:37 -08:00
Fanrong Li
8f144d9282
[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. ( #9524 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-15 12:42:25 +08:00
Mike Iovine
383b13e0e5
[None][feat] Implement sampling on 1-model EAGLE3 ( #9885 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-13 07:38:22 -08:00
Balaram Buddharaju
6a6e41f802
[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism ( #9720 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:41 -08:00
Venky
fd1270b9ab
[TRTC-43] [feat] Add config db and docs ( #9420 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-12-12 04:00:03 +08:00
Erin
89dabf5aa1
[TRTLLM-9736][feat] AsyncLLM and verl integ ( #9353 )
...
Signed-off-by: Liwei Ma <liweim@nvidia.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-11 09:33:25 -08:00
dhansen-nvidia
2d33ae94d5
[ https://nvbugs/5508301 ][feat] Move D->H copies to a worker thread whe… ( #8463 )
...
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
2025-12-09 18:51:31 -05:00
Jonas Li
2645a78f34
[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature ( #9682 )
...
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-06 02:24:51 -08:00
mpikulski
8d2178d321
[TRTLLM-9522][chore] implement default attach_multimodal_embeddings ( #9664 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-12-05 22:12:16 -08:00
Lizhi Zhou
0d0a16fff4
[TRTLLM-8920][feat] decouple disagg service from fastapi ( #8714 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-05 10:44:16 +08:00
JunyiXu-nv
6d2daec5d0
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint ( #9057 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-04 13:49:40 +08:00
gramnarayan
098b9ff226
[ #9147 ][feat] AutoDeploy: Draft Target Speculative Decoding ( #9275 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 05:13:49 +08:00
Yan Chunwei
b86256eb54
[TRTLLM-9144][fix] enhance RPC robustness ( #8711 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-12-02 21:37:59 +08:00
mpikulski
84a1531594
[TRTLLM-9488][feat] use FlashInfer.sampling by default ( #9545 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-12-02 16:29:55 +08:00
Yuening Li
09c840184c
[None][fix] Prevent YAML partial kv_cache_config from incorrectly overriding the complete kv_cache_config ( #9262 )
...
Signed-off-by: Yuening Li <62227368+Yuening-wa@users.noreply.github.com>
2025-12-02 10:10:08 +08:00
Venky
639c939a4f
[TRTC-1943][feat] Env vars override support in LLM API ( #9104 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-01 10:04:49 -08:00
heyuhhh
6e470aab72
[None] [feat] Optimize the algorithm part of RocketKV ( #9333 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-01 09:04:09 +08:00
brb-nv
b77f4ffe54
[TRTLLM-5971][feat] Integrate helix parallelism ( #9342 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-29 15:17:30 -08:00
mpikulski
e5f39ec7cf
[TRTLLM-9488][feat] add 'disable_flashinfer_sampling' config option ( #9454 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-28 13:00:39 +01:00
xxi
f1ed057b4c
[cherry-pick][ https://nvbugs/5670793 ][fix] Solve trtllm-serve launch_disaggregated issue ( #9346 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-11-27 16:13:58 +08:00
QI JUN
a67d94963e
[None][chore] update comments in llm_args.py ( #9472 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-27 11:06:34 +08:00
shuyixiong
d8acea1db3
[TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights ( #9224 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-26 10:59:06 +08:00
YueWeng
cc336c4abd
[TRTLLM-8160][feat] Add draft token tree runtime on CDL ( #8586 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-25 09:40:55 -05:00
QI JUN
34a6d2d28f
[TRTLLM-9302][chore] Move build config from BaseLlmArgs to TrtLlmArgs ( #9249 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-24 10:54:41 +08:00
mpikulski
cddc7549d1
[TRTLLM-9191][feat] support out-of-tree models in trtllm-serve ( #9269 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:23:47 -08:00
Yan Chunwei
b5f9fff1c1
[ https://nvbugs/5569754 ][fix] trtllm-llmapi-launch port conflict ( #8582 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Yukun He
5d118e0326
[None][chore] Revise the description of enable_autotuner. ( #9320 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-19 22:59:37 -08:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted ( #9155 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00