William Zhang
c53d1814a7
[None][feat] Extend VLM factory and add Mistral3 factory ( #7583 )
...
This commit:
* extends existing factory interfaces to enable Mistral3 in AutoDeploy.
* adds a Mistral3 VLM factory.
* adds various model patches for pixtral (the vision model) and mistral3
to make the VLM export compliant.
* adjusts checkpoint loading code to take possible parameter name
conversions into account.
* fixes a sampling bug (the `end_id` needs to be take into account when
sampling, but it is not included in the stop words' token IDs).
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-09 02:47:18 -04:00
Guoming Zhang
f53fb4c803
[TRTLLM-5930][doc] 1.0 Documentation. ( #6696 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
zhanghaotong
96af324ff1
[None][fix] Add try-catch in stream generator ( #7467 )
...
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
2025-09-08 16:09:26 -04:00
Chuang Zhu
77657a1c12
[TRTLLM-7361][feat] KV cache transfer for uneven pp ( #7117 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-08 13:37:46 -04:00
Leslie Fang
3e0073e86b
[None][chore] remove executor config in instantiate sampler ( #7516 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-08 09:02:40 -07:00
Eran Geva
5f2a42b3df
[TRTLLM-6142][feat] AutoDeploy: set torch recompile_limit based on cuda_graph_batch_sizes and refactored ( #7219 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-08 08:45:58 -04:00
Chang Liu
4a1e13897f
[None][feat] Update multimodal utility get_num_tokens_per_image for better generalization ( #7544 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-08 07:42:46 -04:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd ( #7519 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
JunyiXu-nv
504bb7ffa9
[TRTLLM-7779][feat] Support multiple postprocess workers for chat completions API ( #7508 )
...
Signed-off-by: Junyi Xu
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-08 11:11:35 +08:00
Yan Chunwei
205c3a144c
[None][chore] expose tokens_per_block into KvCacheConfig ( #5911 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-07 21:14:10 -04:00
Netanel Haber
0fee8cd028
[TRTLLM-7153] [feat] Move stop_criteria to sample_async ( #7041 )
...
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-09-07 17:36:49 +03:00
Raayan Dhar
bae9560e62
[ https://nvbugs/5448767 ][fix] sync termination of requests across PP ranks ( #7455 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-07 08:45:49 -04:00
Mike Iovine
45390402fc
[ https://nvbugs/5502352 ][fix] Fix 2-model CDL path ( #7543 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-06 23:53:27 -04:00
Chang Liu
99b98f1374
[TRTLLM-7440][fix] Split fused_input_embed to separate out host sync ( #7280 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-06 23:11:39 -04:00
Chang Liu
23500b55c3
[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse ( #7106 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-06 17:58:32 -04:00
QI JUN
12ecb864c2
[None][chore] share input_ids buffers among different cuda graphs ( #7236 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-06 17:49:42 -04:00
Anthony Chang
12c66f7610
[None][fix] DeepSeek-R1 W4A8 weight loading issue; fixes regression from #6200 ( #7123 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-09-07 00:04:56 +08:00
Lucas Liebenwein
74105a45d9
[ #6120 ][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example ( #7221 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-05 22:10:48 -04:00
Leslie Fang
9eb3911470
[None][chore] Remove executor_config in create_py_executor_instance ( #7463 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-05 20:56:03 +08:00
Robin Kobus
a95d9616ba
[ #6186 ][feat] Introduce QKNormRoPEAttention module ( #6830 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-09-05 14:04:41 +02:00
Jin Li
2189a2f3ff
[ https://nvbugs/5483615 ][fix] Remove unnecessary assertion to let mai… ( #7441 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-05 10:56:21 +08:00
Naveenraj Kamalakannan
58d1036bb1
[ #3325 ][feat] Add MCTS and TOT tree-based inference controllers to Scaffolding ( #7490 )
...
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
2025-09-04 19:46:49 -07:00
Shunkangz
bddf183e15
[None][feat] Add Request specific exception ( #6931 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-09-04 18:43:42 -04:00
Rashid Kaleem
89889fb526
[ https://nvbugs/5369366 ] [fix] Report failing requests ( #7060 )
...
Signed-off-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>
2025-09-04 12:56:23 -07:00
Chang Liu
08a0e06621
[TRTLLM-7410][feat] Support hashing and KV cache reuse for videos ( #7360 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-04 14:39:23 -04:00
sychen52
98a1bffb7c
[OMNIML-2336][feat] Add NVFP4 x FP8 ( #6809 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Enwei Zhu
1745102e72
[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec ( #7481 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 23:30:14 +08:00
Izzy Putterman
26b133f3a7
[None][feat] MultiLayer Eagle ( #7234 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-04 10:49:13 -04:00
Wanli Jiang
4e3dded64d
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm ( #7521 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-04 20:16:10 +08:00
WeiHaocheng
5bcda7520b
[ https://nvbugs/5477730 ][fix] Fix the alltoall case when tp_size larger than ep_size ( #7331 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-04 08:10:03 -04:00
kris1025
cce9556858
[ https://nvbugs/5485886 ][fix] Fix resource free of Eagle3ResourceManager ( #7437 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-09-04 17:38:13 +08:00
Yiqing Yan
ced5512ae4
[None][chore] Bump version to 1.1.0rc4 ( #7525 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-04 16:30:47 +08:00
jianweiwu
7090b286b2
[None][fix] fix hunyuan_moe init bug ( #7502 )
...
Signed-off-by: sorenwu <sorenwu@tencent.com>
2025-09-04 03:06:00 -04:00
Grzegorz Kwasniewski
3755f8ab7d
[TRTLLM-6342][fix] Fixed triggering BMM sharding ( #7389 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-09-04 02:01:27 -04:00
William Zhang
a117e7a57e
[TRTLLM-7442][model] Remove unnecessary D2H copies ( #7273 )
...
* Why?
Initial profiling showed there were multiple D2H / H2D copies being
scheduled in the mistral 3.1 small model.
* What?
This commit removes those unnecessary copies by returning `image_sizes`
as a simple list instead of a tensor.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-03 23:14:20 -04:00
Jin Li
2a2dfe273b
[ https://nvbugs/5485102 ][fix] Correctly set stride for piecewise outp… ( #7442 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 10:48:15 +08:00
Frida Hou
51a2b8729e
[ #7222 ][autodeploy] Separate run_shape_prop as another graph utility ( #7313 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-09-03 19:32:50 -04:00
Leslie Fang
bd9ba97d89
[None][chore] Remove two unused parameters in create_py_executor ( #7458 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-04 07:31:31 +08:00
Enwei Zhu
5ff3a65b23
[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) ( #6948 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-03 15:16:11 -07:00
Mike Iovine
64e3bfa054
[None][fix] Fix KV cache recompute in draft_target spec decode ( #7348 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-03 15:04:14 -04:00
Anurag Mukkara
ae5136831f
[ https://nvbugs/5472947 ][fix] wait on isend handles before reusing buffers ( #7462 )
...
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-09-03 13:20:02 +05:30
YueWeng
9a4f60687f
[ https://nvbugs/5480289 ][fix] release slot manager in mtp MTPHiddenStatesManager ( #7340 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-09-02 19:37:51 -07:00
Jinyang Yuan
572551b586
[None][perf] Autotune TRT-LLM Gen MoE when using CUDA graphs ( #7285 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-09-03 10:08:59 +08:00
Leslie Fang
42697ea32a
[None][chore] rm executor config in kv cache connector ( #7372 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-03 08:13:13 +08:00
JunyiXu-nv
eefe5f2093
[TRTLLM-7208][feat] Implement basic functionalities for Responses API ( #7341 )
...
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-09-02 07:08:22 -04:00
tomeras91
9c8d2161d0
[None][doc] fix example in docstring ( #7410 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-09-02 11:59:49 +03:00
Leslie Fang
e81c50dbd2
[None][chore] Use llm args in create_py_executor ( #7239 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-01 16:27:55 -07:00
Mike Iovine
b3c57a7042
[TRTLLM-7353][feat] Implement capturable drafting loops for speculation ( #7100 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-01 14:37:44 -04:00
QI JUN
ed4087a295
[ https://nvbugs/5374016 ][fix] improve error message ( #6893 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Aurelien Chartier
93e623b455
[ https://nvbugs/5449155 ][fix] Fix DeepSeek R1 weight loading for TP16 ( #6913 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00