Commit Graph

1194 Commits

Author SHA1 Message Date
Xiwen Yu
0b73a57c33 refine sm version check
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-10 14:51:06 +08:00
Xiwen Yu
2e61526d12 fix
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-10 10:34:18 +08:00
Xiwen Yu
5f508b7d43 Merge remote-tracking branch 'origin/main' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-10 07:46:25 +08:00
Chang Liu
faa2f46554
[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-09 14:51:36 -04:00
Jin Li
d49374bc45
[TRTLLM-7408][feat] Wrap MOE with custom op. (#7277)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-09 12:18:56 -04:00
Richard Huo
dcd110cfac
[None][chore] add TorchLlmArgs to the connector api (#7493)
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
2025-09-09 09:05:59 -04:00
NVJiangShao
cc7593987b
[https://nvbugs/5434424][fix] A quick fix for the wrong output issue of SM89 blocked scaling batched GEMM when the input tensor is non-contiguous. (#7615)
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-09-09 08:58:15 -04:00
tomeras91
6e712dd1cc
[None][fix] enable NvFP4/FP8 quantization for Nemotron-H architecture (#7589)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-09-09 11:42:22 +03:00
Linda
9cb5410067
[https://nvbugs/5454559][fix] handle bias term in fuse_gate_mlp (#7449)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-09-09 10:26:17 +02:00
William Zhang
c53d1814a7
[None][feat] Extend VLM factory and add Mistral3 factory (#7583)
This commit:

* extends existing factory interfaces to enable Mistral3 in AutoDeploy.
* adds a Mistral3 VLM factory.
* adds various model patches for pixtral (the vision model) and mistral3
  to make the VLM export compliant.
* adjusts checkpoint loading code to take possible parameter name
  conversions into account.
* fixes a sampling bug (the `end_id` needs to be take into account when
  sampling, but it is not included in the stop words' token IDs).

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-09 02:47:18 -04:00
Xiwen Yu
a8b630f178 Merge remote-tracking branch 'origin/main' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-09 14:34:27 +08:00
Xiwen Yu
8cc5ea331a add comment
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-09 14:32:47 +08:00
Guoming Zhang
f53fb4c803 [TRTLLM-5930][doc] 1.0 Documentation. (#6696)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
zhanghaotong
96af324ff1
[None][fix] Add try-catch in stream generator (#7467)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
2025-09-08 16:09:26 -04:00
Chuang Zhu
77657a1c12
[TRTLLM-7361][feat] KV cache transfer for uneven pp (#7117)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-08 13:37:46 -04:00
Leslie Fang
3e0073e86b
[None][chore] remove executor config in instantiate sampler (#7516)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-08 09:02:40 -07:00
Eran Geva
5f2a42b3df
[TRTLLM-6142][feat] AutoDeploy: set torch recompile_limit based on cuda_graph_batch_sizes and refactored (#7219)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-08 08:45:58 -04:00
Chang Liu
4a1e13897f
[None][feat] Update multimodal utility get_num_tokens_per_image for better generalization (#7544)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-08 07:42:46 -04:00
Xiwen Yu
fdaf4e2985 Merge remote-tracking branch 'origin/main' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-08 15:14:54 +08:00
Xiwen Yu
019b1db438 fix 5505835
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-08 14:52:00 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd (#7519)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
JunyiXu-nv
504bb7ffa9
[TRTLLM-7779][feat] Support multiple postprocess workers for chat completions API (#7508)
Signed-off-by: Junyi Xu 
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-08 11:11:35 +08:00
Yan Chunwei
205c3a144c
[None][chore] expose tokens_per_block into KvCacheConfig (#5911)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-07 21:14:10 -04:00
Netanel Haber
0fee8cd028
[TRTLLM-7153] [feat] Move stop_criteria to sample_async (#7041)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-09-07 17:36:49 +03:00
Raayan Dhar
bae9560e62
[https://nvbugs/5448767][fix] sync termination of requests across PP ranks (#7455)
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-07 08:45:49 -04:00
Mike Iovine
45390402fc
[https://nvbugs/5502352][fix] Fix 2-model CDL path (#7543)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-06 23:53:27 -04:00
Chang Liu
99b98f1374
[TRTLLM-7440][fix] Split fused_input_embed to separate out host sync (#7280)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-06 23:11:39 -04:00
Xiwen Yu
291290851a Merge remote-tracking branch 'origin/main' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-07 10:28:24 +08:00
Chang Liu
23500b55c3
[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse (#7106)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-06 17:58:32 -04:00
QI JUN
12ecb864c2
[None][chore] share input_ids buffers among different cuda graphs (#7236)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-06 17:49:42 -04:00
Anthony Chang
12c66f7610
[None][fix] DeepSeek-R1 W4A8 weight loading issue; fixes regression from #6200 (#7123)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-09-07 00:04:56 +08:00
Xiwen Yu
322db710dc Merge remote-tracking branch 'origin/main' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 23:58:04 +08:00
Lucas Liebenwein
74105a45d9
[#6120][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example (#7221)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-05 22:10:48 -04:00
Xiwen Yu
5e7aa76bb4 Merge branch 'user/sm103_trtllmgen' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 00:49:23 +08:00
Leslie Fang
9eb3911470
[None][chore] Remove executor_config in create_py_executor_instance (#7463)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-05 20:56:03 +08:00
Robin Kobus
a95d9616ba
[#6186][feat] Introduce QKNormRoPEAttention module (#6830)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-09-05 14:04:41 +02:00
Xiwen Yu
2c3f4cbeee Merge remote-tracking branch 'origin/main' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:53:43 +08:00
Xiwen Yu
22219bc37e Add B300 & GB300 CI
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:29:50 +08:00
Jin Li
2189a2f3ff
[https://nvbugs/5483615][fix] Remove unnecessary assertion to let mai… (#7441)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-05 10:56:21 +08:00
Naveenraj Kamalakannan
58d1036bb1
[#3325][feat] Add MCTS and TOT tree-based inference controllers to Scaffolding (#7490)
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
2025-09-04 19:46:49 -07:00
Shunkangz
bddf183e15
[None][feat] Add Request specific exception (#6931)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-09-04 18:43:42 -04:00
Rashid Kaleem
89889fb526
[https://nvbugs/5369366] [fix] Report failing requests (#7060)
Signed-off-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>
2025-09-04 12:56:23 -07:00
Chang Liu
08a0e06621
[TRTLLM-7410][feat] Support hashing and KV cache reuse for videos (#7360)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-04 14:39:23 -04:00
sychen52
98a1bffb7c
[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Enwei Zhu
1745102e72
[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 23:30:14 +08:00
Izzy Putterman
26b133f3a7
[None][feat] MultiLayer Eagle (#7234)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-04 10:49:13 -04:00
Wanli Jiang
4e3dded64d
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm (#7521)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-04 20:16:10 +08:00
WeiHaocheng
5bcda7520b
[https://nvbugs/5477730][fix] Fix the alltoall case when tp_size larger than ep_size (#7331)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-04 08:10:03 -04:00
kris1025
cce9556858
[https://nvbugs/5485886][fix] Fix resource free of Eagle3ResourceManager (#7437)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-09-04 17:38:13 +08:00
Yiqing Yan
ced5512ae4
[None][chore] Bump version to 1.1.0rc4 (#7525)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-04 16:30:47 +08:00