dongxuy04
9eb8084ca9
[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist ( #7727 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-21 11:01:51 -07:00
Yuxian Qiu
7d28acdbf0
[ https://nvbugs/5522332 ][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783 ) ( #7797 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 18:50:40 +08:00
Kyungmin Lee
6fcc0540f0
[None][fix] fix load_model_on_cpu on qwen/convert_checkpoint.py ( #2382 )
...
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
2025-09-18 21:54:26 -07:00
sunnyqgg
80dd8fe197
[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle ( #7001 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-09-18 12:05:36 -04:00
Yanchao Lu
f8e811d134
[None][chore] Version bump for 1.1.0rc6 ( #7824 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-18 11:13:56 +08:00
Lucas Liebenwein
39eb120b96
[ #7308 ] [feat] AutoDeploy: graph-less transformers mode for HF ( #7635 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-09-18 10:44:24 +08:00
QI JUN
d3467f9f12
[None][doc] fix section header of llm_kv_cache_offloading example ( #7795 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 17:26:11 +08:00
QI JUN
39248320d4
[None][feat] add an example of KV cache host offloading ( #7767 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 13:51:15 +08:00
Zhenhuan Chen
6983e8a00d
[ https://nvbugs/5517260 ][fix] move scaffolding contrib module's import to subdirectory ( #7758 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-09-17 11:36:33 +08:00
amitz-nv
750d15bfaa
[ https://nvbugs/5503529 ][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF ( #7740 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-09-16 16:35:13 +08:00
Kaiyu Xie
6eef19297f
[None] [chore] cherry pick changes on slurm scripts from release/1.1.0rc2 ( #7750 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-09-16 16:07:13 +08:00
Perkz Zheng
1b29c2e731
[None][feat] support gpt-oss with fp8 kv cache ( #7612 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-15 02:17:37 +08:00
Yiqing Yan
76c5e1a12f
[None][infra] Bump version to 1.1.0rc5 ( #7668 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-10 16:06:54 +08:00
Chang Liu
faa2f46554
[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next ( #7349 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-09 14:51:36 -04:00
Richard Huo
dcd110cfac
[None][chore] add TorchLlmArgs to the connector api ( #7493 )
...
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
2025-09-09 09:05:59 -04:00
Guoming Zhang
62b564ac3c
[None][fix] add the missing import raised by #7607 ( #7639 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-09 03:42:42 -04:00
Guoming Zhang
35dac55716
[None][doc] Update kvcache part ( #7549 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
Guoming Zhang
f53fb4c803
[TRTLLM-5930][doc] 1.0 Documentation. ( #6696 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
Wanli Jiang
1e0669d27a
[ https://nvbugs/5453709 ][fix] Remove transformers version limit in Qwen2VL ( #7152 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-09 10:38:20 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd ( #7519 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
Lucas Liebenwein
74105a45d9
[ #6120 ][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example ( #7221 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-05 22:10:48 -04:00
Naveenraj Kamalakannan
58d1036bb1
[ #3325 ][feat] Add MCTS and TOT tree-based inference controllers to Scaffolding ( #7490 )
...
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
2025-09-04 19:46:49 -07:00
Yiqing Yan
ced5512ae4
[None][chore] Bump version to 1.1.0rc4 ( #7525 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-04 16:30:47 +08:00
Leslie Fang
42697ea32a
[None][chore] rm executor config in kv cache connector ( #7372 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-03 08:13:13 +08:00
Kanghwan
f58a183c6e
[None][chore] Fix formatting error in Gemma3 readme ( #7352 )
...
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-09-03 01:15:37 +08:00
jiahanc
9f2dc3069d
[None] [doc] Update DeepSeek example doc ( #7358 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-09-01 14:43:58 -04:00
Yan Chunwei
612c26be22
[None][doc] add legacy section for tensorrt engine ( #6724 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
brb-nv
0253036a4e
[None][chore] Add docs for Gemma3 VLMs ( #6880 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
c7147d25dc
[TRTLLM-6975][test] Add multi-turn test cases for VLM models ( #6749 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yiqing Yan
ec595a8e29
[None][chore] Bump version to 1.1.0rc2 ( #7394 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-31 10:20:38 +08:00
Richard Huo
ce580ce4f5
[None][feat] KV Cache Connector API ( #7228 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-28 23:09:27 -04:00
Kaiyu Xie
23f72c8bbd
[None] [feat] Use numa to bind CPU ( #7304 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-28 06:27:11 -04:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss ( #7261 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
Kaiyu Xie
8a619be828
[None] [chore] Make disagg example compatible with recommended usage ( #7121 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-27 23:57:46 +08:00
Raayan Dhar
82bd1871ea
[None][chore] update disagg readme and scripts for pipeline parallelism ( #6875 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-27 00:53:57 -04:00
Fridah-nv
0f947c64cb
[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder ( #7233 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-26 10:47:57 -07:00
Yiqing Yan
907bc22fcb
[None][chore] Bump version to 1.1.0rc2 ( #7167 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-22 22:02:28 +08:00
dominicshanshan
6f245ec78b
[None][chore] Mass integration of release/1.0 ( #6864 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-22 09:25:15 +08:00
Zhenhuan Chen
20f54cb272
[None][fix] fix scaffolding dynasor test ( #7070 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-08-20 15:20:46 +08:00
zhhuang-nv
7e135d2ea7
[None][feat] Use Separate QKV Input Layout for Context MLA ( #6538 )
...
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-08-19 22:04:48 +08:00
Xianjie Qiao
19667304b5
[None] [chore] Update wide-ep genonly scripts ( #6995 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-19 07:44:07 -04:00
Kaiyu Xie
9a74ee9dae
[None] [doc] Add more documents for large scale EP ( #7029 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-19 19:04:39 +08:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options ( #6831 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
Yiqing Yan
ec3d9f8052
[None][chore] Bump version to 1.1.0rc1 ( #6953 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-16 10:32:47 +08:00
Xianjie Qiao
c2fe8b03a2
[ https://nvbugs/5405041 ][fix] Update wide-ep doc ( #6933 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-08-15 05:32:32 -04:00
jmydurant
8e252256f5
[None][doc] Modify the description for mla chunked context ( #6929 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-15 12:52:26 +08:00
hlu1
5346eb7bc5
[None][doc] Update gpt-oss doc on MoE support matrix ( #6908 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-15 08:50:31 +08:00
qianbiao
5c2f0fd03d
[None] [feat] Add Tencent HunYuanMoEV1 model support ( #5521 )
...
Signed-off-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-08-15 06:56:44 +08:00
Matthias Jouanneaux
69574ad730
[TRTLLM-5966][feat] Helix: extend mapping to support different CP types ( #6816 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-08-14 09:00:02 -07:00
Shi Xiaowei
1095dfd03c
[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands ( #6323 )
2025-08-14 03:48:57 -04:00
Shi Xiaowei
fe7dda834d
[TRTLLM-7030][fix] Refactor the example doc of dist-serving ( #6766 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-13 17:39:27 +08:00
Aurelien Chartier
2e0081b53e
[ #6530 ][fix] Fix script when using calibration tensors from modelopt ( #6803 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-12 20:41:10 -07:00
Kaiyu Xie
47806f09d9
feat: Support custom repo_dir for SLURM script ( #6546 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
2025-08-12 22:06:59 -04:00
rakib-hasan
2923eb88a1
[None][fix] Refactoring input prep to allow out-of-tree models ( #6497 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-12 20:29:10 -04:00
nvchenghaoz
81f0ded1c4
[None][feat] Add GPT OSS support for AutoDeploy ( #6641 )
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-08-12 14:03:22 -04:00
Chang Liu
be9dd4713c
[ https://nvbugs/5385987 ][fix] Fix Qwen2 quantization issue by pinning transformers version ( #6673 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-11 17:16:49 -07:00
rakib-hasan
7ab8112450
[None][fix] Refactoring to avoid circular import when importing torch models ( #6720 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-11 18:00:42 -04:00
Liao Lanyu
a2e9153cb0
[None][doc] Add K2 tool calling examples ( #6667 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-11 16:25:41 +08:00
Yibin Li
97787883c3
[TRTLLM-6420][feat] add support for Eclairv2 model - cherry-pick changes and minor fix ( #6493 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-08 21:40:48 -04:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default ( #6216 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Guoming Zhang
0223de0727
[None][doc] Add deployment guide section for VDR task ( #6669 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-07 10:30:47 -04:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss ( #6645 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Yiqing Yan
5fa1914cab
[None][chore] Bump version to 1.1.0rc0 ( #6651 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-07 13:39:49 +08:00
Guoming Zhang
f7f46a5017
doc: remove the outdated features which marked as Experimental ( #5995 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 22:01:42 -04:00
Yibin Li
2a946859a7
[None][fix] Upgrade dependencies version to avoid security vulnerability ( #6506 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-06 14:21:03 -07:00
chenfeiz0326
a16ba6445c
[None][doc] Create deployment guide for Llama4 Scout FP8 and NVFP4 ( #6550 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-06 22:15:24 +08:00
Yuxian Qiu
3a71ddfe09
[TRTLLM-6859][doc] Add DeepSeek R1 deployment guide. ( #6579 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-06 22:13:54 +08:00
Pengyun Lin
79fc2f48c0
[None][chore] Enhance trtllm-serve example test ( #6604 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-06 20:30:35 +08:00
jiahanc
3170039e36
[None][doc] Add llama4 hybrid guide ( #6640 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-08-06 01:25:38 -04:00
yunruis
3ff4f503ad
[None][opt] ADP schedule balance optimization ( #6061 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-06 09:38:02 +08:00
Olya Kozlova
13cc1c4878
[TRTLLM-5271][feat] best_of/n for pytorch workflow ( #5997 )
...
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
2025-08-04 14:08:06 +02:00
Yiqing Yan
3916dbd98b
[None][chore] Bump version to 1.0.0rc6 ( #6597 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-04 04:39:15 -04:00
Pengyun Lin
a15e33351d
[None][fix] Revert commit 48ddc3d & add test for disagg server with different max_num_tokens ( #6259 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-04 15:09:51 +08:00
Kaiyu Xie
aee35e2dbd
chore: Make example SLURM scripts more parameterized ( #6511 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-01 12:53:15 +08:00
Zongfei Jing
7bb0a78631
Deepseek R1 FP8 Support on Blackwell ( #6486 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-01 10:26:28 +08:00
Simeng Liu
8cf3faa26a
[feat] Auto-enable ngram with concurrency <= 32. ( #6232 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>
2025-07-31 18:45:51 -04:00
amitz-nv
1ee7a08d2b
[5830][feat] Improve LoRA cache memory control ( #6220 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-07-31 09:26:38 +03:00
bhsueh_NV
ae3a5fc918
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI ( #6477 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-31 09:37:23 +08:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. ( #6419 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
nv-guomingz
a5540acfce
chore: add trtllm-serve json schema example into doc. ( #6418 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 04:33:08 -04:00
Yechan Kim
d6eb8e2366
fix: support mixture of text & multimodal prompts ( #6345 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 08:52:31 +08:00
nv-guomingz
7231134996
doc: remove backend parameter for trtllm-bench when backend is set to… ( #6428 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-29 11:01:21 -04:00
Kaiyu Xie
e58afa510e
doc: Add README for wide EP ( #6356 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-29 00:36:12 -04:00
Liana Koleva
96d004d800
doc: fix invalid link in llama 4 example documentation ( #6340 )
...
Signed-off-by: Liana Koleva <43767763+lianakoleva@users.noreply.github.com>
2025-07-26 11:27:10 -04:00
Michal Guzek
08d57123f9
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache ( #5974 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-25 18:10:40 -04:00
Simeng Liu
7bff341553
[doc] Add NGram tech blog ( #6311 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-07-25 10:26:33 -07:00
nv-guomingz
31d3eff24b
doc: fix invalid links related with llm api example ( #6317 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-24 00:46:51 -04:00
Yechan Kim
83c3ed128b
chore: set default device to cpu on Multimodal models ( #5994 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-22 21:45:31 -07:00
Kaiyu Xie
f08286c679
doc: Refactor documents and examples of disaggregated serving and wide ep ( #6054 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-23 09:20:57 +08:00
Lucas Liebenwein
41fb8aa8b1
[AutoDeploy] merge feat/ad-2025-07-07 ( #6196 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-07-23 05:11:04 +08:00
Yiqing Yan
3e18ee5fe1
chore: bump version to 1.0.0rc5 ( #6252 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-22 16:24:28 +08:00
Mike Iovine
9645814bdf
[chore] Clean up quickstart_advanced.py ( #6021 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-21 15:00:59 -04:00
Linda
3efad2e58c
feat: nanobind bindings ( #6185 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-21 08:56:57 +01:00
nv-guomingz
b4c7e8c9a5
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… ( #6150 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-21 10:49:29 +08:00
bhsueh_NV
2e14c8f443
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 ( #6065 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-20 10:25:25 +08:00
wili
82d3587bb8
[refactor] Unify name of NGram speculative decoding ( #5937 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-19 12:59:57 +08:00
Zhenhuan Chen
992b273045
[ https://nvbugs/5387375 ] fix(scaffolding): fix scaffolding aime test in test_e2e ( #6140 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-18 10:34:37 +08:00
Iman Tabrizian
b75e53ab69
Revert "feat: nanobind bindings ( #5961 )" ( #6160 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-18 10:12:54 +08:00
Linda
5bff317abf
feat: nanobind bindings ( #5961 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-17 22:42:52 +08:00
Enwei Zhu
21efb50068
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler ( #6000 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-17 17:46:10 +08:00
Chuang Zhu
44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service ( #5234 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support ( #5644 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend ( #5752 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
Yiqing Yan
e51c541617
chore: Bump version to 1.0.0rc4 ( #6086 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-16 13:02:23 +08:00
Xiaodong (Vincent) Huang
0523f77b36
support TRTLLM_DEEP_EP_TOKEN_LIMIT to allow run deep-ep on memory-con… ( #5684 )
...
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
2025-07-15 18:34:21 +03:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #6003 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
jiahanc
24dfd4cd0b
Doc: Update llama-3.3-70B guide ( #6028 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-15 11:37:26 +09:00
Yechan Kim
2320f12321
doc: update EXAONE 4.0 news ( #6034 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-15 10:26:51 +09:00
Yechan Kim
63139fdcff
feat: EXAONE4.0 support ( #5696 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-14 22:28:10 +09:00
Zhanrui Sun
3a0ef73414
infra: [TRTLLM-6242] install cuda-toolkit to fix sanity check ( #5709 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-14 18:52:13 +09:00
Yan Chunwei
9c673e9707
[TRTLLM-6160] chore: add sampling examples for pytorch ( #5951 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 15:28:32 +09:00
Yan Chunwei
c30eead09f
[TRTLLM-6164][TRTLLM-6165] chore: add runtime example for pytorch ( #5956 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 14:09:39 +08:00
Xianjie Qiao
c7ffadf692
Fix errors in wide-ep scripts ( #5992 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-07-14 14:07:27 +09:00
Shi Xiaowei
49359574c1
[TRTLLM-5673] Doc: ensure the disagg doc is up to date ( #5938 )
2025-07-11 17:39:05 +09:00
William Tambellini
fbb4cc7379
[TRTLLM-4770][feat] Enhance cpp executor cmake to listen to ENABLE_MU… ( #5104 )
...
...LTI_DEVICE
Signed-off-by: William Tambellini <wtambellini@sdl.com>
2025-07-11 10:59:44 +08:00
Iman Tabrizian
c32c9e2fad
doc: Add instructions for running gemma in disaggregated serving ( #5922 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-10 10:21:19 -07:00
wili
2e3cf42e03
[refactor] Simplification of Speculative decoding configs ( #5639 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-10 11:37:30 -04:00
Yan Chunwei
07f6da763d
[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner ( #5876 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-10 11:31:35 +08:00
Erin
e277766f0d
chores: merge examples for v1.0 doc ( #5736 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-08 21:00:42 -07:00
jiahanc
c24eb67054
Doc: fix link in llama4 Maverick example ( #5864 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 11:09:58 +09:00
jiahanc
607bf4c395
Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide ( #5810 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 10:10:02 +09:00
Yan Chunwei
e50d95c40d
chore [TRTLLM-6161]: add LLM speculative decoding example ( #5706 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-09 07:33:11 +08:00
Yiqing Yan
5203a0f6df
chore: bump version to 1.0.0rc3 ( #5819 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-08 16:04:40 +09:00
Zhenhuan Chen
dee6644ed9
feat(scaffolding): add streaming scaffolding_llm.generate_async support ( #5345 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-08 15:08:40 +09:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" ( #5818 )
2025-07-08 13:15:30 +09:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #5795 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
DylanChen-NV
5ca2b9bb15
[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow ( #5615 )
...
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-07 18:04:57 +08:00
bhsueh_NV
85e934a7fe
[Doc] update the document of qwen3 and cuda_graph usage ( #5703 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-07 09:44:25 +08:00
Xianjie Qiao
b1976c2add
Add wide-ep benchmarking scripts ( #5760 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-05 19:29:39 +08:00
Stefan Niebler
d1112aac37
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow ( #5333 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-05 01:35:13 +09:00
nv-guomingz
c434147366
chore: update doc by replacing use_cuda_graph with cuda_graph_config ( #5680 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-04 15:39:15 +09:00
Linda
94f0252b46
Doc: Update invalid hugging face URLs ( #5683 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Lucas Liebenwein
24ac9b5f69
[AutoDeploy] merge feat/ad-2025-06-29 ( #5737 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-07-04 10:21:18 +09:00
Yiqing Yan
3c9dd5cd66
chore: bump version to 1.0.0rc2 ( #5645 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-03 12:35:28 +08:00
Shunkangz
3e75320fe8
Add pd dynamic scaling readme ( #5540 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>
2025-07-02 02:18:51 -04:00
Yan Chunwei
a5eff139f1
[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) ( #5431 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-01 19:06:41 +08:00
nv-guomingz
6e48ac25a6
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. ( #5585 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 12:23:14 -04:00
dongjiyingdjy
852b79053d
feat : support duplicate_kv_weight for qwen3 blockwise scale ( #5459 )
...
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
2025-06-30 11:49:22 +08:00
nv-guomingz
578430e64c
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) ( #5014 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 11:05:40 +08:00
Talor Abramovich
70e34a3291
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve ( #5376 )
...
Signed-off-by: Talor Abramovich <talora@nvidia.com>
2025-06-29 12:46:30 +00:00
Lucas Liebenwein
619709fc33
[AutoDeploy] merge feat/ad-2025-06-13 ( #5556 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-29 03:52:14 +08:00
Darragh Hanley
5437075def
ReDrafter support for Qwen ( #4875 )
...
Signed-off-by: darraghdog <darragh.hanley@gmail.com>
Signed-off-by: Darragh Hanley <darragh.hanley@gmail.com>
Co-authored-by: rakib-hasan <rhasan@nvidia.com>
2025-06-28 02:33:10 +08:00
wili
56cdfe5c6c
[TRTLLM-5000][feat] NGrams V2 ( #4569 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-06-27 23:00:17 +08:00
jmydurant
8836990bde
[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) ( #5475 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-06-26 22:18:08 +08:00
QI JUN
3a2c4ca77b
chore: split _build_model method for TorchLlm and TrtLlm ( #5418 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-06-26 04:32:46 +08:00
Yiqing Yan
f3cfe86dd1
chore: bump version to 1.0.0rc1 ( #5460 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-25 16:21:34 +08:00
Enwei Zhu
fc7a81ceb0
test: Add LLGuidance test and refine guided decoding ( #5348 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 14:12:56 +08:00
Enwei Zhu
76da7fed86
fix (NvBug 5354925): Fix static EPLB ( #5411 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 13:14:40 +08:00
Fanrong Li
ebadc13086
[doc] update mtp documents ( #5387 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-21 16:05:52 +08:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default ( #5312 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00