TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
dongxuy04	9eb8084ca9	[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist (#7727 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-21 11:01:51 -07:00
Yuxian Qiu	7d28acdbf0	[https://nvbugs/5522332 ][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783 ) (#7797 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 18:50:40 +08:00
Kyungmin Lee	6fcc0540f0	[None][fix] fix load_model_on_cpu on qwen/convert_checkpoint.py (#2382 ) Signed-off-by: lkm2835 <lkm2835@gmail.com> Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>	2025-09-18 21:54:26 -07:00
sunnyqgg	80dd8fe197	[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-18 12:05:36 -04:00
Yanchao Lu	f8e811d134	[None][chore] Version bump for 1.1.0rc6 (#7824 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-18 11:13:56 +08:00
Lucas Liebenwein	39eb120b96	[#7308 ] [feat] AutoDeploy: graph-less transformers mode for HF (#7635 ) Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>	2025-09-18 10:44:24 +08:00
QI JUN	d3467f9f12	[None][doc] fix section header of llm_kv_cache_offloading example (#7795 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-17 17:26:11 +08:00
QI JUN	39248320d4	[None][feat] add an example of KV cache host offloading (#7767 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-17 13:51:15 +08:00
Zhenhuan Chen	6983e8a00d	[https://nvbugs/5517260 ][fix] move scaffolding contrib module's import to subdirectory (#7758 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-09-17 11:36:33 +08:00
amitz-nv	750d15bfaa	[https://nvbugs/5503529 ][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF (#7740 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-09-16 16:35:13 +08:00
Kaiyu Xie	6eef19297f	[None] [chore] cherry pick changes on slurm scripts from `release/1.1.0rc2` (#7750 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-09-16 16:07:13 +08:00
Perkz Zheng	1b29c2e731	[None][feat] support gpt-oss with fp8 kv cache (#7612 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-15 02:17:37 +08:00
Yiqing Yan	76c5e1a12f	[None][infra] Bump version to 1.1.0rc5 (#7668 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-09-10 16:06:54 +08:00
Chang Liu	faa2f46554	[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-09 14:51:36 -04:00
Richard Huo	dcd110cfac	[None][chore] add TorchLlmArgs to the connector api (#7493 ) Signed-off-by: richardhuo-nv <rihuo@nvidia.com>	2025-09-09 09:05:59 -04:00
Guoming Zhang	62b564ac3c	[None][fix] add the missing import raised by #7607 (#7639 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-09 03:42:42 -04:00
Guoming Zhang	35dac55716	[None][doc] Update kvcache part (#7549 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-09 12:16:03 +08:00
Guoming Zhang	f53fb4c803	[TRTLLM-5930][doc] 1.0 Documentation. (#6696 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-09 12:16:03 +08:00
Wanli Jiang	1e0669d27a	[https://nvbugs/5453709 ][fix] Remove transformers version limit in Qwen2VL (#7152 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-09 10:38:20 +08:00
dominicshanshan	c9dca69e1b	[None][chore] Mass integration of release/1.0 - 3rd (#7519 ) Signed-off-by: Nave Assaf <nassaf@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com> Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com> Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com> Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com> Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: milesial <milesial@users.noreply.github.com> Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com> Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-08 14:03:04 +08:00
Lucas Liebenwein	74105a45d9	[#6120 ][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example (#7221 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-09-05 22:10:48 -04:00
Naveenraj Kamalakannan	58d1036bb1	[#3325 ][feat] Add MCTS and TOT tree-based inference controllers to Scaffolding (#7490 ) Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>	2025-09-04 19:46:49 -07:00
Yiqing Yan	ced5512ae4	[None][chore] Bump version to 1.1.0rc4 (#7525 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-09-04 16:30:47 +08:00
Leslie Fang	42697ea32a	[None][chore] rm executor config in kv cache connector (#7372 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-03 08:13:13 +08:00
Kanghwan	f58a183c6e	[None][chore] Fix formatting error in Gemma3 readme (#7352 ) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-09-03 01:15:37 +08:00
jiahanc	9f2dc3069d	[None] [doc] Update DeepSeek example doc (#7358 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-09-01 14:43:58 -04:00
Yan Chunwei	612c26be22	[None][doc] add legacy section for tensorrt engine (#6724 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
brb-nv	0253036a4e	[None][chore] Add docs for Gemma3 VLMs (#6880 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	c7147d25dc	[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yiqing Yan	ec595a8e29	[None][chore] Bump version to 1.1.0rc2 (#7394 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-31 10:20:38 +08:00
Richard Huo	ce580ce4f5	[None][feat] KV Cache Connector API (#7228 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Signed-off-by: richardhuo-nv <rihuo@nvidia.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-28 23:09:27 -04:00
Kaiyu Xie	23f72c8bbd	[None] [feat] Use numa to bind CPU (#7304 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-08-28 06:27:11 -04:00
Pengyun Lin	c1e7fb9042	[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-28 10:22:06 +08:00
Kaiyu Xie	8a619be828	[None] [chore] Make disagg example compatible with recommended usage (#7121 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-08-27 23:57:46 +08:00
Raayan Dhar	82bd1871ea	[None][chore] update disagg readme and scripts for pipeline parallelism (#6875 ) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-08-27 00:53:57 -04:00
Fridah-nv	0f947c64cb	[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder (#7233 ) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-08-26 10:47:57 -07:00
Yiqing Yan	907bc22fcb	[None][chore] Bump version to 1.1.0rc2 (#7167 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-22 22:02:28 +08:00
dominicshanshan	6f245ec78b	[None][chore] Mass integration of release/1.0 (#6864 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-22 09:25:15 +08:00
Zhenhuan Chen	20f54cb272	[None][fix] fix scaffolding dynasor test (#7070 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-08-20 15:20:46 +08:00
zhhuang-nv	7e135d2ea7	[None][feat] Use Separate QKV Input Layout for Context MLA (#6538 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-08-19 22:04:48 +08:00
Xianjie Qiao	19667304b5	[None] [chore] Update wide-ep genonly scripts (#6995 ) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-08-19 07:44:07 -04:00
Kaiyu Xie	9a74ee9dae	[None] [doc] Add more documents for large scale EP (#7029 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-08-19 19:04:39 +08:00
Daniel Cámpora	53312eeebd	[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-16 00:27:24 -04:00
Yiqing Yan	ec3d9f8052	[None][chore] Bump version to 1.1.0rc1 (#6953 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-16 10:32:47 +08:00
Xianjie Qiao	c2fe8b03a2	[https://nvbugs/5405041 ][fix] Update wide-ep doc (#6933 ) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>	2025-08-15 05:32:32 -04:00
jmydurant	8e252256f5	[None][doc] Modify the description for mla chunked context (#6929 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-08-15 12:52:26 +08:00
hlu1	5346eb7bc5	[None][doc] Update gpt-oss doc on MoE support matrix (#6908 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-15 08:50:31 +08:00
qianbiao	5c2f0fd03d	[None] [feat] Add Tencent HunYuanMoEV1 model support (#5521 ) Signed-off-by: sorenwu <sorenwu@tencent.com> Co-authored-by: sorenwu <sorenwu@tencent.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>	2025-08-15 06:56:44 +08:00
Matthias Jouanneaux	69574ad730	[TRTLLM-5966][feat] Helix: extend mapping to support different CP types (#6816 ) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>	2025-08-14 09:00:02 -07:00
Shi Xiaowei	1095dfd03c	[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#6323 )	2025-08-14 03:48:57 -04:00

434 Commits