Commit Graph

529 Commits

Author SHA1 Message Date
Yan Chunwei
b7a255d67e [TRTLLM-9075][doc] refine the slurm examples (#9548)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
QI JUN
0915c4e3a1 [TRTLLM-9086][doc] Clean up TODOs in documentation (#9292)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Pengyun Lin
c6dc68a28e [None][doc] VDR 1.0 trtllm-serve doc enhancement (#9443)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Yan Chunwei
3e442922a3 [TRTLLM-9160][doc] add doc to llm_runtime.py (#9482)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Tailing Yuan
4eed648e22
[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks (#9667)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-04 13:41:15 +08:00
Lucas Liebenwein
a1964bcbbc
[#9643][fix] AutoDeploy: fix nano sharding config (#9668)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-04 03:10:25 +08:00
JunyiXu-nv
beffbd6002
[TRTLLM-9242][doc] Add examples showcasing openai compatible APIs (#9520)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 11:47:02 +08:00
heyuhhh
a08eb81cce
[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 (#9572)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-03 11:33:46 +08:00
Iman Tabrizian
356a52edf5
[None][feat] Add support for KVCache reuse for DSv32 (#9383)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-02 11:14:30 +08:00
Zhenhuan Chen
24004535fe
[None][chore] refactor disaggregated scripts to use named arguments (#9581)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-12-01 17:33:47 +08:00
Enwei Zhu
34e2fa5c96
[https://nvbugs/5690172][fix] Fix Qwen3-235B ATP accuracy issue with PDL (#9530)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-01 09:10:21 +08:00
heyuhhh
6e470aab72
[None] [feat] Optimize the algorithm part of RocketKV (#9333)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-01 09:04:09 +08:00
dominicshanshan
6345074686
[None][chore] Weekly mass integration of release/1.1 -- rebase (#9522)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: qgai <qgai@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Vincent Zhang <vinczhang@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Vincent Zhang <vcheungyi@163.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-11-29 21:48:48 +08:00
Kaiyu Xie
0d3c0c2156
[None] [chore] Enhancements and clean up to slurm scripts (#9493)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-28 16:41:41 +08:00
Lucas Liebenwein
2f8bd6fb36
[#9150][feat] AutoDeploy Nemotron-Flash support (#9504)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-27 18:03:57 +01:00
Enwei Zhu
c2562fc800
[https://nvbugs/5687820][fix] Remove self.abort() in DetokenizedGenerationResult (#9449)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-27 22:54:40 +08:00
Chenghao Zhang
18fbda5cdb
[None][feat] AutoDeploy: Add A_log fusion for Mamba layers (#9422)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-26 14:39:20 -08:00
Chang Liu
b10137fdd5
[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model (#9376)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-26 16:38:25 +08:00
Wanli Jiang
d100599ea7
[TRTLLM-9264][fix] Add accuracy/unit tests/doc for phi4mm (#9246)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-26 11:12:35 +08:00
Yiqing Yan
1b9edf62c9
[None][chore] Bump version to 1.2.0rc5 (#9455)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-26 08:37:53 +08:00
Tailing Yuan
51ef0379d2
[None][feat] Add a parser to layer-wise benchmarks (#9440)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-25 05:45:16 -08:00
Suyog Gupta
efd503751f
[#9271][perf] Enable multi-stream MOE optimization in AutoDeploy (#9322)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-24 19:50:10 -08:00
mpikulski
cddc7549d1
[TRTLLM-9191][feat] support out-of-tree models in trtllm-serve (#9269)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:23:47 -08:00
Yiqing Yan
8cd3b496e9
[None][chore] Bump version to 1.2.0rc4 (#9363)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-21 18:28:12 +08:00
cheshirekow
1379cfac3a
[TRTLLM-9197][infra] Move thirdparty stuff to it's own listfile (#8986)
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-20 16:44:23 -08:00
jiahanc
255e4ea9f0
[None][doc] Update DS-R1 example doc (#9231)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-18 21:10:02 -08:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted (#9155)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00
Ajinkya Rasane
8d7cda2318
[None][chore] Update the Flux autodeploy example (#8434)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-11-18 14:16:04 -08:00
Zero Zeng
43896af1b1
[None][chore] benchmark refactor (#9207)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-17 23:29:28 -08:00
Stanley Sun
96cfdd8a72
[None][chore] Change trt-server to trtlllm-server in opentelemetry readme (#9173)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-17 22:02:24 -08:00
Zero Zeng
c6cce398f5
[TRTLLM-9053][feat] Support accuracy test and install from wheel (#9038)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-13 23:34:47 -08:00
dongxuy04
84483a238a
[None][doc] update docs for EPLB (#9166)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-13 22:24:29 -08:00
Fanrong Li
25bd2e6917
[None][doc] Add DeepSeek-V3.2-Exp document (#9141)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-13 22:01:58 -08:00
heyuhhh
f07e9977c6
[None] [feat] Use triton kernels for RocketKV prediction module (#8682)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
Tailing Yuan
cc4c980e03
[None][feat] Add Qwen3-Next to layer-wise benchmarks (#9065)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-14 10:03:00 +08:00
Timothy Gao
96132b4274
[None] [doc] Add Mixed Precision Context and Generation section to Disagg (#8769)
Signed-off-by: Timothy Gao <35588167+timothygao8710@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-11 23:46:12 -08:00
Wanli Jiang
ebdd1cc8e0
[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-11 07:48:23 -08:00
Lucas Liebenwein
6bf4e59267
[#8763][feature] AutoDeploy: configurable dtype for caching (#8812)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-10 22:17:14 -08:00
jiahanc
de6088e363
[None][doc] update llama and llama4 example doc (#9048)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-10 22:04:26 -08:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py (#8938)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
Fanrong Li
a7033a9193
[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-10 12:16:01 +08:00
Yiqing Yan
c836ae5aaa
[None][chore] Bump version to 1.2.0rc3 (#9004)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-07 01:24:32 -08:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
shuyixiong
c73efe12e7
[None][chore] Use cached model in all ray tests (#8962)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-06 15:14:15 +01:00
Yi Sun
cc12d33393
[None][feat] Deep Research Implemented with Scaffolding (#8452)
Signed-off-by: Yi Sun <yisun0618@gmail.com>
2025-11-06 10:33:28 +08:00
JadoTu
6bbb43f2b9
[None][feat] Add qwen3-next nvfp4 support (#8526)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-06 09:45:44 +08:00
fredricz-20070104
fdd9e4fe00
[TRTLLM-7251][test] Get submit eplb slots empty key work (#8945)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-11-05 05:21:02 -08:00
shuyixiong
70e4d72ffa
[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
2025-11-04 10:19:24 -08:00
Anish Shanbhag
6a6317727b
[TRTLLM-8680][doc] Add table with one-line deployment commands to docs (#8173)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-11-03 17:42:41 -08:00
Kaiyu Xie
db2a42f641
[None][chore] Add sample yaml for wide-ep example and minor fixes (#8825)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-03 07:48:34 -08:00