342 Commits

Author SHA1 Message Date
Grzegorz Kwasniewski
cff54fcae3
[#8948][feat] Support custom sharding config (#9143)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-29 05:28:05 +08:00
Lucas Liebenwein
2f8bd6fb36
[#9150][feat] AutoDeploy Nemotron-Flash support (#9504)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-27 18:03:57 +01:00
Liao Lanyu
5425d96757
[TRTLLM-9513][docs] Qwen3 deployment guide (#9488)
Signed-off-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>
Co-authored-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>
2025-11-27 14:12:35 +08:00
Jiagan Cheng
14762e0287
[None][fix] Replace PYTORCH_CUDA_ALLOC_CONF with PYTORCH_ALLOC_CONF to fix deprecation warning (#9294)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-11-27 12:22:01 +08:00
QI JUN
c6fa042332
[TRTLLM-9085][doc] fix math formula rendering issues (#9481)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-27 10:09:12 +08:00
Zhenhuan Chen
943b05e2d3
[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number (#9003)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-13 10:34:17 +08:00
Yiqing Yan
78fac1f665
[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 (#9006)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-10 10:34:06 +08:00
mpikulski
1944fb15af
[None][fix] add missing CLI option in multimodal example (#8977)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 09:06:08 +01:00
Zhanrui Sun
4de31bece2
[TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 (#8838)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-04 18:59:34 +08:00
Anish Shanbhag
6a6317727b
[TRTLLM-8680][doc] Add table with one-line deployment commands to docs (#8173)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-11-03 17:42:41 -08:00
Guoming Zhang
65b793c77e
[None][doc] Add the missing content for model support section and fix valid links for long_sequence.md (#8869)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-03 02:06:04 -08:00
Yan Chunwei
271a981f1f
[None][doc] Add LLM-API API change principle (#8350)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-03 01:47:15 -08:00
dongfengy
6424f7e55f
[None][doc] Clarify the perf best practice and supported hardware for gptoss (#8665)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-31 10:11:59 -07:00
Yi Zhang
496b419791
[None][doc] Add doc for torch.compile & piecewise cuda graph (#8527)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2025-10-29 21:15:46 -07:00
Sharan Chetlur
a2e964d9a8
[None][doc] Minor doc update to disagg-serving (#8768)
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-10-29 17:38:06 -07:00
Robin Kobus
990b0c0c47
[TRTLLM-7159][docs] Add documentation for additional outputs (#8325)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-27 09:52:04 +01:00
Zheng Duan
e666a704f5
[None][doc] add visualization of perf metrics in time breakdown tool doc (#8530)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-23 22:09:21 -04:00
yunruis
8c9fda4b85
[None][doc] Paragraph adjustment and fix statistic (#8568)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-10-22 03:26:09 -04:00
Shi Xiaowei
50149ac2bd
[None][doc] Fix the incorrect doc figure (#8536)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-22 10:08:55 +08:00
Shi Xiaowei
a0024f4d34
[None][doc] Facilitates the integration of the transfer agent (#7867)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-21 20:06:24 +08:00
Yueh-Ting (eop) Chen
85088dce05
[None][chore] Update feature combination matrix for SWA kv cache reuse (#8529)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-10-21 04:41:44 -04:00
Yechan Kim
85d5aa7763
[None][feat] Support kv_cahce_reuse for HyperCLOVAX-Vision model (#7789)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-21 11:11:24 +09:00
Bo Deng
dd25595ae8
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend (#7926)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-19 19:24:43 +08:00
h-guo18
55fed1873c
[None][chore] AutoDeploy: cleanup old inference optimizer configs (#8039)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-17 15:55:57 -04:00
Leslie Fang
023e515d33
[None][chore] Combine two documents of feature combination matrix (#8442)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-17 14:31:33 +08:00
xiweny
89d03d7668
[https://nvbugs/5532789] [doc] Add documents about CUDA 12.9 (#8411)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-16 00:05:17 -07:00
Erin
f4e7738f65
[None][doc] Ray orchestrator initial doc (#8373)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-14 21:17:57 -07:00
Kaiyu Xie
c822c117ce
[None] [docs] Update TPOT/ITL docs (#8378)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-10-14 20:50:54 -07:00
Kaiyu Xie
040103ab56
[None] [blog] Scaling Expert Parallelism in TensorRT LLM (Part 3: Pushing the Performance Boundary) (#8323)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-10-13 06:37:17 -07:00
Guoming Zhang
989c25fcba
[None][doc] Add qwen3-next doc into deployment guid and test case into L0. (#8288)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Faradawn Yang <faradawny@gmail.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 10:25:45 +08:00
Guoming Zhang
656d73087e
[None][doc] Fix several invalid ref links in deployment guide sections. (#8287)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-10-13 10:22:32 +08:00
Guoming Zhang
a193867f8f
[None][doc] Refine deployment guide by renaming TRT-LLM to TensorRT L… (#8214)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-10-09 17:11:24 +08:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
WeiHaocheng
563e588e56
[None][doc] Scaffolding tech blog fix a typo (#8042)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-28 10:29:01 -04:00
Guoming Zhang
51aefd1bac
[None][doc] Refine perf overview.md and correct the error link in per… (#8035)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-28 16:14:42 +08:00
Chuang Zhu
f98fa0cf8b
[None][feat] Optimize kv cache transfer TEP (#7613)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 20:20:04 -07:00
Yanchao Lu
7e2521a7f0
[None][chore] Some clean-ups for CUDA 13.0 dependencies (#7979)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-26 08:46:11 +08:00
WeiHaocheng
4b0570a0d6
[None][doc] Add acknowledgements in scaffolding tech blog (#7983)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-25 08:07:13 -07:00
Yan Chunwei
40c6103ef8
[None][doc] add Llama PP known issue to release note (#7959)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
663ce3a4de
[None][doc] fix invalid links in perf benchmarking. (#7933)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Zac Patel
c38d4cf6a6
[None][doc] Update Perf-Overview.md for release/1.0 (#7848)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
57c098956e
[None][doc] add a guide for modifying APIs (#7866)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
9f0f52249e
[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … (#7850)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
5ecc8d0ee2
[None][doc] Replace the main in the examples' link with commit id. (#7837)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
4a09be40f0
[None][doc] Update docker cmd in quick start guide and trtllm-serve … (#7787)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
WeiHaocheng
259cc66c34
[None][doc] scaffolding tech blog part one (#7835)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: zheyuf <zheyuf@NVIDIA.com>
Co-authored-by: zheyuf <zheyuf@NVIDIA.com>
2025-09-25 14:41:59 +08:00
Aurelien Chartier
98726a3bed
[None][chore] Update trtllm-bench documentation on setting FP8 KV cache (#7885)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-09-25 09:28:53 +08:00
Leslie Fang
342014069e
[None][chore] Validate features combination (#7630)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-25 08:01:13 +08:00
xxi
d471655242
[TRTLLM-7831][feat] Cherry-pick from #7423 Support fp8 block wide ep cherry pick (#7712)
2025-09-23 08:41:38 +08:00
Guoming Zhang
edbe270198
[TRTLLM-7958][doc] add 1.0 release notes (#7605)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00