Commit Graph

262 Commits

Author SHA1 Message Date
Yan Chunwei
612c26be22
[None][doc] add legacy section for tensorrt engine (#6724)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Robin Kobus
e09c025ffb
[None] [fix] store blog 10 media via lfs (#7375)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-30 10:17:53 +08:00
yunruis
f617b03bfc
[None][fix] fix doc formula (#7367)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-29 04:48:10 -04:00
dongfengy
367ff88a5e
[None][feat] Refactor llama4 for multimodal encoder IFB (#6844)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-28 13:22:19 -07:00
yunruis
c4f823319b
[None][doc] add adp balance blog (#7213)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
2025-08-28 11:19:34 -04:00
Maurits de Groot
2d0c9b383f
[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM (#7260)
Signed-off-by: Maurits de Groot <63357890+Maurits-de-Groot@users.noreply.github.com>
2025-08-26 11:26:19 -04:00
Guoming Zhang
bf377d0b8e
[None][doc] Display tech blog for nvidia.github.io domain. (#7241)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-26 15:36:28 +08:00
Zheng Duan
4f84a45899
[https://nvbugs/5452463][doc] update disagg doc about UCX_MAX_RNDV_RAILS (#7205)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-25 22:42:42 -04:00
Leslie Fang
9df15b2104
[None][doc] update feature_combination_matrix doc (#6691)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-26 08:25:31 +08:00
dongfengy
48155f52bf
[TRTLLM-7321][doc] Refine GPT-OSS doc (#7180)
Signed-off-by: Dongfeng Yu
2025-08-24 08:53:53 -04:00
Suyog Gupta
e3de5758a3
[#7136][feat] trtllm-serve + autodeploy integration (#7141)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
dongfengy
d94cc3fa3c
[TRTLLM-7321][doc] Add GPT-OSS Deployment Guide into official doc site (#7143)
Signed-off-by: Dongfeng Yu
2025-08-22 16:17:01 +08:00
Farshad Ghodsian
2d40e8750b
[None][doc] Update gpt-oss deployment guide to latest release image (#7101)
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-21 02:33:07 -04:00
Leslie Fang
3f6a9267f1
[None][infra] update feature_combination_matrix of disaggregated and chunked prefill (#6661)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-20 13:14:34 +08:00
Kaiyu Xie
9a74ee9dae
[None] [doc] Add more documents for large scale EP (#7029)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-19 19:04:39 +08:00
Fridah-nv
97ba0eb879
[None][autodeploy] Doc: fix link path in trtllm bench doc (#7007)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-19 08:43:28 +08:00
Leslie Fang
e76e5c640f
[None][infra] Enable accuracy test for mtp and chunked prefill (#6314)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-19 07:42:52 +08:00
Bo Li
8b05b5d801
[None][doc] Update gpt oss doc (#6954)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-18 01:27:30 -04:00
Leslie Fang
ce0b13ea02
[None][infra] update feature_combination_matrix of disaggregated and Eagle3 (#6945)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-18 09:18:17 +08:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
jmydurant
8e252256f5
[None][doc] Modify the description for mla chunked context (#6929)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-15 12:52:26 +08:00
JunyiXu-nv
70e352a6f7
[https://nvbugs/5437106][fix] Add L4 Scout benchmarking WAR option in deploy guide (#6829)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-08-15 08:53:13 +08:00
Tao Li @ NVIDIA
345d3d3524
[None][doc] update moe support matrix for DS R1 (#6883)
Signed-off-by: taoli <litaotju@users.noreply.github.com>
Co-authored-by: taoli <litaotju@users.noreply.github.com>
2025-08-14 13:55:11 +08:00
Zhenhua Wang
868c5d166e
[None][chore] fix markdown format for the deployment guide (#6879)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-13 22:19:11 -04:00
Zhenhua Wang
8416d7fea8
[https://nvbugs/5412885][doc] Add the workaround doc for H200 OOM (#6853)
Signed-off-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
2025-08-13 19:51:38 +08:00
Shi Xiaowei
fe7dda834d
[TRTLLM-7030][fix] Refactor the example doc of dist-serving (#6766)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-13 17:39:27 +08:00
Yechan Kim
12102e2d48
[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-12 19:34:02 -07:00
rakib-hasan
7ab8112450
[None][fix] Refactoring to avoid circular import when importing torch models (#6720)
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-11 18:00:42 -04:00
shaharmor98
b6baa9ed9b
[TRTLLM-6823][doc] Add checkpoint refactor docs (#6592)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 19:47:39 -04:00
Fridah-nv
cc0f4c87d4
[None][doc] Move AutoDeploy README.md to torch docs (#6528)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-08 19:11:45 -04:00
Chang Liu
9687bb42b5
[None][doc] Add doc for multimodal feature support matrix (#6619)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-08-08 02:20:29 -04:00
Enwei Zhu
aee828d98a
[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-08 12:10:36 +08:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Andrew Chen
4ecda91ecc
[https://nvbugs/5423962][fix] Address broken links (#6531)
2025-08-07 16:00:05 -04:00
Guoming Zhang
0223de0727
[None][doc] Add deployment guide section for VDR task (#6669)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-07 10:30:47 -04:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
shaharmor98
c23e8e7b05
[TRTLLM-6092][doc] Add LoRA feature usage doc (#6603)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-07 05:24:12 -04:00
Guoming Zhang
f7f46a5017
doc: remove the outdated features which marked as Experimental (#5995)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 22:01:42 -04:00
Yanchao Lu
b7347ce7d1
[https://nvbugs/5433581][fix] Revert deep_gemm installation workaround for SBSA (#6666)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-06 18:50:53 +08:00
Guoming Zhang
3036d49071
[None][doc] Unify the tech blogs naming. (#6649)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 01:45:40 -04:00
Farshad Ghodsian
6af1514dc3
[None][doc] Adding GPT-OSS Deployment Guide documentation (#6637)
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-05 19:19:48 +02:00
Guoming Zhang
db51ab11a9
[TRTLLM-5990][doc] trtllm-serve doc improvement. (#5220)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-05 13:04:01 +08:00
Yanchao Lu
d53cc2374b
[https://nvbugs/5433581][infra] Update install docs and CI script for SBSA deep_gemm workaround (#6607)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-04 23:36:38 -04:00
Enwei Zhu
899b74c357
[None][doc] Fix blog4 typo (#6612)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-05 10:20:37 +08:00
Leslie Fang
b9fe0fa7ec
[None][infra] Enable test of chunked prefill with logit post processor (#6483)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:46:07 -04:00
Leslie Fang
a60190836c
[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:45:24 -04:00
Zhenhua Wang
59d91b8b94
[None][chore] add online help to build_wheel.py and fix a doc link (#6391)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-04 13:14:55 +08:00
Zac Patel
18d1941083
[doc] Update perf_overview.md for release 0.21 (#6270)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
QI JUN
5913282e17
doc: update release notes (#6438)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
QI JUN
e1eca33dfc
doc: update release notes (#6324)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-04 11:19:58 +08:00