Yan Chunwei
612c26be22
[None][doc] add legacy section for tensorrt engine (#6724)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Robin Kobus
e09c025ffb
[None] [fix] store blog 10 media via lfs (#7375)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-30 10:17:53 +08:00
yunruis
f617b03bfc
[None][fix] fix doc formula (#7367)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-29 04:48:10 -04:00
dongfengy
367ff88a5e
[None][feat] Refactor llama4 for multimodal encoder IFB (#6844)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-28 13:22:19 -07:00
yunruis
c4f823319b
[None][doc] add adp balance blog (#7213)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
2025-08-28 11:19:34 -04:00
Maurits de Groot
2d0c9b383f
[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM (#7260)
Signed-off-by: Maurits de Groot <63357890+Maurits-de-Groot@users.noreply.github.com>
2025-08-26 11:26:19 -04:00
Guoming Zhang
bf377d0b8e
[None][doc] Display tech blog for nvidia.github.io domain. (#7241)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-26 15:36:28 +08:00
Zheng Duan
4f84a45899
[https://nvbugs/5452463][doc] update disagg doc about UCX_MAX_RNDV_RAILS (#7205)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-25 22:42:42 -04:00
Leslie Fang
9df15b2104
[None][doc] update feature_combination_matrix doc (#6691)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-26 08:25:31 +08:00
dongfengy
48155f52bf
[TRTLLM-7321][doc] Refine GPT-OSS doc (#7180)
Signed-off-by: Dongfeng Yu
2025-08-24 08:53:53 -04:00
Suyog Gupta
e3de5758a3
[#7136][feat] trtllm-serve + autodeploy integration (#7141)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
dongfengy
d94cc3fa3c
[TRTLLM-7321][doc] Add GPT-OSS Deployment Guide into official doc site (#7143)
Signed-off-by: Dongfeng Yu
2025-08-22 16:17:01 +08:00
Farshad Ghodsian
2d40e8750b
[None][doc] Update gpt-oss deployment guide to latest release image (#7101)
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-21 02:33:07 -04:00
Leslie Fang
3f6a9267f1
[None][infra] update feature_combination_matrix of disaggregated and chunked prefill (#6661)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-20 13:14:34 +08:00
Kaiyu Xie
9a74ee9dae
[None] [doc] Add more documents for large scale EP (#7029)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-19 19:04:39 +08:00
Fridah-nv
97ba0eb879
[None][autodeploy] Doc: fix link path in trtllm bench doc (#7007)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-19 08:43:28 +08:00
Leslie Fang
e76e5c640f
[None][infra] Enable accuracy test for mtp and chunked prefill (#6314)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-19 07:42:52 +08:00
Bo Li
8b05b5d801
[None][doc] Update gpt oss doc (#6954)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-18 01:27:30 -04:00
Leslie Fang
ce0b13ea02
[None][infra] update feature_combination_matrix of disaggregated and Eagle3 (#6945)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-18 09:18:17 +08:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
jmydurant
8e252256f5
[None][doc] Modify the description for mla chunked context (#6929)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-15 12:52:26 +08:00
JunyiXu-nv
70e352a6f7
[https://nvbugs/5437106][fix] Add L4 Scout benchmarking WAR option in deploy guide (#6829)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-08-15 08:53:13 +08:00
Tao Li @ NVIDIA
345d3d3524
[None][doc] update moe support matrix for DS R1 (#6883)
Signed-off-by: taoli <litaotju@users.noreply.github.com>
Co-authored-by: taoli <litaotju@users.noreply.github.com>
2025-08-14 13:55:11 +08:00
Zhenhua Wang
868c5d166e
[None][chore] fix markdown format for the deployment guide (#6879)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-13 22:19:11 -04:00
Zhenhua Wang
8416d7fea8
[https://nvbugs/5412885][doc] Add the workaround doc for H200 OOM (#6853)
Signed-off-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
2025-08-13 19:51:38 +08:00
Shi Xiaowei
fe7dda834d
[TRTLLM-7030][fix] Refactor the example doc of dist-serving (#6766)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-13 17:39:27 +08:00
Yechan Kim
12102e2d48
[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-12 19:34:02 -07:00
rakib-hasan
7ab8112450
[None][fix] Refactoring to avoid circular import when importing torch models (#6720)
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-11 18:00:42 -04:00
shaharmor98
b6baa9ed9b
[TRTLLM-6823][doc] Add checkpoint refactor docs (#6592)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 19:47:39 -04:00
Fridah-nv
cc0f4c87d4
[None][doc] Move AutoDeploy README.md to torch docs (#6528)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-08 19:11:45 -04:00
Chang Liu
9687bb42b5
[None][doc] Add doc for multimodal feature support matrix (#6619)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-08-08 02:20:29 -04:00
Enwei Zhu
aee828d98a
[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-08 12:10:36 +08:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Andrew Chen
4ecda91ecc
[https://nvbugs/5423962][fix] Address broken links (#6531)
2025-08-07 16:00:05 -04:00
Guoming Zhang
0223de0727
[None][doc] Add deployment guide section for VDR task (#6669)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-07 10:30:47 -04:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
shaharmor98
c23e8e7b05
[TRTLLM-6092][doc] Add LoRA feature usage doc (#6603)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-07 05:24:12 -04:00
Guoming Zhang
f7f46a5017
doc: remove the outdated features which marked as Experimental (#5995)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 22:01:42 -04:00
Yanchao Lu
b7347ce7d1
[https://nvbugs/5433581][fix] Revert deep_gemm installation workaround for SBSA (#6666)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-06 18:50:53 +08:00
Guoming Zhang
3036d49071
[None][doc] Unify the tech blogs naming. (#6649)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 01:45:40 -04:00
Farshad Ghodsian
6af1514dc3
[None][doc] Adding GPT-OSS Deployment Guide documentation (#6637)
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-05 19:19:48 +02:00
Guoming Zhang
db51ab11a9
[TRTLLM-5990][doc] trtllm-serve doc improvement. (#5220)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-05 13:04:01 +08:00
Yanchao Lu
d53cc2374b
[https://nvbugs/5433581][infra] Update install docs and CI script for SBSA deep_gemm workaround (#6607)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-04 23:36:38 -04:00
Enwei Zhu
899b74c357
[None][doc] Fix blog4 typo (#6612)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-05 10:20:37 +08:00
Leslie Fang
b9fe0fa7ec
[None][infra] Enable test of chunked prefill with logit post processor (#6483)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:46:07 -04:00
Leslie Fang
a60190836c
[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:45:24 -04:00
Zhenhua Wang
59d91b8b94
[None][chore] add online help to build_wheel.py and fix a doc link (#6391)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-04 13:14:55 +08:00
Zac Patel
18d1941083
[doc] Update perf_overview.md for release 0.21 (#6270)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
QI JUN
5913282e17
doc: update release notes (#6438)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
QI JUN
e1eca33dfc
doc: update release notes (#6324)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-04 11:19:58 +08:00