Commit Graph

234 Commits

Author SHA1 Message Date
shaharmor98
b6baa9ed9b
[TRTLLM-6823][doc] Add checkpoint refactor docs (#6592)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 19:47:39 -04:00
Fridah-nv
cc0f4c87d4
[None][doc] Move AutoDeploy README.md to torch docs (#6528)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-08 19:11:45 -04:00
Chang Liu
9687bb42b5
[None][doc] Add doc for multimodal feature support matrix (#6619)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-08-08 02:20:29 -04:00
Enwei Zhu
aee828d98a
[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-08 12:10:36 +08:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Andrew Chen
4ecda91ecc
[https://nvbugs/5423962][fix] Address broken links (#6531)
2025-08-07 16:00:05 -04:00
Guoming Zhang
0223de0727
[None][doc] Add deployment guide section for VDR task (#6669)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-07 10:30:47 -04:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
shaharmor98
c23e8e7b05
[TRTLLM-6092][doc] Add LoRA feature usage doc (#6603)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-07 05:24:12 -04:00
Guoming Zhang
f7f46a5017
doc: remove the outdated features which marked as Experimental (#5995)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 22:01:42 -04:00
Yanchao Lu
b7347ce7d1
[https://nvbugs/5433581][fix] Revert deep_gemm installation workaround for SBSA (#6666)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-06 18:50:53 +08:00
Guoming Zhang
3036d49071
[None][doc] Unify the tech blogs naming. (#6649)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 01:45:40 -04:00
Farshad Ghodsian
6af1514dc3
[None][doc] Adding GPT-OSS Deployment Guide documentation (#6637)
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-05 19:19:48 +02:00
Guoming Zhang
db51ab11a9
[TRTLLM-5990][doc] trtllm-serve doc improvement. (#5220)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-05 13:04:01 +08:00
Yanchao Lu
d53cc2374b
[https://nvbugs/5433581][infra] Update install docs and CI script for SBSA deep_gemm workaround (#6607)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-04 23:36:38 -04:00
Enwei Zhu
899b74c357
[None][doc] Fix blog4 typo (#6612)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-05 10:20:37 +08:00
Leslie Fang
b9fe0fa7ec
[None][infra] Enable test of chunked prefill with logit post processor (#6483)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:46:07 -04:00
Leslie Fang
a60190836c
[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:45:24 -04:00
Zhenhua Wang
59d91b8b94
[None][chore] add online help to build_wheel.py and fix a doc link (#6391)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-04 13:14:55 +08:00
Zac Patel
18d1941083
[doc] Update perf_overview.md for release 0.21 (#6270)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
QI JUN
5913282e17
doc: update release notes (#6438)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
QI JUN
e1eca33dfc
doc: update release notes (#6324)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
QI JUN
3f47117870
doc: update known issues (#6247)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
Yiqing Yan
3f7abf87bc
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-03 11:18:59 +08:00
Kaiyu Xie
147ad69368
[None][doc] blog: Scaling Expert Parallelism in TensorRT-LLM (Part 2: Performance Status and Optimization) (#6547)
Signed-off-by: Kaiyu XIe <26294424+kaiyux@users.noreply.github.com>
2025-08-01 16:46:15 +08:00
Wanli Jiang
fcd5706615
doc: add bielik model to support-matrix (#6480)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-31 00:48:53 -04:00
Yechan Kim
83621e4b80
doc: update multimodal models on support-matrix.md (#6431)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-31 08:50:18 +08:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
Leslie Fang
d980928c96
[doc] update the doc of feature combination matrix (#6441)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-07-30 18:48:49 +08:00
nv-guomingz
7231134996
doc: remove backend parameter for trtllm-bench when backend is set to… (#6428)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-29 11:01:21 -04:00
Kaiyu Xie
e58afa510e
doc: Add README for wide EP (#6356)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-29 00:36:12 -04:00
nv-guomingz
49044733e1
chore: delete useless gitkeep files. (#6400)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-28 11:38:30 -04:00
Yan Chunwei
45d441e60c
[TRTLLM-5061] chore: add status tags to LLM API reference (#5707)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-28 15:57:07 +08:00
Simeng Liu
7bff341553
[doc] Add NGram tech blog (#6311)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-07-25 10:26:33 -07:00
Lizhi Zhou
a63a1ac7f9
[TRTLLM-6444] Add some UCX trouble shooting docs and print UCX related logs (#6085)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-07-24 16:21:01 +08:00
nv-guomingz
31d3eff24b
doc: fix invalid links related with llm api example (#6317)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-24 00:46:51 -04:00
Kaiyu Xie
f08286c679
doc: Refactor documents and examples of disaggregated serving and wide ep (#6054)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-23 09:20:57 +08:00
Raayan Dhar
5234502717
[nvbug/5361223] doc: Update Llama4 deployment guide: update config & note concurrency (#6222)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-07-22 11:28:23 -07:00
Yechan Kim
b85ab139f9
doc: add supported data modality and types on multimodal serve (#5988)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-22 14:32:41 +08:00
bhsueh_NV
24ce6b9517
[Doc][Qwen3] update qwen3 into support-matrix (#6161)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
QI JUN
a03c680581
add release notes for 0.21 release (#6049)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-22 12:48:00 +08:00
nv-guomingz
34dd071bd6
[TRTLLM-6495] doc: add disclaimer for 3rd party software installation. (#6039)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
amirkl94
f4f2176cd5
chore: Port leftover 0.20 (#5907)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Yingge He <yinggeh@nvidia.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
nv-guomingz
b4c7e8c9a5
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… (#6150)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-21 10:49:29 +08:00
wili
82d3587bb8
[refactor] Unify name of NGram speculative decoding (#5937)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-19 12:59:57 +08:00
Venky
22d4a8c48a
enh: Add script to map tests <-> jenkins stages & vice-versa (#5177)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-19 00:50:40 +08:00
Leslie Fang
44040edbf0
update broken link of PyTorchModelEngine in arch_overview (#6171)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-07-18 19:53:38 +08:00
Enwei Zhu
21efb50068
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6000)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-17 17:46:10 +08:00
Chuang Zhu
44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00
Frank
28385f6571
[TRTLLM-6070] docs: Add initial documentation for trtllm-bench CLI. (#5734)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-17 09:15:06 +08:00