Zhenhua Wang
|
8416d7fea8
|
[https://nvbugs/5412885][doc] Add the workaround doc for H200 OOM (#6853)
Signed-off-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
|
2025-08-13 19:51:38 +08:00 |
|
Shi Xiaowei
|
fe7dda834d
|
[TRTLLM-7030][fix] Refactor the example doc of dist-serving (#6766)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-08-13 17:39:27 +08:00 |
|
Yechan Kim
|
12102e2d48
|
[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-08-12 19:34:02 -07:00 |
|
rakib-hasan
|
7ab8112450
|
[None][fix] Refactoring to avoid circular import when importing torch models (#6720)
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
|
2025-08-11 18:00:42 -04:00 |
|
shaharmor98
|
b6baa9ed9b
|
[TRTLLM-6823][doc] Add checkpoint refactor docs (#6592)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
|
2025-08-10 19:47:39 -04:00 |
|
Fridah-nv
|
cc0f4c87d4
|
[None][doc] Move AutoDeploy README.md to torch docs (#6528)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
|
2025-08-08 19:11:45 -04:00 |
|
Chang Liu
|
9687bb42b5
|
[None][doc] Add doc for multimodal feature support matrix (#6619)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
|
2025-08-08 02:20:29 -04:00 |
|
Enwei Zhu
|
aee828d98a
|
[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-08 12:10:36 +08:00 |
|
Daniel Cámpora
|
efca359b66
|
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-08-07 22:19:37 -04:00 |
|
Andrew Chen
|
4ecda91ecc
|
[https://nvbugs/5423962][fix] Address broken links (#6531)
|
2025-08-07 16:00:05 -04:00 |
|
Guoming Zhang
|
0223de0727
|
[None][doc] Add deployment guide section for VDR task (#6669)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-08-07 10:30:47 -04:00 |
|
Enwei Zhu
|
1b9781e8e7
|
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-07 05:53:48 -04:00 |
|
shaharmor98
|
c23e8e7b05
|
[TRTLLM-6092][doc] Add LoRA feature usage doc (#6603)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
|
2025-08-07 05:24:12 -04:00 |
|
Guoming Zhang
|
f7f46a5017
|
doc: remove the outdated features which marked as Experimental (#5995)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-08-06 22:01:42 -04:00 |
|
Yanchao Lu
|
b7347ce7d1
|
[https://nvbugs/5433581][fix] Revert deep_gemm installation workaround for SBSA (#6666)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-06 18:50:53 +08:00 |
|
Guoming Zhang
|
3036d49071
|
[None][doc] Unify the tech blogs naming. (#6649)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-08-06 01:45:40 -04:00 |
|
Farshad Ghodsian
|
6af1514dc3
|
[None][doc] Adding GPT-OSS Deployment Guide documentation (#6637)
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
|
2025-08-05 19:19:48 +02:00 |
|
Guoming Zhang
|
db51ab11a9
|
[TRTLLM-5990][doc] trtllm-serve doc improvement. (#5220)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-08-05 13:04:01 +08:00 |
|
Yanchao Lu
|
d53cc2374b
|
[https://nvbugs/5433581][infra] Update install docs and CI script for SBSA deep_gemm workaround (#6607)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-04 23:36:38 -04:00 |
|
Enwei Zhu
|
899b74c357
|
[None][doc] Fix blog4 typo (#6612)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-05 10:20:37 +08:00 |
|
Leslie Fang
|
b9fe0fa7ec
|
[None][infra] Enable test of chunked prefill with logit post processor (#6483)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-04 01:46:07 -04:00 |
|
Leslie Fang
|
a60190836c
|
[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-04 01:45:24 -04:00 |
|
Zhenhua Wang
|
59d91b8b94
|
[None][chore] add online help to build_wheel.py and fix a doc link (#6391)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
|
2025-08-04 13:14:55 +08:00 |
|
Zac Patel
|
18d1941083
|
[doc] Update perf_overview.md for release 0.21 (#6270)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
QI JUN
|
5913282e17
|
doc: update release notes (#6438)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
QI JUN
|
e1eca33dfc
|
doc: update release notes (#6324)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
QI JUN
|
3f47117870
|
doc: update known issues (#6247)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
Yiqing Yan
|
3f7abf87bc
|
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-08-03 11:18:59 +08:00 |
|
Kaiyu Xie
|
147ad69368
|
[None][doc] blog: Scaling Expert Parallelism in TensorRT-LLM (Part 2: Performance Status and Optimization) (#6547)
Signed-off-by: Kaiyu XIe <26294424+kaiyux@users.noreply.github.com>
|
2025-08-01 16:46:15 +08:00 |
|
Wanli Jiang
|
fcd5706615
|
doc: add bielik model to support-matrix (#6480)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-07-31 00:48:53 -04:00 |
|
Yechan Kim
|
83621e4b80
|
doc: update multimodal models on support-matrix.md (#6431)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-31 08:50:18 +08:00 |
|
nv-guomingz
|
03e38c9087
|
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-30 11:11:06 -04:00 |
|
Leslie Fang
|
d980928c96
|
[doc] update the doc of feature combination matrix (#6441)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-07-30 18:48:49 +08:00 |
|
nv-guomingz
|
7231134996
|
doc: remove backend parameter for trtllm-bench when backend is set to… (#6428)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-29 11:01:21 -04:00 |
|
Kaiyu Xie
|
e58afa510e
|
doc: Add README for wide EP (#6356)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-29 00:36:12 -04:00 |
|
nv-guomingz
|
49044733e1
|
chore: delete useless gitkeep files. (#6400)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-28 11:38:30 -04:00 |
|
Yan Chunwei
|
45d441e60c
|
[TRTLLM-5061] chore: add status tags to LLM API reference (#5707)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-28 15:57:07 +08:00 |
|
Simeng Liu
|
7bff341553
|
[doc] Add NGram tech blog (#6311)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
|
2025-07-25 10:26:33 -07:00 |
|
Lizhi Zhou
|
a63a1ac7f9
|
[TRTLLM-6444] Add some UCX trouble shooting docs and print UCX related logs (#6085)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-07-24 16:21:01 +08:00 |
|
nv-guomingz
|
31d3eff24b
|
doc: fix invalid links related with llm api example (#6317)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-24 00:46:51 -04:00 |
|
Kaiyu Xie
|
f08286c679
|
doc: Refactor documents and examples of disaggregated serving and wide ep (#6054)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-23 09:20:57 +08:00 |
|
Raayan Dhar
|
5234502717
|
[nvbug/5361223] doc: Update Llama4 deployment guide: update config & note concurrency (#6222)
Signed-off-by: raayandhar <rdhar@nvidia.com>
|
2025-07-22 11:28:23 -07:00 |
|
Yechan Kim
|
b85ab139f9
|
doc: add supported data modality and types on multimodal serve (#5988)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-22 14:32:41 +08:00 |
|
bhsueh_NV
|
24ce6b9517
|
[Doc][Qwen3] update qwen3 into support-matrix (#6161)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
QI JUN
|
a03c680581
|
add release notes for 0.21 release (#6049)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-07-22 12:48:00 +08:00 |
|
nv-guomingz
|
34dd071bd6
|
[TRTLLM-6495] doc: add disclaimer for 3rd party software installation. (#6039)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
amirkl94
|
f4f2176cd5
|
chore: Port leftover 0.20 (#5907)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Yingge He <yinggeh@nvidia.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: zpatel <22306219+zbpatel@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
nv-guomingz
|
b4c7e8c9a5
|
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… (#6150)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-21 10:49:29 +08:00 |
|
wili
|
82d3587bb8
|
[refactor] Unify name of NGram speculative decoding (#5937)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-07-19 12:59:57 +08:00 |
|
Venky
|
22d4a8c48a
|
enh: Add script to map tests <-> jenkins stages & vice-versa (#5177)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-07-19 00:50:40 +08:00 |
|