Robin Kobus
e09c025ffb
[None] [fix] store blog 10 media via lfs ( #7375 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-30 10:17:53 +08:00
yunruis
f617b03bfc
[None][fix] fix doc formula ( #7367 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-29 04:48:10 -04:00
yunruis
c4f823319b
[None][doc] add adp balance blog ( #7213 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
2025-08-28 11:19:34 -04:00
Maurits de Groot
2d0c9b383f
[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM ( #7260 )
...
Signed-off-by: Maurits de Groot <63357890+Maurits-de-Groot@users.noreply.github.com>
2025-08-26 11:26:19 -04:00
Guoming Zhang
bf377d0b8e
[None][doc] Display tech blog for nvidia.github.io domain. ( #7241 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-26 15:36:28 +08:00
Farshad Ghodsian
2d40e8750b
[None][doc] Update gpt-oss deployment guide to latest release image ( #7101 )
...
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-21 02:33:07 -04:00
Bo Li
8b05b5d801
[None][doc] Update gpt oss doc ( #6954 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-18 01:27:30 -04:00
jmydurant
8e252256f5
[None][doc] Modify the description for mla chunked context ( #6929 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-15 12:52:26 +08:00
Shi Xiaowei
fe7dda834d
[TRTLLM-7030][fix] Refactor the example doc of dist-serving ( #6766 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-13 17:39:27 +08:00
Andrew Chen
4ecda91ecc
[https://nvbugs/5423962][fix] Address broken links ( #6531 )
2025-08-07 16:00:05 -04:00
Guoming Zhang
3036d49071
[None][doc] Unify the tech blogs naming. ( #6649 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 01:45:40 -04:00
Farshad Ghodsian
6af1514dc3
[None][doc] Adding GPT-OSS Deployment Guide documentation ( #6637 )
...
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-05 19:19:48 +02:00
Enwei Zhu
899b74c357
[None][doc] Fix blog4 typo ( #6612 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-05 10:20:37 +08:00
Kaiyu Xie
147ad69368
[None][doc] blog: Scaling Expert Parallelism in TensorRT-LLM (Part 2: Performance Status and Optimization) ( #6547 )
...
Signed-off-by: Kaiyu XIe <26294424+kaiyux@users.noreply.github.com>
2025-08-01 16:46:15 +08:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. ( #6419 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
nv-guomingz
7231134996
doc: remove backend parameter for trtllm-bench when backend is set to… ( #6428 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-29 11:01:21 -04:00
Kaiyu Xie
e58afa510e
doc: Add README for wide EP ( #6356 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-29 00:36:12 -04:00
nv-guomingz
49044733e1
chore: delete useless gitkeep files. ( #6400 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-28 11:38:30 -04:00
Simeng Liu
7bff341553
[doc] Add NGram tech blog ( #6311 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-07-25 10:26:33 -07:00
Kaiyu Xie
f08286c679
doc: Refactor documents and examples of disaggregated serving and wide ep ( #6054 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-23 09:20:57 +08:00
Raayan Dhar
5234502717
[nvbug/5361223] doc: Update Llama4 deployment guide: update config & note concurrency ( #6222 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-07-22 11:28:23 -07:00
nv-guomingz
b4c7e8c9a5
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… ( #6150 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-21 10:49:29 +08:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #6003 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
Shi Xiaowei
f4e0425a7b
doc: update the link of the diagram ( #5953 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-11 18:02:22 +09:00
Shi Xiaowei
37293e4dfd
blog: add qwen3 disagg perf metrics ( #5822 )
2025-07-11 16:41:45 +09:00
wili
2e3cf42e03
[refactor] Simplification of Speculative decoding configs ( #5639 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-10 11:37:30 -04:00
Yan Chunwei
07f6da763d
[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner ( #5876 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-10 11:31:35 +08:00
Erin
e277766f0d
chores: merge examples for v1.0 doc ( #5736 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-08 21:00:42 -07:00
jiahanc
607bf4c395
Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide ( #5810 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 10:10:02 +09:00
nv-guomingz
c8fa08da5c
doc: update cuda_graph_config usage part in DS R1 docs ( #5796 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-08 16:54:46 +09:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" ( #5818 )
2025-07-08 13:15:30 +09:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #5795 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
nv-guomingz
c434147366
chore: update doc by replacing use_cuda_graph with cuda_graph_config ( #5680 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-04 15:39:15 +09:00
Kaiyu Xie
ab488a5a5d
doc: Fix outdated config in DeepSeek best perf practice doc ( #5638 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
nv-guomingz
6e48ac25a6
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. ( #5585 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 12:23:14 -04:00
Kaiyu Xie
2ce200fbbb
doc: Minor update to DeepSeek R1 best practice ( #5600 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-30 15:49:06 +08:00
Fanrong Li
ebadc13086
[doc] update mtp documents ( #5387 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-21 16:05:52 +08:00
Shi Xiaowei
1e35be5840
doc: subsequent modifications of blog 5 ( #5366 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-19 18:23:13 +08:00
Shi Xiaowei
9a53e58a58
blog: Disaggregated Serving in TensorRT-LLM ( #5353 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-19 18:02:15 +08:00
Tao Li @ NVIDIA
03f1a6a3d8
Update DeepSeek R1 perf numbers to latest release/0.20 results ( #5235 )
2025-06-16 17:42:13 +08:00
Julien Demouth
bb79ba7c35
Edits for tech blog 4 ( #5006 )
...
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-09 09:38:41 +08:00
Omer Ullman Argov
8731f5f14f
chore: Mass integration of release/0.20 ( #4898 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-08 23:26:26 +08:00
juney-nvidia
a761cc2f8d
doc: refinement based on Julien's feedbacks ( #4967 )
...
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-06 08:56:14 +08:00
Kaiyu Xie
5a5427f86e
blog: Scaling Expert Parallelism in TensorRT-LLM (Part 1: Design and Implementation of Large-scale EP) ( #4958 )
...
Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-05 22:24:04 +08:00
juney-nvidia
49f2f1f8eb
Expose new tech blog about DSR1 throughput optimization to the main R… ( #4803 )
...
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-05-30 20:44:12 +08:00
Tao Li @ NVIDIA
3b7120d60e
DeepSeek R1 throughut optimization tech blog for Blackwell GPUs ( #4791 )
...
Signed-off-by: Tao Li
2025-05-30 18:54:19 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
Fanrong Li
862bde99b6
draft[doc]: add mtp tech blog ( #4580 )
...
* add mtp tech blog.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* update figure size.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* update the figure caption style and add some code/pr links.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix figure captions.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix figure size and perf data.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix based on comments
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
* fix figure links.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Co-authored-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-05-23 13:54:21 +08:00
Shi Xiaowei
a98e7ea26b
fix: replace the image links in the blog ( #4489 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-20 22:39:25 +08:00
juney-nvidia
ddf01f6266
refine doc ( #4422 )
2025-05-19 06:06:22 +08:00