29 Commits

Author SHA1 Message Date
Kaiyu Xie
f08286c679
doc: Refactor documents and examples of disaggregated serving and wide ep (#6054)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-23 09:20:57 +08:00
Raayan Dhar
5234502717
[nvbug/5361223] doc: Update Llama4 deployment guide: update config & note concurrency (#6222)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-07-22 11:28:23 -07:00
nv-guomingz
b4c7e8c9a5
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… (#6150)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-21 10:49:29 +08:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#6003)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
Shi Xiaowei
f4e0425a7b
doc: update the link of the diagram (#5953)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-11 18:02:22 +09:00
Shi Xiaowei
37293e4dfd
blog: add qwen3 disagg perf metrics (#5822)
2025-07-11 16:41:45 +09:00
wili
2e3cf42e03
[refactor] Simplification of Speculative decoding configs (#5639)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-10 11:37:30 -04:00
Yan Chunwei
07f6da763d
[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-10 11:31:35 +08:00
Erin
e277766f0d
chores: merge examples for v1.0 doc (#5736)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-08 21:00:42 -07:00
jiahanc
607bf4c395
Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide (#5810)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 10:10:02 +09:00
nv-guomingz
c8fa08da5c
doc: update cuda_graph_config usage part in DS R1 docs (#5796)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-08 16:54:46 +09:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818)
2025-07-08 13:15:30 +09:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
nv-guomingz
c434147366
chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-04 15:39:15 +09:00
Fanrong Li
ebadc13086
[doc] update mtp documents (#5387)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-21 16:05:52 +08:00
Shi Xiaowei
1e35be5840
doc: subsequent modifications of blog 5 (#5366)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-19 18:23:13 +08:00
Shi Xiaowei
9a53e58a58
blog: Disaggregated Serving in TensorRT-LLM (#5353)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-19 18:02:15 +08:00
Julien Demouth
bb79ba7c35
Edits for tech blog 4 (#5006)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-09 09:38:41 +08:00
juney-nvidia
a761cc2f8d
doc: refinement based on Julien's feedbacks (#4967)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-06 08:56:14 +08:00
Kaiyu Xie
5a5427f86e
blog: Scaling Expert Parallelism in TensorRT-LLM (Part 1: Design and Implementation of Large-scale EP) (#4958)
Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-05 22:24:04 +08:00
juney-nvidia
49f2f1f8eb
Expose new tech blog about DSR1 throughput optimization to the main R… (#4803)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-05-30 20:44:12 +08:00
Tao Li @ NVIDIA
3b7120d60e
DeepSeek R1 throughput optimization tech blog for Blackwell GPUs (#4791)
Signed-off-by: Tao Li
2025-05-30 18:54:19 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs (#4603)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
Fanrong Li
862bde99b6
draft[doc]: add mtp tech blog (#4580)
* add mtp tech blog.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* update figure size.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* update the figure caption style and add some code/pr links.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* fix figure captions.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* fix figure size and perf data.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* fix.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* fix.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* fix.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* fix.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* fix based on comments

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* fix figure links.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

---------

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Co-authored-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-05-23 13:54:21 +08:00
Shi Xiaowei
a98e7ea26b
fix: replace the image links in the blog (#4489)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-20 22:39:25 +08:00
juney-nvidia
ddf01f6266
refine doc (#4422)
2025-05-19 06:06:22 +08:00
juney-nvidia
58e2d6ffa7
Refine doc (#4421)
2025-05-19 06:03:05 +08:00
juney-nvidia
ac610b394a
Refine doc (#4420)
2025-05-19 05:05:24 +08:00
Kefeng-Duan
f5b6d453aa
doc: DS r1 min latency blog (#4386)
* add best perf practice on DSR1

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

* add ds-r1 min latency tech blog

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

* rm redundant doc

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

* refine table content

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

* refine table content

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

* relative path for images

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

* refine precommit

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

* pr4280 is merged

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

---------

Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-05-16 20:20:28 +08:00