Zac Patel
3bf405f6c3
[doc] Update perf_overview.md for release 0.21 ( #6270 )
...
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-07-31 12:13:38 +08:00
QI JUN
418892e270
doc: update release notes ( #6438 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-29 04:22:28 -04:00
QI JUN
44f6db8c1c
doc: update release notes ( #6324 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-28 16:05:50 +08:00
QI JUN
1209001bac
doc: update known issues ( #6247 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-22 17:27:31 +08:00
bhsueh_NV
9323de6e37
[Doc][Qwen3] update qwen3 into support-matrix ( #6161 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-18 11:23:30 +08:00
QI JUN
f6db521e95
add release notes for 0.21 release ( #6049 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-16 16:54:14 +08:00
nv-guomingz
63f4a7ad32
[TRTLLM-6495] doc: add disclaimer for 3rd party software installation. ( #6039 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 13:33:03 +08:00
amirkl94
8429c8b139
chore: Port leftover 0.20 ( #5907 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Yingge He <yinggeh@nvidia.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-07-10 13:48:12 +02:00
Yi Zhang
39ad6023b7
doc: Update gb200 doc ( #5840 )
...
Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>
2025-07-08 20:22:06 +09:00
Kaiyu Xie
682b164b9b
doc: Fix outdated config in DeepSeek best perf practice doc ( #5638 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-01 04:58:50 -04:00
ixlmar
abb7357f25
[TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instructions ( #5490 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-27 07:09:41 -07:00
Kaiyu Xie
30a2a8b81c
doc: Fix benchmark cmd in disagg scripts ( #5516 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-26 17:23:24 +08:00
Martin Marciniszyn Mehringer
fc64f139e4
Fix permission for local user issues in NGC docker container. ( #5373 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-06-25 14:10:20 +02:00
Martin Marciniszyn Mehringer
ebc6dbcb0b
doc: cherry pick #5334 ( #5368 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-06-19 20:03:59 +08:00
Xianjie Qiao
857108aeca
Add disagg slurm scripts ( #5243 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-06-18 23:17:55 +08:00
Yan Chunwei
724e495254
chore: partition LLM class into TorchLLM and TrtLLM ( #4900 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-18 14:01:25 +08:00
Emma Qiao
ff32caf4d7
[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 ( #4885 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 23:48:34 +08:00
Yanchao Lu
f4cdbfcdf0
None - Some clean-ups for the automation pipeline ( #5245 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 21:08:24 +08:00
Tao Li @ NVIDIA
03f1a6a3d8
Update DeepSeek R1 perf numbers to latest release/0.20 results ( #5235 )
2025-06-16 17:42:13 +08:00
amitz-nv
109c426077
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow ( #5130 )
2025-06-15 18:54:04 +03:00
yunruis
e96d6863d8
add doc for open-sourced cutlass kernels ( #5194 )
...
Signed-off-by: yunruis
2025-06-13 18:51:27 +08:00
Daniel Cámpora
22281cfc55
doc: Added documentation for enable_trtllm_sampler. ( #4990 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
2025-06-12 18:34:15 +08:00
Venky
59c9588e9a
enh(doc): Add ci-overview in docs/source/reference/ ( #5137 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-06-12 17:48:13 +08:00
nv-guomingz
b563696dee
doc:fix invalid links for trtllm-serve doc ( #5145 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-12 16:17:32 +08:00
HuiGao-NV
43192379af
Use backend to replace macro to control enablement of MNNVL all reduce ( #4635 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 11:22:49 +08:00
Linda
50f576172b
doc: add info about stop words appearing in output ( #4956 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-06-10 22:38:33 +02:00
Julien Demouth
bb79ba7c35
Edits for tech blog 4 ( #5006 )
...
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-09 09:38:41 +08:00
Omer Ullman Argov
8731f5f14f
chore: Mass integration of release/0.20 ( #4898 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-08 23:26:26 +08:00
Bo Li
f414a079ad
chore: Change the type annotations of input_ids and position_ids to int32. ( #4632 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-07 16:10:47 +08:00
juney-nvidia
a761cc2f8d
doc: refinement based on Julien's feedbacks ( #4967 )
...
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-06 08:56:14 +08:00
Kaiyu Xie
5a5427f86e
blog: Scaling Expert Parallelism in TensorRT-LLM (Part 1: Design and Implementation of Large-scale EP) ( #4958 )
...
Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-05 22:24:04 +08:00
juney-nvidia
49f2f1f8eb
Expose new tech blog about DSR1 throughput optimization to the main R… ( #4803 )
...
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-05-30 20:44:12 +08:00
Tao Li @ NVIDIA
3b7120d60e
DeepSeek R1 throughut optimization tech blog for Blackwell GPUs ( #4791 )
...
Signed-off-by: Tao Li
2025-05-30 18:54:19 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
amirkl94
fbec0c3552
Release 0.20 to main ( #4577 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-28 16:25:33 +08:00
Fanrong Li
862bde99b6
draft[doc]: add mtp tech blog ( #4580 )
...
* add mtp tech blog.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* update figure size.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* update the figure caption style and add some code/pr links.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix figure captions.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix figure size and perf data.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix based on comments
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
* fix figure links.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Co-authored-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-05-23 13:54:21 +08:00
Shi Xiaowei
a98e7ea26b
fix: replace the image links in the blog ( #4489 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-20 22:39:25 +08:00
Yanchao Lu
bc6a69e4cb
[Docs] - Add date and commit info ( #4448 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-20 18:53:44 +08:00
kanghui0204
6f3922f318
feat: Low Precision Allreduce for PCIe based GPU ( #4344 )
...
This PR adds a customized allreduce to TensorRT-LLM. The new allreduce is used for communication on PCIe-based GPUs via low-precision quantization, which can accelerate the PCIe allreduce process.
Signed-off-by: Hui Kang <hkang@nvidia.com>
Co-authored-by: Hui Kang <hkang@nvidia.com>
2025-05-20 06:53:46 +08:00
Yanchao Lu
942ac5c638
[Docs] - Reapply #4220 ( #4434 )
...
* [Docs] - Reapply #4220
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
* Update docs/source/_static/switcher.json
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-19 22:52:37 +08:00
Kaiyu Xie
a43914619f
fix: wrong argument name enable_overlap_scheduler ( #4433 )
...
Fix wrong argument
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-19 15:02:22 +08:00
juney-nvidia
ddf01f6266
refine doc ( #4422 )
2025-05-19 06:06:22 +08:00
juney-nvidia
58e2d6ffa7
Refine doc ( #4421 )
2025-05-19 06:03:05 +08:00
juney-nvidia
ac610b394a
Refine doc ( #4420 )
2025-05-19 05:05:24 +08:00
Yanchao Lu
0d7269e2a7
[Infra][Docs] - Some clean-up for the CI pipeline and docs ( #4419 )
...
* [Docs] - Some clean-up for the docs
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
* [Infra] - Some clean-up for the CI pipeline
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-19 00:07:45 +08:00
Kefeng-Duan
f5b6d453aa
doc: DS r1 min latency blog ( #4386 )
...
* add best perf practice on DSR1
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* add ds-r1 min latency tech blog
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* rm redundant doc
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* refine table content
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* refine table content
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* relative path for images
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* refine precommit
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* pr4280 is merged
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
---------
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-05-16 20:20:28 +08:00
Daniel Cámpora
df19430629
chore: Mass Integration 0.19 ( #4255 )
...
* fix: Fix/fused moe 0.19 (#3799 )
* fix bug of stream init
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix: Add pre-download of checkpoint before benchmark. (#3772 )
* Add pre-download of checkpoint before benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Add missing remote code flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move from_pretrained to throughput benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move download and use snapshot_download.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Removed trusted flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Fix benchmark command in iteration log test.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
---------
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5241495 ][fix] CUDA Graph padding with overlap scheduler (#3839 )
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fuse
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* TRTLLM-4875 feat: Add version switcher to doc (#3871 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* waive a test (#3897 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* docs:fix https://nvbugs/5244616 by removing new invalid links. (#3939 )
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* fix: remote mpi session abort (#3884 )
* fix remote mpi session
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* skip fp8 gemm for pre-hopper (#3931 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5247148 ][fix] Attention DP with overlap scheduler (#3975 )
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update multigpu list
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix namings
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* Doc: Fix H200 DeepSeek R1 perf doc (#4006 )
* fix doc
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* update perf number
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
---------
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* Fix the perf regression caused by insufficient cache warmup. (#4042 )
Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* doc: Update 0.19.0 release notes (#3976 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* Optimize the AutoTuner cache access code to reduce host code overhead. (#4060 )
The NVFP4 Linear op is very sensitive to the host overhead.
This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Update switcher (#4098 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* doc: update release notes (#4108 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* docs:update 0.19 doc. (#4120 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* docs:add torch flow supported model list. (#4129 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* doc: Release V0.19 Perf Overview Update (#4166 )
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
* Fix readme of autodeploy.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Update tensorrt_llm/_torch/pyexecutor/llm_request.py
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
* Revert mgmn worker node.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Change to disable_overlap_scheduler.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>
2025-05-16 10:53:25 +02:00
Yechan Kim
c6e2111f4e
feat: enhance trtllm serve multimodal ( #3757 )
...
* feat: enhance trtllm serve multimodal
1. made the load_image and load_video asynchronous
2. add image_encoded input support to be compatible with genai-perf
3. support text-only on multimodal mdoels(currently, Qwen2-VL & Qwen2.5-VL)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* add test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix bandit
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* trimming uils
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* trimming for test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* genai perf command fix
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* command fix
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* refactor chat_utils
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* stress test genai-perf command
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-05-15 16:16:31 -07:00
Kaiyu Xie
b4e5df0ee0
Breaking change: perf: Enable scheduling overlap by default ( #4174 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-15 14:27:36 +08:00
QI JUN
498ce8a056
Revert "feat: Low Precision Allreduce for PCIe based GPU" ( #4340 )
...
Revert "feat: Low Precision Allreduce for PCIe based GPU (#3851 )"
This reverts commit 5e634dd1bd .
2025-05-15 09:52:39 +08:00