Zac Patel
3bf405f6c3
[doc] Update perf_overview.md for release 0.21 (#6270)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-07-31 12:13:38 +08:00
amirkl94
8429c8b139
chore: Port leftover 0.20 (#5907)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Yingge He <yinggeh@nvidia.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-07-10 13:48:12 +02:00
amitz-nv
109c426077
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130)
2025-06-15 18:54:04 +03:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs (#4603)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
Kaiyu Xie
a43914619f
fix: wrong argument name enable_overlap_scheduler (#4433)
Fix wrong argument
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-19 15:02:22 +08:00
Daniel Cámpora
df19430629
chore: Mass Integration 0.19 (#4255)
* fix: Fix/fused moe 0.19 (#3799)
* fix bug of stream init
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix: Add pre-download of checkpoint before benchmark. (#3772)
* Add pre-download of checkpoint before benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Add missing remote code flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move from_pretrained to throughput benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move download and use snapshot_download.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Removed trusted flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Fix benchmark command in iteration log test.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
---------
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5241495][fix] CUDA Graph padding with overlap scheduler (#3839)
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fuse
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* TRTLLM-4875 feat: Add version switcher to doc (#3871)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* waive a test (#3897)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* docs:fix https://nvbugs/5244616 by removing new invalid links. (#3939)
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* fix: remote mpi session abort (#3884)
* fix remote mpi session
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* skip fp8 gemm for pre-hopper (#3931)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5247148][fix] Attention DP with overlap scheduler (#3975)
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update multigpu list
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix namings
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* Doc: Fix H200 DeepSeek R1 perf doc (#4006)
* fix doc
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* update perf number
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
---------
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* Fix the perf regression caused by insufficient cache warmup. (#4042)
Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* doc: Update 0.19.0 release notes (#3976)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* Optimize the AutoTuner cache access code to reduce host code overhead. (#4060)
The NVFP4 Linear op is very sensitive to the host overhead.
This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Update switcher (#4098)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* doc: update release notes (#4108)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* docs:update 0.19 doc. (#4120)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* docs:add torch flow supported model list. (#4129)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* doc: Release V0.19 Perf Overview Update (#4166)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
* Fix readme of autodeploy.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Update tensorrt_llm/_torch/pyexecutor/llm_request.py
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
* Revert mgmn worker node.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Change to disable_overlap_scheduler.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>
2025-05-16 10:53:25 +02:00
Kaiyu Xie
d1fa80dee3
doc: TRTLLM-4797 Update perf-analysis.md (#4100)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-08 17:24:44 +08:00
Kaiyu Xie
dfbcb543ce
doc: fix path after examples migration (#3814)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-24 02:36:45 +08:00
rakib-hasan
ff3b741045
feat: adding multimodal (only image for now) support in trtllm-bench (#3490)
* feat: adding multimodal (only image for now) support in trtllm-bench
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* fix: add in load_dataset() calls to maintain the v2.19.2 behavior
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* re-adding prompt_token_ids and using that for prompt_len
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* updating the datasets version in examples as well
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* api changes are not needed
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* moving datasets requirement and removing a missed api change
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* addressing review comments
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* refactoring the quickstart example
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
---------
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-04-18 07:06:16 +08:00
Bo Li
11b0091863
docs: Update perf-benchmarking doc on GPU configuration for consistent benchmarking. (#3458)
Signed-off-by: Bo Li <bobboli0202@gmail.com>
2025-04-11 17:21:27 +02:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820)
2025-02-25 21:21:49 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM (#2783)
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM (#2755)
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM (#2562)
* Update TensorRT-LLM
---------
Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
石晓伟
548b5b7310
Update TensorRT-LLM (#2532)
* blossom-ci.yml: run vulnerability scan on blossom
* open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec
---------
Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com>
Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2024-12-04 21:16:56 +08:00
Kaiyu Xie
385626572d
Update TensorRT-LLM (#2502)
* Update TensorRT-LLM
---------
Co-authored-by: 岑灿 <yunyi.hyy@alibaba-inc.com>
2024-11-26 16:51:34 +08:00
Kaiyu Xie
c629546ce4
Update TensorRT-LLM (#2436)
2024-11-12 15:27:49 +08:00
Kaiyu Xie
b7868dd1bd
Update TensorRT-LLM (#2413)
2024-11-05 16:27:06 +08:00
Kaiyu Xie
f14d1d433c
Update TensorRT-LLM (#2389)
* Update TensorRT-LLM
---------
Co-authored-by: Alessio Netti <netti.alessio@gmail.com>
2024-10-29 22:24:38 +08:00
Kaiyu Xie
75057cd036
Update TensorRT-LLM (#2333)
* Update TensorRT-LLM
---------
Co-authored-by: Puneesh Khanna <puneesh.khanna@tii.ae>
Co-authored-by: Ethan Zhang <26497102+ethnzhng@users.noreply.github.com>
2024-10-15 15:28:40 +08:00
Dan Blanaru
48686bca3a
open source 7f370deb0090d885d7518c2b146399ba3933c004 (#2273)
* Update TensorRT-LLM
---------
Co-authored-by: Qingquan Song <ustcsqq@gmail.com>
2024-09-30 13:51:19 +02:00
Kaiyu Xie
31ac30e928
Update TensorRT-LLM (#2215)
* Update TensorRT-LLM
---------
Co-authored-by: Sherlock Xu <65327072+Sherlock113@users.noreply.github.com>
2024-09-10 18:21:22 +08:00
石晓伟
32ed92e449
Update TensorRT-LLM
Co-authored-by: Rong Zhou <130957722+ReginaZh@users.noreply.github.com>
Co-authored-by: Onur Galoglu <33498883+ogaloglu@users.noreply.github.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
2024-08-20 18:55:15 +08:00
Kaiyu Xie
be9cd719f7
Update TensorRT-LLM (#2094)
* Update TensorRT-LLM
---------
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
Co-authored-by: Tayef Shah <tayefshah@gmail.com>
Co-authored-by: lfz941 <linfanzai941@gmail.com>
2024-08-07 16:44:43 +08:00
Kaiyu Xie
bca9a33b02
Update TensorRT-LLM (#2008)
* Update TensorRT-LLM
---------
Co-authored-by: Timur Abishev <abishev.timur@gmail.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
Co-authored-by: Saeyoon Oh <saeyoon.oh@furiosa.ai>
Co-authored-by: hattizai <hattizai@gmail.com>
2024-07-23 23:05:09 +08:00
Kaiyu Xie
9dbc5b38ba
Update TensorRT-LLM (#1891)
* Update TensorRT-LLM
---------
Co-authored-by: Marks101 <markus.schnoes@gmx.de>
Co-authored-by: lkm2835 <lkm2835@gmail.com>
2024-07-04 14:37:19 +08:00
石晓伟
2a115dae84
Update TensorRT-LLM (#1793)
Co-authored-by: DreamGenX <x@dreamgen.com>
Co-authored-by: Ace-RR <78812427+Ace-RR@users.noreply.github.com>
Co-authored-by: bprus <39293131+bprus@users.noreply.github.com>
Co-authored-by: janpetrov <janpetrov@icloud.com>
2024-06-18 18:18:23 +08:00
Kaiyu Xie
db4edea1e1
Update TensorRT-LLM (#1763)
* Update TensorRT-LLM
---------
Co-authored-by: Kota Tsuyuzaki <bloodeagle40234@gmail.com>
Co-authored-by: Pzzzzz <hello-cd.plus@hotmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
2024-06-11 16:59:02 +08:00
Kaiyu Xie
b777bd6475
Update TensorRT-LLM (#1725)
* Update TensorRT-LLM
---------
Co-authored-by: RunningLeon <mnsheng@yeah.net>
Co-authored-by: Tlntin <TlntinDeng01@Gmail.com>
Co-authored-by: ZHENG, Zhen <zhengzhen.z@qq.com>
Co-authored-by: Pham Van Ngoan <ngoanpham1196@gmail.com>
Co-authored-by: Nathan Price <nathan@abridge.com>
Co-authored-by: Tushar Goel <tushar.goel.ml@gmail.com>
Co-authored-by: Mati <132419219+matichon-vultureprime@users.noreply.github.com>
2024-06-04 20:26:32 +08:00
Kaiyu Xie
f430a4b447
Update TensorRT-LLM (#1688)
* Update TensorRT-LLM
---------
Co-authored-by: IbrahimAmin <ibrahimamin532@gmail.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
Co-authored-by: Pzzzzz <hello-cd.plus@hotmail.com>
Co-authored-by: CoderHam <hemant@cohere.com>
Co-authored-by: Konstantin Lopuhin <kostia.lopuhin@gmail.com>
2024-05-28 20:07:49 +08:00
Kaiyu Xie
5d8ca2faf7
Update TensorRT-LLM (#1639)
* Update TensorRT-LLM
---------
Co-authored-by: vonjackustc <fga@mail.ustc.edu.cn>
2024-05-21 17:51:02 +08:00
Samriddha Sinha
309ab33db0
Update dead links in perf-best-practices.md (#1545)
2024-05-20 18:05:14 +08:00
Kaiyu Xie
bf0a5afc92
Update TensorRT-LLM (#1598)
* Update TensorRT-LLM
2024-05-14 16:43:41 +08:00
Kaiyu Xie
66ef1df492
Update TensorRT-LLM (#1492)
* Update TensorRT-LLM
---------
Co-authored-by: Loki <lokravi@amazon.com>
2024-04-24 14:44:22 +08:00