Yanchao Lu
|
86779213db
|
[Docs] - Add date and commit info (#4448) (#4752)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-05-30 15:58:49 +08:00 |
|
Yanchao Lu
|
942ac5c638
|
[Docs] - Reapply #4220 (#4434)
* [Docs] - Reapply #4220
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
* Update docs/source/_static/switcher.json
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-05-19 22:52:37 +08:00 |
|
Daniel Cámpora
|
df19430629
|
chore: Mass Integration 0.19 (#4255)
* fix: Fix/fused moe 0.19 (#3799)
* fix bug of stream init
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix: Add pre-download of checkpoint before benchmark. (#3772)
* Add pre-download of checkpoint before benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Add missing remote code flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move from_pretrained to throughput benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move download and use snapshot_download.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Removed trusted flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Fix benchmark command in iteration log test.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
---------
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5241495][fix] CUDA Graph padding with overlap scheduler (#3839)
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fuse
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* TRTLLM-4875 feat: Add version switcher to doc (#3871)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* waive a test (#3897)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* docs:fix https://nvbugs/5244616 by removing new invalid links. (#3939)
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* fix: remote mpi session abort (#3884)
* fix remote mpi session
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* skip fp8 gemm for pre-hopper (#3931)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5247148][fix] Attention DP with overlap scheduler (#3975)
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update multigpu list
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix namings
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* Doc: Fix H200 DeepSeek R1 perf doc (#4006)
* fix doc
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* update perf number
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
---------
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* Fix the perf regression caused by insufficient cache warmup. (#4042)
Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* doc: Update 0.19.0 release notes (#3976)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* Optimize the AutoTuner cache access code to reduce host code overhead. (#4060)
The NVFP4 Linear op is very sensitive to the host overhead.
This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Update switcher (#4098)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* doc: update release notes (#4108)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* docs:update 0.19 doc. (#4120)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* docs:add torch flow supported model list. (#4129)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* doc: Release V0.19 Perf Overview Update (#4166)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
* Fix readme of autodeploy.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Update tensorrt_llm/_torch/pyexecutor/llm_request.py
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
* Revert mgmn worker node.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Change to disable_overlap_scheduler.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>
|
2025-05-16 10:53:25 +02:00 |
|
Yiteng Niu
|
31a2e2d08d
|
doc: update switcher.json config (#4220)
|
2025-05-12 20:40:55 +08:00 |
|
Kaiyu Xie
|
d7472231f9
|
TRTLLM-4875 feat: Add version switcher to doc (#3846)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-04-26 05:42:46 +08:00 |
|
Pengyun Lin
|
1899e71364
|
doc: add genai-perf benchmark & slurm multi-node for trtllm-serve doc (#3407)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-04-16 00:11:58 +08:00 |
|
Kaiyu Xie
|
2631f21089
|
Update (#2978)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-03-23 16:39:35 +08:00 |
|
Kaiyu Xie
|
3aa6b11d13
|
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
|
2025-03-18 21:25:19 +08:00 |
|
Kaiyu Xie
|
e88da961c5
|
Update TensorRT-LLM (#2783)
|
2025-02-13 18:40:22 +08:00 |
|
Kaiyu Xie
|
c629546ce4
|
Update TensorRT-LLM (#2436)
|
2024-11-12 15:27:49 +08:00 |
|
Kaiyu Xie
|
f14d1d433c
|
Update TensorRT-LLM (#2389)
* Update TensorRT-LLM
---------
Co-authored-by: Alessio Netti <netti.alessio@gmail.com>
|
2024-10-29 22:24:38 +08:00 |
|
Kaiyu Xie
|
1730a587d8
|
Update TensorRT-LLM (#2363)
* Update TensorRT-LLM
---------
Co-authored-by: tonylek <137782967+tonylek@users.noreply.github.com>
|
2024-10-22 20:27:35 +08:00 |
|
Kaiyu Xie
|
31ac30e928
|
Update TensorRT-LLM (#2215)
* Update TensorRT-LLM
---------
Co-authored-by: Sherlock Xu <65327072+Sherlock113@users.noreply.github.com>
|
2024-09-10 18:21:22 +08:00 |
|
Kaiyu Xie
|
78f5c2936b
|
Update TensorRT-LLM (#2184)
|
2024-09-03 12:14:23 +02:00 |
|
石晓伟
|
32ed92e449
|
Update TensorRT-LLM
Co-authored-by: Rong Zhou <130957722+ReginaZh@users.noreply.github.com>
Co-authored-by: Onur Galoglu <33498883+ogaloglu@users.noreply.github.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
|
2024-08-20 18:55:15 +08:00 |
|
Kaiyu Xie
|
b777bd6475
|
Update TensorRT-LLM (#1725)
* Update TensorRT-LLM
---------
Co-authored-by: RunningLeon <mnsheng@yeah.net>
Co-authored-by: Tlntin <TlntinDeng01@Gmail.com>
Co-authored-by: ZHENG, Zhen <zhengzhen.z@qq.com>
Co-authored-by: Pham Van Ngoan <ngoanpham1196@gmail.com>
Co-authored-by: Nathan Price <nathan@abridge.com>
Co-authored-by: Tushar Goel <tushar.goel.ml@gmail.com>
Co-authored-by: Mati <132419219+matichon-vultureprime@users.noreply.github.com>
|
2024-06-04 20:26:32 +08:00 |
|
Kaiyu Xie
|
66ca3378c6
|
Update TensorRT-LLM (#1315)
|
2024-03-19 17:36:42 +08:00 |
|
Kaiyu Xie
|
75b6210ff4
|
Kaiyu/update main (#5)
* Update
* Update
|
2023-10-18 22:38:53 +08:00 |
|
Kevin Xie
|
027cd518e3
|
Update
|
2023-10-10 23:22:17 -07:00 |
|
Kevin Xie
|
6e9e318e91
|
Update code
|
2023-09-28 09:00:05 -07:00 |
|