Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec ( #6379 )
...
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
Pamela Peng
da8c7372d4
[TRTLLM-5366][feat] Add support for sm121 ( #5524 )
...
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.
2025-07-08 14:27:00 -07:00
Alessio Netti
7e681fbe52
[chore] Allow configuring linking of NVRTC wrapper ( #5189 )
...
Signed-off-by: Alessio Netti <netti.alessio@gmail.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-26 07:26:10 +02:00
Omer Ullman Argov
8731f5f14f
chore: Mass integration of release/0.20 ( #4898 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-08 23:26:26 +08:00
Jinyang Yuan
20d0649f19
[feat] Support XQA-based MLA on SM120 ( #4858 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-06-06 22:32:49 +08:00
Ming Wei
ed887940d4
infra: open source XQA kernels ( #3762 )
...
Replace libtensorrt_llm_nvrtc_wrapper.so with its source code, which
consists of two parts:
1. NVRTC glue code
2. XQA kernel code
During the TensorRT-LLM build, XQA kernel code is embedded as C++ arrays via
gen_cpp_header.py and passed to NVRTC for JIT compilation.
Signed-off-by: Ming Wei <2345434+ming-wei@users.noreply.github.com>
2025-04-30 18:05:15 +08:00
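Note on the "open source XQA kernels" entry above: the embedding step it describes (kernel source turned into C++ arrays that are later handed to NVRTC) can be illustrated with a minimal sketch. This is not the repository's gen_cpp_header.py; the function name to_cpp_array, the symbol kDemoKernelSrc, and the header layout are all hypothetical and chosen only for illustration. The sketch assumes the consumer wants a null-terminated byte array, since NVRTC takes program source as a C string.

```python
def to_cpp_array(name: str, source: bytes, per_line: int = 12) -> str:
    """Render `source` as a C++ header defining a byte array and its length.

    Hypothetical helper: the real gen_cpp_header.py may use a different
    layout, naming scheme, and element type.
    """
    # Append a null terminator so the array can be used as a C string
    # after a cast to `const char*` on the C++ side.
    data = source + b"\x00"
    rows = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        "#pragma once\n"
        "#include <cstddef>\n\n"
        f"inline constexpr unsigned char {name}[] = {{\n{body}\n}};\n"
        # The reported length excludes the null terminator.
        f"inline constexpr std::size_t {name}_len = {len(source)};\n"
    )


if __name__ == "__main__":
    # Illustrative input only; a real build would read the kernel source files.
    demo_kernel = b'extern "C" __global__ void demo() {}\n'
    print(to_cpp_array("kDemoKernelSrc", demo_kernel))
```

At runtime, glue code along these lines would pass the embedded array to NVRTC as the program source, so no separate kernel source files need to ship with the binary.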
Kaiyu Xie
258ae9c58c
Revert "infra: move nvrtc_wrapper to conan ( #3282 )" ( #3573 )
...
This reverts commit c0dd6cbce0.
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-15 22:45:13 +08:00
tburt-nv
c0dd6cbce0
infra: move nvrtc_wrapper to conan ( #3282 )
...
* add pip scripts dir to path
* move nvrtc_wrapper to conan
* support building nvrtc wrapper from source
---------
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-04-15 05:31:01 +08:00
Yuan Tong
a139eae425
chore: Stabilize ABI boundary for internal kernel library ( #3117 )
...
chore: Stabilize ABI boundary for internal kernel library
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-04-11 15:07:50 +08:00
Yao Yao
3545d59635
Support speculative decoding with Hopper XQA ( #3269 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-04-07 17:14:34 +08:00
nv-guomingz
dc0463b0e2
doc: add version.txt for internal cutlass library and nvrtc_wrapper so files ( #3030 )
...
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 23:44:21 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
...
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM ( #2873 )
2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
...
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM ( #2792 )
...
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM ( #2783 )
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM ( #2755 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00
Kaiyu Xie
be17881062
Update TensorRT-LLM ( #2582 )
2024-12-16 21:50:47 -08:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM ( #2562 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
石晓伟
548b5b7310
Update TensorRT-LLM ( #2532 )
...
* blossom-ci.yml: run vulnerability scan on blossom
* open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec
---------
Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com>
Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2024-12-04 21:16:56 +08:00
Kaiyu Xie
385626572d
Update TensorRT-LLM ( #2502 )
...
* Update TensorRT-LLM
---------
Co-authored-by: 岑灿 <yunyi.hyy@alibaba-inc.com>
2024-11-26 16:51:34 +08:00
Kaiyu Xie
535c9cc673
Update TensorRT-LLM ( #2460 )
2024-11-19 18:30:34 +08:00
Kaiyu Xie
c629546ce4
Update TensorRT-LLM ( #2436 )
2024-11-12 15:27:49 +08:00
Kaiyu Xie
b7868dd1bd
Update TensorRT-LLM ( #2413 )
2024-11-05 16:27:06 +08:00
Kaiyu Xie
f14d1d433c
Update TensorRT-LLM ( #2389 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Alessio Netti <netti.alessio@gmail.com>
2024-10-29 22:24:38 +08:00
Kaiyu Xie
1730a587d8
Update TensorRT-LLM ( #2363 )
...
* Update TensorRT-LLM
---------
Co-authored-by: tonylek <137782967+tonylek@users.noreply.github.com>
2024-10-22 20:27:35 +08:00
Kaiyu Xie
75057cd036
Update TensorRT-LLM ( #2333 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Puneesh Khanna <puneesh.khanna@tii.ae>
Co-authored-by: Ethan Zhang <26497102+ethnzhng@users.noreply.github.com>
2024-10-15 15:28:40 +08:00
Kaiyu Xie
8681b3a4c0
open source 4dbf696ae9b74a26829d120b67ab8443d70c8e58 ( #2297 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Bhuvanesh Sridharan <bhuvanesh.sridharan@sprinklr.com>
Co-authored-by: Qingquan Song <ustcsqq@gmail.com>
2024-10-08 12:19:19 +02:00
Dan Blanaru
48686bca3a
open source 7f370deb0090d885d7518c2b146399ba3933c004 ( #2273 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Qingquan Song <ustcsqq@gmail.com>
2024-09-30 13:51:19 +02:00
Kaiyu Xie
40274aac39
Bump version to 0.14.0.dev2024092401 ( #2258 )
2024-09-26 10:26:16 +08:00
Kaiyu Xie
e153372759
Update TensorRT-LLM ( #2253 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Ivan Sorokin <isorokin@nvidia.com>
Co-authored-by: lkm2835 <lkm2835@gmail.com>
2024-09-24 17:27:31 +02:00
Kaiyu Xie
a65dba7aaf
Bump version to 0.14.0.dev2024091700 ( #2234 )
2024-09-18 08:58:36 +08:00
Kaiyu Xie
fe7dc6ad4e
Update TensorRT-LLM ( #2230 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Yi Wang <yi.wang.2005@gmail.com>
Co-authored-by: lkm2835 <lkm2835@gmail.com>
2024-09-17 14:39:09 +08:00
Kaiyu Xie
31ac30e928
Update TensorRT-LLM ( #2215 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Sherlock Xu <65327072+Sherlock113@users.noreply.github.com>
2024-09-10 18:21:22 +08:00
Kaiyu Xie
78f5c2936b
Update TensorRT-LLM ( #2184 )
2024-09-03 12:14:23 +02:00
石晓伟
b8fc6633ba
Update TensorRT-LLM ( #2156 )
...
Co-authored-by: Bruno Magalhaes <bruno.magalhaes@synthesia.io>
2024-08-27 18:20:59 +08:00
石晓伟
32ed92e449
Update TensorRT-LLM
...
Co-authored-by: Rong Zhou <130957722+ReginaZh@users.noreply.github.com>
Co-authored-by: Onur Galoglu <33498883+ogaloglu@users.noreply.github.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
2024-08-20 18:55:15 +08:00
Kaiyu Xie
74b324f667
Update TensorRT-LLM ( #2110 )
2024-08-13 22:34:33 +08:00
Kaiyu Xie
be9cd719f7
Update TensorRT-LLM ( #2094 )
...
* Update TensorRT-LLM
---------
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
Co-authored-by: Tayef Shah <tayefshah@gmail.com>
Co-authored-by: lfz941 <linfanzai941@gmail.com>
2024-08-07 16:44:43 +08:00
Kaiyu Xie
a681853d38
Update TensorRT-LLM ( #2053 )
2024-07-30 21:25:01 +08:00
Kaiyu Xie
93293aa46d
open source 315e9f5ccd286e906d4c0d402fefbf2f69a1febe ( #2033 )
2024-07-26 16:19:24 +08:00
Kaiyu Xie
5fa9436e17
Update TensorRT-LLM ( #2016 )
2024-07-24 19:50:28 +08:00
dongxuy04
5f26e44ead
open source 3706e7395b9b58994412617992727c8ff2d14c9f ( #2010 )
...
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2024-07-24 05:48:06 +08:00
Kaiyu Xie
bca9a33b02
Update TensorRT-LLM ( #2008 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Timur Abishev <abishev.timur@gmail.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
Co-authored-by: Saeyoon Oh <saeyoon.oh@furiosa.ai>
Co-authored-by: hattizai <hattizai@gmail.com>
2024-07-23 23:05:09 +08:00
Kaiyu Xie
2d234357c6
Update TensorRT-LLM ( #1954 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Altair-Alpha <62340011+Altair-Alpha@users.noreply.github.com>
2024-07-16 15:30:25 +08:00
Kaiyu Xie
a96cccafcf
Update TensorRT-LLM ( #1918 )
2024-07-09 14:42:22 +08:00
Kaiyu Xie
9dbc5b38ba
Update TensorRT-LLM ( #1891 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Marks101 <markus.schnoes@gmx.de>
Co-authored-by: lkm2835 <lkm2835@gmail.com>
2024-07-04 14:37:19 +08:00
Kaiyu Xie
9691e12bce
Update TensorRT-LLM ( #1835 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2024-06-25 21:10:30 +08:00