Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention ( #10264 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Yihan Wang
9df4dad3b6
[None][fix] Introduce inline namespace to avoid symbol collision ( #9541 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 23:32:15 +08:00
Jhao-Ting Chen
0a09465089
[ https://nvbugs/5567586 ][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model ( #8383 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-08 11:16:05 -08:00
cheshirekow
1379cfac3a
[TRTLLM-9197][infra] Move thirdparty stuff to it's own listfile ( #8986 )
...
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-20 16:44:23 -08:00
Kanghwan
41e5870a70
[ #8476 ][chore] Update license ( #8807 )
...
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-11-19 15:05:25 -08:00
qsang-nv
0f42a24f45
[None][feat] Fix attention sink load in xqa ( #8836 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-11-03 09:39:45 +08:00
qsang-nv
07edac2818
[None][feat] Add vLLM KV Pool support for XQA mla kernel ( #8560 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-10-22 14:12:57 +08:00
Jhao-Ting Chen
220dc01372
[None][feat] support JIT mha.cu for SPEC_DEC in runtime ( #6078 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-09-23 14:56:17 -07:00
Pengbo Wang
08cc7a041f
[ https://nvbugs/5355128 ][fix] Add missing wgmma intrinsic for starcoder ( #7643 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 10:38:58 +08:00
Yao Yao
c1aa7f31d9
[None][fix] Fix a numerical stability issue for XQA with spec dec ( #7114 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-09-03 20:40:05 -04:00
Yao Yao
cbcea33279
[fix]: use safeInitRowMax instead of fp32_lowest to avoid NaN ( #7087 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-08-20 22:12:21 -07:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss ( #6645 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Ransiki
19b7524ff6
[None][feat] Add vLLM KV Pool support for XQA kernel ( #6013 )
...
Signed-off-by: Ransiki Zhang <ransikiz@nvidia.com>
2025-08-06 09:29:37 +08:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec ( #6379 )
...
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
Bruce-Lee-LY
8c82ee2803
[fix] xqa precision for fp16/bf16 kv cache ( #6573 )
...
Signed-off-by: Bruce-Lee-LY <yong-li14@tsinghua.org.cn>
Co-authored-by: Bruce-Lee-LY <yong-li14@tsinghua.org.cn>
2025-08-04 14:34:20 +08:00
qsang-nv
8ef8e73002
update spec_dec ( #6079 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-07-16 17:50:43 +08:00
杨凯旋
61c5a53642
[ #5403 ][perf] Conditionally enable SWAP AB for speculative decoding ( #5404 )
...
Signed-off-by: zoheth <z0heth@outlook.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-07-01 18:32:37 +08:00
Yao Yao
0788c5d0d6
[perf] improve XQA-MLA perf ( #5468 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-06-26 18:09:13 +08:00
Yao Yao
908463a5f5
[feat]: improve performance of XQA-MLA for sm120 ( #5087 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-06-18 14:19:22 +08:00
yunruis
30c5b4183a
refactoring: port customized kernels with public cutlass version ( #5027 )
...
Signed-off-by: yunruis
Merge this to unblock others since the full CI has been run through
2025-06-13 16:19:31 +08:00
Jinyang Yuan
20d0649f19
[feat] Support XQA-based MLA on SM120 ( #4858 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-06-06 22:32:49 +08:00
Yao Yao
ef763b0ddc
fix: rename some terms ( #4534 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-05-23 23:23:49 +08:00
Ming Wei
ed887940d4
infra: open source XQA kernels ( #3762 )
...
Replace libtensorrt_llm_nvrtc_wrapper.so with its source code, which
consists of two parts:
1. NVRTC glue code
2. XQA kernel code
During TensorRT-LLM build, XQA kernel code is embedded as C++ arries via
gen_cpp_header.py and passed to NVRTC for JIT compilation.
Signed-off-by: Ming Wei <2345434+ming-wei@users.noreply.github.com>
2025-04-30 18:05:15 +08:00