Commit Graph

23 Commits

Author SHA1 Message Date
Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention (#10264)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Yihan Wang
9df4dad3b6
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 23:32:15 +08:00
Jhao-Ting Chen
0a09465089
[https://nvbugs/5567586][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model (#8383)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-08 11:16:05 -08:00
cheshirekow
1379cfac3a
[TRTLLM-9197][infra] Move thirdparty stuff to its own listfile (#8986)
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-20 16:44:23 -08:00
Kanghwan
41e5870a70
[#8476][chore] Update license (#8807)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-11-19 15:05:25 -08:00
qsang-nv
0f42a24f45
[None][feat] Fix attention sink load in xqa (#8836)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-11-03 09:39:45 +08:00
qsang-nv
07edac2818
[None][feat] Add vLLM KV Pool support for XQA mla kernel (#8560)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-10-22 14:12:57 +08:00
Jhao-Ting Chen
220dc01372
[None][feat] support JIT mha.cu for SPEC_DEC in runtime (#6078)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-09-23 14:56:17 -07:00
Pengbo Wang
08cc7a041f
[https://nvbugs/5355128][fix] Add missing wgmma intrinsic for starcoder (#7643)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 10:38:58 +08:00
Yao Yao
c1aa7f31d9
[None][fix] Fix a numerical stability issue for XQA with spec dec (#7114)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-09-03 20:40:05 -04:00
Yao Yao
cbcea33279
[fix]: use safeInitRowMax instead of fp32_lowest to avoid NaN (#7087)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-08-20 22:12:21 -07:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Ransiki
19b7524ff6
[None][feat] Add vLLM KV Pool support for XQA kernel (#6013)
Signed-off-by: Ransiki Zhang <ransikiz@nvidia.com>
2025-08-06 09:29:37 +08:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379)
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
Bruce-Lee-LY
8c82ee2803
[fix] xqa precision for fp16/bf16 kv cache (#6573)
Signed-off-by: Bruce-Lee-LY <yong-li14@tsinghua.org.cn>
Co-authored-by: Bruce-Lee-LY <yong-li14@tsinghua.org.cn>
2025-08-04 14:34:20 +08:00
qsang-nv
8ef8e73002
update spec_dec (#6079)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-07-16 17:50:43 +08:00
杨凯旋
61c5a53642
[#5403][perf] Conditionally enable SWAP AB for speculative decoding (#5404)
Signed-off-by: zoheth <z0heth@outlook.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-07-01 18:32:37 +08:00
Yao Yao
0788c5d0d6
[perf] improve XQA-MLA perf (#5468)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-06-26 18:09:13 +08:00
Yao Yao
908463a5f5
[feat]: improve performance of XQA-MLA for sm120 (#5087)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-06-18 14:19:22 +08:00
yunruis
30c5b4183a
refactoring: port customized kernels with public cutlass version (#5027)
Signed-off-by: yunruis 

Merge this to unblock others since the full CI has been run through
2025-06-13 16:19:31 +08:00
Jinyang Yuan
20d0649f19
[feat] Support XQA-based MLA on SM120 (#4858)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-06-06 22:32:49 +08:00
Yao Yao
ef763b0ddc
fix: rename some terms (#4534)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-05-23 23:23:49 +08:00
Ming Wei
ed887940d4
infra: open source XQA kernels (#3762)
Replace libtensorrt_llm_nvrtc_wrapper.so with its source code, which
consists of two parts:

1. NVRTC glue code
2. XQA kernel code

During the TensorRT-LLM build, the XQA kernel code is embedded as C++ arrays
via gen_cpp_header.py and passed to NVRTC for JIT compilation.

Signed-off-by: Ming Wei <2345434+ming-wei@users.noreply.github.com>
2025-04-30 18:05:15 +08:00
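
The commit message for ed887940d4 says that kernel source is embedded as C++ arrays by gen_cpp_header.py and later handed to NVRTC. The actual script is not shown in this log, so the following is only a minimal sketch of the general technique. The function name `to_cpp_array`, the `per_line` parameter, and the generated symbol names are hypothetical and are not taken from the repository.

```python
def to_cpp_array(name: str, source: str, per_line: int = 12) -> str:
    """Render `source` as a C++ header holding a NUL-terminated byte array.

    Hypothetical helper for illustration; not the real gen_cpp_header.py.
    """
    # Encode the kernel text and append a terminator so the array can be
    # treated as a C string by whatever consumes it at runtime.
    data = source.encode("utf-8") + b"\x00"

    # Emit the bytes as hex literals, a fixed number per row.
    rows = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)

    # The length constant excludes the trailing NUL.
    return (
        "#pragma once\n"
        "#include <cstddef>\n\n"
        f"inline constexpr unsigned char {name}[] = {{\n{body}\n}};\n"
        f"inline constexpr std::size_t {name}_len = {len(data) - 1};\n"
    )


if __name__ == "__main__":
    # Stand-in kernel text; a real build would read the .cu/.cuh files.
    print(to_cpp_array("kExampleKernelSrc", "__global__ void k() {}\n"))
```

Writing the bytes as an `unsigned char` array rather than a string literal sidesteps compiler limits on string-literal length and avoids escaping issues. The runtime side would then need to cast the array to `const char*` before passing it to the JIT compiler.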