Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention ( #10264 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Jhao-Ting Chen
0a09465089
[ https://nvbugs/5567586 ][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model ( #8383 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-08 11:16:05 -08:00
Kanghwan
41e5870a70
[ #8476 ][chore] Update license ( #8807 )
...
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-11-19 15:05:25 -08:00
Jhao-Ting Chen
220dc01372
[None][feat] support JIT mha.cu for SPEC_DEC in runtime ( #6078 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-09-23 14:56:17 -07:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss ( #6645 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Ransiki
19b7524ff6
[None][feat] Add vLLM KV Pool support for XQA kernel ( #6013 )
...
Signed-off-by: Ransiki Zhang <ransikiz@nvidia.com>
2025-08-06 09:29:37 +08:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec ( #6379 )
...
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
qsang-nv
8ef8e73002
update spec_dec ( #6079 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-07-16 17:50:43 +08:00
杨凯旋
61c5a53642
[ #5403 ][perf] Conditionally enable SWAP AB for speculative decoding ( #5404 )
...
Signed-off-by: zoheth <z0heth@outlook.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-07-01 18:32:37 +08:00
Jinyang Yuan
20d0649f19
[feat] Support XQA-based MLA on SM120 ( #4858 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-06-06 22:32:49 +08:00
Yao Yao
ef763b0ddc
fix: rename some terms ( #4534 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-05-23 23:23:49 +08:00
Ming Wei
ed887940d4
infra: open source XQA kernels ( #3762 )
...
Replace libtensorrt_llm_nvrtc_wrapper.so with its source code, which
consists of two parts:
1. NVRTC glue code
2. XQA kernel code
During TensorRT-LLM build, XQA kernel code is embedded as C++ arries via
gen_cpp_header.py and passed to NVRTC for JIT compilation.
Signed-off-by: Ming Wei <2345434+ming-wei@users.noreply.github.com>
2025-04-30 18:05:15 +08:00