Commit Graph

16 Commits

Author SHA1 Message Date
Yao Yao
0788c5d0d6
[perf] improve XQA-MLA perf (#5468)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-06-26 18:09:13 +08:00
qsang-nv
e9cd810071
keep sm90 headsize 128 cubins (#5320)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-06-26 12:14:01 +08:00
Yao Yao
908463a5f5
[feat]: improve performance of XQA-MLA for sm120 (#5087)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-06-18 14:19:22 +08:00
qsang-nv
faca19c2f0
update setup.py for special cases (#5227)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-06-17 16:41:07 +08:00
qsang-nv
5a01ba5260
use cu for fmha_v2 (#4694)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-06-15 18:40:44 +08:00
yunruis
30c5b4183a
refactoring: port customized kernels with public cutlass version (#5027)
Signed-off-by: yunruis 

Merge this to unblock others since the full CI has been run through
2025-06-13 16:19:31 +08:00
Jinyang Yuan
20d0649f19
[feat] Support XQA-based MLA on SM120 (#4858)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-06-06 22:32:49 +08:00
qsang-nv
180b91f957
update fmha_v2 (#4895)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-06-05 22:14:28 +08:00
Perkz Zheng
40a7161f4f
fix: fmha_v2 compilation (#4659)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-27 17:39:39 +08:00
qsang-nv
157fe62965
fix fmha v2 tests (#4661)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-05-27 09:47:01 +08:00
Yao Yao
ef763b0ddc
fix: rename some terms (#4534)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-05-23 23:23:49 +08:00
nv-guomingz
3549b68c1c
chroe:clean useless flag (#4567)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-23 07:05:15 +08:00
Perkz Zheng
6a35c599ef
Clean: fmha codes (#4496)
clean codes

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-21 11:45:47 +08:00
Perkz Zheng
1c5b0d6a13
[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)
* update cubins

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

* add mtp for fmha_v2 MLA kernels and add chunked-attention support for hopper fmha kernels

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

---------

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-05-19 09:57:10 -07:00
qsang-nv
0fd59d64ab
infra: open source fmha v2 kernels (#4185)
* add fmha repo

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* fix format

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* fix code style

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* fix header

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* fix header kernel_traits.h

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* add .gitignore file

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* add SLIDING_WINDOW_ATTENTION

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* fix style

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* fix format

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* update setup.py

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

* update build_wheel.py

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>

---------

Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
Signed-off-by: qsang-nv <200703406+qsang-nv@users.noreply.github.com>
2025-05-15 10:56:34 +08:00
Ming Wei
ed887940d4
infra: open source XQA kernels (#3762)
Replace libtensorrt_llm_nvrtc_wrapper.so with its source code, which
consists of two parts:

1. NVRTC glue code
2. XQA kernel code

During TensorRT-LLM build, XQA kernel code is embedded as C++ arries via
gen_cpp_header.py and passed to NVRTC for JIT compilation.

Signed-off-by: Ming Wei <2345434+ming-wei@users.noreply.github.com>
2025-04-30 18:05:15 +08:00