Bo Li
|
a66eeab537
|
[TRTLLM-9805][feat] Skip Softmax Attention. (#9821)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
|
2025-12-21 02:52:42 -05:00 |
|
tburt-nv
|
6147452158
|
[https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
|
2025-12-13 08:35:31 +08:00 |
|
xiweny
|
c076a02b38
|
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-09-16 09:56:18 +08:00 |
|
nv-guomingz
|
3549b68c1c
|
chroe:clean useless flag (#4567)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-05-23 07:05:15 +08:00 |
|
Perkz Zheng
|
1c5b0d6a13
|
[Feat] add chunked-attention kernels on Hopper (for llama4) (#4291)
* update cubins
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* add mtp for fmha_v2 MLA kernels and add chunked-attention support for hopper fmha kernels
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
---------
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
|
2025-05-19 09:57:10 -07:00 |
|
qsang-nv
|
0fd59d64ab
|
infra: open source fmha v2 kernels (#4185)
* add fmha repo
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix format
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix code style
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix header
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix header kernel_traits.h
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* add .gitignore file
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* add SLIDING_WINDOW_ATTENTION
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix style
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix format
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* update setup.py
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* update build_wheel.py
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
---------
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
Signed-off-by: qsang-nv <200703406+qsang-nv@users.noreply.github.com>
|
2025-05-15 10:56:34 +08:00 |
|