Tian Zheng
cfebfbb505
[ https://nvbugs/5783509 ][fix] Fix a hang issue when enabling skip softmax on Blackwell ( #10490 )
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-16 18:59:54 +08:00
Perkz Zheng
71ccc07d2b
[None][feat] update trtllm-gen to support groupsTokensHeadsQ ( #10261 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-15 02:24:25 -05:00
Perkz Zheng
c87f1a6b39
[ https://nvbugs/5503479 ][fix] update trtllm-gen kernels to address few bugs ( #10089 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-22 04:45:33 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. ( #9821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Perkz Zheng
064b67e40c
[ https://nvbugs/5727952 ][fix] a pdl bug in trtllm-gen fmha kernels ( #9913 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-16 00:34:37 -08:00
Yihan Wang
9df4dad3b6
[None][fix] Introduce inline namespace to avoid symbol collision ( #9541 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 23:32:15 +08:00
Perkz Zheng
e34302986d
[ https://nvbugs/5727952 ][fix] PDL bugs with trtllm-gen fmha kernels ( #9863 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-10 01:47:03 -08:00
Perkz Zheng
992781dc7b
[None][feat] update trtllm-gen nvfp4 kernels with better performance ( #9510 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-03 21:35:49 +08:00
Perkz Zheng
22c1748b80
[TRTLLM-8816][feat] add optimized trtllm-gen attention kernels on sm103 ( #9081 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-11-13 12:41:07 +08:00
Perkz Zheng
222bc911cd
[None][feat] add swapsMmaAb sparseMla kernels ( #8913 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-11-05 09:32:34 -08:00
Perkz Zheng
497a07021d
[None][update] optimized sparse mla kernels && fix unspecified cuda launch ( #8866 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-11-02 22:26:59 -08:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache ( #8692 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
Perkz Zheng
0722717ec0
[None][fix] trtllm-gen regression in PR 8301 ( #8426 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-10-17 03:21:31 -07:00
Fanrong Li
1e0fbb776d
[TRTLLM-8536][feat] Update trtllm gen fmha kernels to support block sparse attention ( #8301 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-10-13 05:54:48 -07:00
Perkz Zheng
60101eb8a5
[None][fix] trtllm-gen cubins compiled with wrong arch. ( #7953 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-24 04:13:36 -07:00
Perkz Zheng
bb64e7462c
[None][fix] fix a bug with trtllm-gen kernels + attention sinks ( #7919 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-23 00:32:04 -07:00
Perkz Zheng
1b29c2e731
[None][feat] support gpt-oss with fp8 kv cache ( #7612 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-15 02:17:37 +08:00
Perkz Zheng
da6cb541a2
[None][feat] Optimize MLA kernels with separate reduction kernels ( #7597 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-09 16:58:44 +08:00
xiweny
0fdc6c7278
[TRTLLM-4629] [feat] trtllm-gen kernels support sm103 ( #7570 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-07 10:04:10 +08:00
Perkz Zheng
6037fe3716
[ https://nvbugs/5394685 ][fix] proper fix for the accuracy issue in 2CTA MLA kernels ( #6941 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-15 23:29:36 +08:00
Perkz Zheng
58f7783ea4
[ https://nvbugs/5394685 ][fix] the bug with spec-decoding + SWA && an accuracy issue related to 2CTA MLA ( #6834 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-13 13:55:56 -07:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss ( #6645 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Perkz Zheng
1f292ff2a0
[ https://jirasw.nvidia.com/browse/TRTLLM-4645 ] support mutliCtasKvMode for high-throughput MLA kernels ( #5426 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-06-25 16:31:10 +08:00
Perkz Zheng
3d87770e15
[ https://nvbugspro.nvidia.com/bug/5295470 ] support headDim 256 for blackwell fmha kernels ( #5164 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-06-13 23:01:01 +08:00
Perkz Zheng
4d711be8f4
Feat: add sliding-window-attention generation-phase kernels on Blackwell ( #4564 )
...
* move cubins to LFS
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* update cubins
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* add sliding-window-attention generation-phase kernels on Blackwell
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* address comments
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
---------
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-26 09:06:33 +08:00
Perkz Zheng
426f6fd2bc
Feat: add chunked-attention kernels on Blackwell ( #4394 )
...
* update cubins
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* add chunked-attention kernels on blackwell
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
fix
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
---------
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-21 10:16:46 +08:00
Perkz Zheng
3f29d2f006
Feat: support exporting softmax statistics and update the kernel-selection heuristic ( #4155 )
...
* update cubins
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* support exporting softmax statistics and update the kernel-selection heuristic
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
---------
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-12 15:31:46 +08:00
Perkz Zheng
35c5e4f1c5
feat: add CGA reduction fmha kernels on Blackwell. ( #3763 )
...
* update cubins
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* add trtllm-gen kernels for eagle3 and also kernels with cga-reduction
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
* address the comments
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
---------
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-04-29 10:43:54 +08:00
Perkz Zheng
e9df23f815
fix: [MLA] fix the bug with fp8 MLA kernels on Blackwell. ( #3008 )
...
* update cubins
* update error message
---------
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-03-25 18:03:29 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
...
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM ( #2873 )
2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
...
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM ( #2792 )
...
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM ( #2783 )
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM ( #2755 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00