cubin
Feat: add sliding-window-attention generation-phase kernels on Blackwell (#4564)
2025-05-26 09:06:33 +08:00
decoderXQAImplJIT
[feat] Support XQA-based MLA on SM120 (#4858)
2025-06-06 22:32:49 +08:00
instantiation
Update TensorRT-LLM (#2502)
2024-11-26 16:51:34 +08:00
CMakeLists.txt
infra: open source XQA kernels (#3762)
2025-04-30 18:05:15 +08:00
copy_cu.py
Update TensorRT-LLM (#787)
2024-01-02 17:54:32 +08:00
decoderMaskedMultiheadAttentionLaunch.h
[https://nvbugspro.nvidia.com/bug/5300080] Fix the bug of setting attention_chunk_size and enable chunked-attention in the generation-phase by default (#4693)
2025-06-03 19:02:57 -04:00
decoderMaskedMultiheadAttentionTemplate.h
[https://nvbugspro.nvidia.com/bug/5300080] Fix the bug of setting attention_chunk_size and enable chunked-attention in the generation-phase by default (#4693)
2025-06-03 19:02:57 -04:00
decoderXQAConstants.h
[feat] Support XQA-based MLA on SM120 (#4858)
2025-06-06 22:32:49 +08:00
decoderXQAImpl.cpp
Update TensorRT-LLM (#2783)
2025-02-13 18:40:22 +08:00
decoderXQAImpl.h
[feat] Support XQA-based MLA on SM120 (#4858)
2025-06-06 22:32:49 +08:00
decoderXQAImplCommon.cpp
Support speculative decoding with Hopper XQA (#3269)
2025-04-07 17:14:34 +08:00
decoderXQAImplCommon.h
[feat] Support XQA-based MLA on SM120 (#4858)
2025-06-06 22:32:49 +08:00
decoderXQAImplPrecompiled.cpp
[feat] Support XQA-based MLA on SM120 (#4858)
2025-06-06 22:32:49 +08:00
decoderXQAImplPrecompiled.h
Update TensorRT-LLM (#2783)
2025-02-13 18:40:22 +08:00
decoderXQARunner.cpp
[feat] Support XQA-based MLA on SM120 (#4858)
2025-06-06 22:32:49 +08:00
decoderXQARunner.h
Update TensorRT-LLM (#2873)
2025-03-11 21:13:42 +08:00
mmha_notes.md
Initial commit
2023-09-20 00:29:41 -07:00
tensorMapUtils.cpp
[feat] Support XQA-based MLA on SM120 (#4858)
2025-06-06 22:32:49 +08:00
tensorMapUtils.h
[feat] Support XQA-based MLA on SM120 (#4858)
2025-06-06 22:32:49 +08:00
xqaParams.h
[feat] Support XQA-based MLA on SM120 (#4858)
2025-06-06 22:32:49 +08:00