TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention
Latest commit 03430ed379 by Perkz Zheng (2025-08-04 11:19:58 +08:00):
[https://nvbugspro.nvidia.com/bug/5415268] fix illegal smem access with chunked attention (#6401)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Name                                       Last commit                                                                                          Date
cubin/
decoderXQAImplJIT/                         [None][feat] Multi-block mode for Hopper spec dec XQA kernel (#4416)                                 2025-08-03 14:31:33 -07:00
instantiation/
CMakeLists.txt
copy_cu.py
decoderMaskedMultiheadAttentionLaunch.h
decoderMaskedMultiheadAttentionTemplate.h  [https://nvbugspro.nvidia.com/bug/5415268] fix illegal smem access with chunked attention (#6401)   2025-08-04 11:19:58 +08:00
decoderXQAConstants.h
decoderXQAImpl.cpp
decoderXQAImpl.h
decoderXQAImplCommon.cpp                   [fix] Fix missing fields in xqa kernel cache key (#6282)                                             2025-08-01 10:41:26 +08:00
decoderXQAImplCommon.h                     [None][feat] Multi-block mode for Hopper spec dec XQA kernel (#4416)                                 2025-08-03 14:31:33 -07:00
decoderXQAImplPrecompiled.cpp              [None][feat] Multi-block mode for Hopper spec dec XQA kernel (#4416)                                 2025-08-03 14:31:33 -07:00
decoderXQAImplPrecompiled.h
decoderXQARunner.cpp
decoderXQARunner.h
mmha_notes.md
tensorMapUtils.cpp
tensorMapUtils.h
xqaParams.h