TensorRT-LLMs/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention
Kaiyu Xie 9bd15f1937
TensorRT-LLM v0.10 update
* TensorRT-LLM Release 0.10.0

---------

Co-authored-by: Loki <lokravi@amazon.com>
Co-authored-by: meghagarwal <16129366+megha95@users.noreply.github.com>
2024-06-05 20:43:25 +08:00
cubin TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderXQAImplJIT TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
CMakeLists.txt TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
copy_cu.py Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention32_bf16_implicit_relative_attn.cu Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention32_bf16.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention32_float_implicit_relative_attn.cu Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention32_float.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention32_half_implicit_relative_attn.cu Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention32_half.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention48_bf16.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention48_float.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention48_half.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention64_bf16_implicit_relative_attn.cu Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention64_bf16.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention64_float_implicit_relative_attn.cu Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention64_float.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention64_half_implicit_relative_attn.cu Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention64_half.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention80_bf16.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention80_float.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention80_half.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention96_bf16.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention96_float.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention96_half.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention104_bf16.cu Update TensorRT-LLM Release branch (#1445) 2024-04-12 17:59:19 +08:00
decoderMaskedMultiheadAttention104_float.cu Update TensorRT-LLM Release branch (#1445) 2024-04-12 17:59:19 +08:00
decoderMaskedMultiheadAttention104_half.cu Update TensorRT-LLM Release branch (#1445) 2024-04-12 17:59:19 +08:00
decoderMaskedMultiheadAttention112_bf16.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention112_float.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention112_half.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention128_bf16_implicit_relative_attn.cu Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention128_bf16.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention128_float_implicit_relative_attn.cu Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention128_float.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention128_half_implicit_relative_attn.cu Update TensorRT-LLM Release branch (#1192) 2024-02-29 17:20:55 +08:00
decoderMaskedMultiheadAttention128_half.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention144_bf16.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention144_float.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention144_half.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention160_bf16.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention160_float.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention160_half.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention192_bf16.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention192_float.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention192_half.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention224_bf16.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention224_float.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention224_half.cu Update TensorRT-LLM (#506) 2023-11-30 16:46:22 +08:00
decoderMaskedMultiheadAttention256_bf16.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention256_float.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttention256_half.cu Initial commit 2023-09-20 00:29:41 -07:00
decoderMaskedMultiheadAttentionLaunch.h TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderMaskedMultiheadAttentionTemplate.h TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderXQAConstants.h TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderXQAImpl.cpp TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderXQAImpl.h TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderXQAImplCommon.h TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderXQAImplPrecompiled.cpp TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderXQAImplPrecompiled.h TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderXQARunner.cpp TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
decoderXQARunner.h TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
mmha_notes.md Initial commit 2023-09-20 00:29:41 -07:00
tensorMapUtils.cpp TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
tensorMapUtils.h TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00
xqaParams.h TensorRT-LLM v0.10 update 2024-06-05 20:43:25 +08:00