| cubin | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderXQAImplJIT | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| CMakeLists.txt | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| copy_cu.py | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention32_bf16_implicit_relative_attn.cu | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention32_bf16.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention32_float_implicit_relative_attn.cu | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention32_float.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention32_half_implicit_relative_attn.cu | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention32_half.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention48_bf16.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention48_float.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention48_half.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention64_bf16_implicit_relative_attn.cu | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention64_bf16.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention64_float_implicit_relative_attn.cu | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention64_float.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention64_half_implicit_relative_attn.cu | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention64_half.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention80_bf16.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention80_float.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention80_half.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention96_bf16.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention96_float.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention96_half.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention104_bf16.cu | Update TensorRT-LLM Release branch (#1445) | 2024-04-12 17:59:19 +08:00 |
| decoderMaskedMultiheadAttention104_float.cu | Update TensorRT-LLM Release branch (#1445) | 2024-04-12 17:59:19 +08:00 |
| decoderMaskedMultiheadAttention104_half.cu | Update TensorRT-LLM Release branch (#1445) | 2024-04-12 17:59:19 +08:00 |
| decoderMaskedMultiheadAttention112_bf16.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention112_float.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention112_half.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention128_bf16_implicit_relative_attn.cu | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention128_bf16.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention128_float_implicit_relative_attn.cu | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention128_float.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention128_half_implicit_relative_attn.cu | Update TensorRT-LLM Release branch (#1192) | 2024-02-29 17:20:55 +08:00 |
| decoderMaskedMultiheadAttention128_half.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention144_bf16.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention144_float.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention144_half.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention160_bf16.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention160_float.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention160_half.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention192_bf16.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention192_float.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention192_half.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention224_bf16.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention224_float.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention224_half.cu | Update TensorRT-LLM (#506) | 2023-11-30 16:46:22 +08:00 |
| decoderMaskedMultiheadAttention256_bf16.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention256_float.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttention256_half.cu | Initial commit | 2023-09-20 00:29:41 -07:00 |
| decoderMaskedMultiheadAttentionLaunch.h | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderMaskedMultiheadAttentionTemplate.h | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderXQAConstants.h | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderXQAImpl.cpp | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderXQAImpl.h | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderXQAImplCommon.h | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderXQAImplPrecompiled.cpp | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderXQAImplPrecompiled.h | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderXQARunner.cpp | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| decoderXQARunner.h | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| mmha_notes.md | Initial commit | 2023-09-20 00:29:41 -07:00 |
| tensorMapUtils.cpp | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| tensorMapUtils.h | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |
| xqaParams.h | TensorRT-LLM v0.10 update | 2024-06-05 20:43:25 +08:00 |