llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-28 07:10:21 +00:00

Files


Jeff Bolz c74759a244 vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991)

This allows vec4 loads of the B elements. Also increase BK to 64 when this is
enabled. Neither of these alone is consistently faster, but together these give
a nice speedup.

In ggml-vulkan.cpp, we need to make sure the B matrix alignment and stride are
multiples of 4.

2026-06-08 10:40:37 +02:00
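
The alignment requirement described in the commit message above can be sketched as a small host-side check. This is only an illustration, not the code in ggml-vulkan.cpp: the function name `can_use_vec4_b_loads` and its parameters are hypothetical, and both values are assumed to be counted in elements of B rather than bytes.

```cpp
#include <cstdint>

// Hypothetical helper (not taken from ggml-vulkan.cpp): returns true when the
// B matrix can be read four elements at a time.
// Assumption: both arguments are measured in elements, not bytes.
static bool can_use_vec4_b_loads(uint64_t b_offset_elems, uint64_t b_stride_elems) {
    constexpr uint64_t vec_width = 4; // a vec4 load covers 4 consecutive elements
    // Both the starting offset and the row stride must be multiples of 4,
    // otherwise some vec4 load would straddle a misaligned position.
    return b_offset_elems % vec_width == 0 && b_stride_elems % vec_width == 0;
}
```

When the check fails, a caller would presumably fall back to the scalar load path; that fallback is likewise an assumption here.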

cmake

ggml : Parallelize quant LUT init (#23595)

2026-05-25 10:15:46 +03:00

include

TP: quantized KV cache support (#23792)

2026-06-01 12:30:10 +02:00

src

vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991)

2026-06-08 10:40:37 +02:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.13.1 (ggml/1523)

2026-05-29 09:56:08 +03:00