mirror of https://github.com/ggml-org/llama.cpp.git
synced 2026-06-29 07:40:21 +00:00
75f3bc94e6
* use integer dot product for quantized KV flash attention
* small improvements
* fix SHMEM_STAGING indexing
* add missing KV type quants
* fixes
* add supported quants to FA tests
* readd fast paths for <8bit quants
* fix mmq gate and shmem checks
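
For readers unfamiliar with the first item, here is a minimal CPU-side sketch in C of what an integer dot product over quantized data generally looks like. It is not the code from this commit. It assumes a Q8_0-like layout (32 signed 8-bit values sharing one scale per block), and the names `block_i8`, `quantize_block`, and `dot_block_i8` are hypothetical. Both operands stay in int8, the products accumulate in a 32-bit integer, and the two scales are applied once per block at the end.

```c
#include <stdint.h>

#define BLOCK_SIZE 32  /* values per block; assumed, mirrors a Q8_0-like layout */

/* Hypothetical block type: one float scale plus BLOCK_SIZE signed 8-bit values. */
typedef struct {
    float  scale;
    int8_t q[BLOCK_SIZE];
} block_i8;

/* Quantize BLOCK_SIZE floats: scale = max|x| / 127, q = round(x / scale). */
static void quantize_block(const float *x, block_i8 *out) {
    float amax = 0.0f;
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    out->scale = amax / 127.0f;
    /* An all-zero block gets scale 0 and all-zero quants. */
    float inv = out->scale > 0.0f ? 1.0f / out->scale : 0.0f;
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        float v = x[i] * inv;
        /* Round half away from zero; |v| <= ~127, so the cast stays in range. */
        out->q[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}

/* Dot product of two quantized blocks: integer multiply-accumulate,
 * then a single floating-point rescale by both block scales. */
static float dot_block_i8(const block_i8 *a, const block_i8 *b) {
    int32_t acc = 0;  /* at most 32 * 127 * 127, well within int32 range */
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        acc += (int32_t)a->q[i] * (int32_t)b->q[i];
    }
    return a->scale * b->scale * (float)acc;
}
```

The inner loop is the part that hardware integer dot product instructions typically accelerate, since it needs no per-element dequantization to floating point.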