mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-06-29 15:50:22 +00:00
50b7f076a5
Change ggml_vk_mul_mat_vec_id_q_f16 to loop over the batch dimension, and update the indexing calculations in get_offsets accordingly. Mat-vec is faster than mat-mat for small values of n. We don't get the same reuse of the weights as in the non-ID path, but with this change the cost is linear in n, rather than n > 1 being far slower than n == 1.
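
To illustrate the idea of looping a mat-vec over the batch dimension, here is a minimal CPU-side sketch. It is not the Vulkan code from this commit: the function name mul_mat_vec_id_batched, the flat row-major layout, and the simplification of one expert id per batch column are all assumptions made for illustration. Each of the n columns selects its own weight matrix, so the weights are re-read per column and the total work grows linearly in n.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (illustration only, not part of ggml).
// weights: several expert matrices stored back to back, each rows x cols, row-major.
// x:       n input vectors of length cols, stored back to back.
// ids:     assumed to hold exactly one expert index per batch column.
// Returns n output vectors of length rows, stored back to back.
std::vector<float> mul_mat_vec_id_batched(const std::vector<float>& weights,
                                          const std::vector<float>& x,
                                          const std::vector<int>& ids,
                                          std::size_t rows, std::size_t cols) {
    const std::size_t n = ids.size();
    assert(x.size() == n * cols);

    std::vector<float> y(n * rows, 0.0f);

    // Loop over the batch dimension: one independent mat-vec per column.
    for (std::size_t b = 0; b < n; ++b) {
        assert(ids[b] >= 0);
        // Per-column offsets, analogous in spirit to an offset helper:
        // the expert id picks the weight matrix, b picks the input and output.
        const std::size_t w_off = static_cast<std::size_t>(ids[b]) * rows * cols;
        const std::size_t x_off = b * cols;
        const std::size_t y_off = b * rows;
        assert(w_off + rows * cols <= weights.size());

        for (std::size_t r = 0; r < rows; ++r) {
            float acc = 0.0f;
            for (std::size_t c = 0; c < cols; ++c) {
                acc += weights[w_off + r * cols + c] * x[x_off + c];
            }
            y[y_off + r] = acc;
        }
    }
    return y;
}
```

Because consecutive columns may select different experts, this sketch makes no attempt to share weight loads between columns, which mirrors the trade-off described above: less weight reuse than the non-ID path, but cost proportional to n.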