mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-06-28 15:20:20 +00:00
95b8b8ec1a
Drops the hardcoded f32 GLU kernels in favor of a single template. We now load/store in the native tensor type (half or float) to save memory bandwidth, but keep the actual ALU compute in float to avoid exploding math in geglu/swiglu. Also opened up the dispatch gate to allow f16 inputs.