mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-06-29 15:50:22 +00:00
3a6db741a8
* opencl: refactor initialization * opencl: refactor GPU identification * opencl: rename for consistency * opencl: cache global mem size in dev_ctx * opencl: adjust log level * opencl: load argsort and flash_attn kernels in supports_op * argsort kernel must be built for supports_op for querying the max workgroups * flash_attn kernel has many variants, only load them when needed