llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-07-01 16:50:20 +00:00

Files

Alexey Kopytko cc9e331213 SYCL: improve MoE prefill throughput (#23142)

- change `k_copy_src1_to_contiguous` so that it uses a precomputed contiguous mapping in which all rows "owned" by an expert are in one slice with a known start and end
- switch the `O(n_as * n_routed_rows)` contraption to a counting-sort-based procedure with `O(n_as + n_routed_rows)` complexity
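
The counting-sort idea in the two points above can be sketched on the host side as follows. This is a minimal illustration, not the SYCL code from the commit: `ExpertRowMap`, `build_expert_row_map`, and `row_expert` are hypothetical names, and it assumes each routed row carries a single expert id in `[0, n_as)`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: routed rows grouped by expert in one flat array.
struct ExpertRowMap {
    std::vector<int32_t> offsets; // size n_as + 1; expert e owns [offsets[e], offsets[e + 1])
    std::vector<int32_t> rows;    // routed-row indices, contiguous per expert
};

// Counting sort over expert ids: O(n_as + n_routed_rows) instead of scanning
// every routed row once per expert.
// row_expert[i] is the expert id assigned to routed row i (an assumption of this sketch).
ExpertRowMap build_expert_row_map(const std::vector<int32_t> & row_expert, int32_t n_as) {
    ExpertRowMap map;
    map.offsets.assign(n_as + 1, 0);
    map.rows.resize(row_expert.size());

    // Pass 1: histogram of rows per expert, shifted by one slot.
    for (int32_t e : row_expert) {
        assert(e >= 0 && e < n_as);
        map.offsets[e + 1]++;
    }

    // Pass 2: prefix sum turns the counts into slice start offsets.
    for (int32_t e = 0; e < n_as; ++e) {
        map.offsets[e + 1] += map.offsets[e];
    }

    // Pass 3: scatter each row into its expert's slice; row order within a slice is preserved.
    std::vector<int32_t> cursor(map.offsets.begin(), map.offsets.end() - 1);
    for (int32_t i = 0; i < (int32_t) row_expert.size(); ++i) {
        map.rows[cursor[row_expert[i]]++] = i;
    }
    return map;
}
```

With such a mapping, a copy kernel could read the slice `[offsets[e], offsets[e + 1])` for expert `e` directly rather than searching for its rows.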

2026-05-22 15:50:17 +03:00

cmake

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

include

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

src

SYCL: improve MoE prefill throughput (#23142)

2026-05-22 15:50:17 +03:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.12.0 (ggml/1494)

2026-05-16 16:11:29 +03:00