llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (last synced 2026-06-26 06:10:19 +00:00)

Files


Latest commit 7c908502ea by Masashi Yoshimura (2026-06-23 17:13:55 +09:00):
ggml-webgpu: improve MTP inference by using mat-vec path for small batches (#24811)

* ggml-webgpu: improve small batches decoding

* Add barrier to the NUM_COLS loop in mul-mat-vec


.gitignore: last commit 2024-07-13 18:12:39 +02:00

CMakeLists.txt: last commit 2026-06-19 10:19:14 +03:00