llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-28 15:20:20 +00:00

Files


Reese Levine e8c54893f2 ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834)

* Start work on flash_attn refactor

* Refactor

* Split k/v quantization

* Refactor and abstract quantization logic for flash_attn and mul_mat

* Add quantization support to tile path

* formatting

* Move to functions, add a check

2026-06-04 08:05:04 +03:00

ggml-blas

vulkan: add get/set tensor 2d functions (#22514)

2026-04-30 17:37:13 +02:00

ggml-cann

vulkan: add get/set tensor 2d functions (#22514)

2026-04-30 17:37:13 +02:00

ggml-cpu

ggml-cpu: extend RVV quantization vec dot to higher VLENs (#22754)

2026-06-04 08:03:40 +03:00

ggml-cuda

Avoid PDL race conditions by disabling __restrict__ when PDL is used (#24030)

2026-06-03 13:56:42 +02:00

ggml-hexagon

hexagon: profiler output fix and script updates (#24042)

2026-06-02 14:08:29 -07:00

ggml-hip

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml-metal

metal: template GLU kernels to support f16/f32 (#23882)

2026-06-01 15:40:28 +03:00

ggml-musa

ggml-cuda: native bf16 flash attention for vec kernel (#20525)

2026-03-22 11:05:51 +01:00

ggml-opencl

opencl: use flat variants of q4_K and q6_K gemv for very large M (#24006)

2026-06-02 14:16:17 -07:00

ggml-openvino

openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944)

2026-04-21 18:58:34 +03:00

ggml-rpc

rpc : keep last_graph_uid in the device context (#23273)

2026-05-19 09:42:36 +03:00

ggml-sycl

[SYCL] Support Q4_1, Q5_0, Q5_1 in Flash-attention (#23812)

2026-06-01 09:53:53 +03:00

ggml-virtgpu

ggml-virtgpu : include missing mutex header (#22810)

2026-05-10 17:32:41 +02:00

ggml-vulkan

vulkan: don't hold the device mutex while compiling pipelines (#23641)

2026-06-01 14:04:01 +02:00

ggml-webgpu

ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834)

2026-06-04 08:05:04 +03:00

ggml-zdnn

vulkan: add get/set tensor 2d functions (#22514)

2026-04-30 17:37:13 +02:00

ggml-zendnn

ggml-zendnn : fixed naming of matmul function (#20964)

2026-05-27 00:59:35 +02:00

CMakeLists.txt

ggml : Parallelize quant LUT init (#23595)

2026-05-25 10:15:46 +03:00

ggml-alloc.c

ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (ggml/1492)

2026-05-25 12:38:01 +03:00

ggml-backend-dl.cpp

hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)

2026-01-29 12:33:21 -08:00

ggml-backend-dl.h

hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)

2026-01-29 12:33:21 -08:00

ggml-backend-impl.h

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml-backend-meta.cpp

TP: quantized KV cache support (#23792)

2026-06-01 12:30:10 +02:00

ggml-backend-reg.cpp

ggml : skip already registered backends and devices (#22296)

2026-04-28 10:02:32 +03:00

ggml-backend.cpp

ggml : Check the right iface method before using the fallback 2d get (#23514)

2026-05-23 12:49:24 +02:00

ggml-common.h

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-impl.h

ggml: add graph_reused (#21764)

2026-04-16 17:21:28 +08:00

ggml-opt.cpp

fix: free ctx_copy in ggml_opt_free to plug per-training-session leak (#21592)

2026-04-08 17:40:15 +02:00

ggml-quants.c

ggml : Parallelize quant LUT init (#23595)

2026-05-25 10:15:46 +03:00

ggml-quants.h

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-threading.cpp

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-threading.h

remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)

2024-12-12 19:02:49 +01:00

ggml.c

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

ggml.cpp

ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)

2025-06-01 13:43:57 +03:00

gguf.cpp

ggml: gguf_init_from_callback and gguf_init_from_buffer (#22341)

2026-05-25 11:33:29 +02:00