llama.cpp/ggml/src at b9093

kanshan/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-07-03 01:30:23 +00:00

Alexey Kopytko e20b83930c SYCL: reduce allocation overhead during flash attention (#22732)

* SYCL: reduce allocation overhead during flash attention

* tidy up whitespace

* add a note about the flag

* move ggml_sycl_fattn_* into fattn-buffers.hpp

* refactor implementation into fattn-buffers.cpp

* move new_fattn_kv_buffers back into ggml-sycl.cpp

2026-05-09 09:30:39 +03:00

vulkan: add get/set tensor 2d functions (#22514)

2026-04-30 17:37:13 +02:00

vulkan: add get/set tensor 2d functions (#22514)

2026-04-30 17:37:13 +02:00

ggml-cpu: Optimized risc-v cpu q1_0 dot

2026-05-07 21:09:25 +08:00

Add flash attention MMA / Tiles to support MiMo-V2.5 (#22812)

2026-05-09 11:28:29 +08:00

hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (#22837)

2026-05-08 17:12:04 -07:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

llama : fix device state save/load (#22805)

2026-05-07 21:43:40 +03:00

ggml-cuda: native bf16 flash attention for vec kernel (#20525)

2026-03-22 11:05:51 +01:00

opencl: add q4_0 MoE GEMM for Adreno (#22731)

2026-05-07 21:17:07 -07:00

openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944)

2026-04-21 18:58:34 +03:00

rpc : use graph uid instead of graph cache (#22701)

2026-05-05 13:47:13 +03:00

SYCL: reduce allocation overhead during flash attention (#22732)

2026-05-09 09:30:39 +03:00

ggml-virtgpu: fix circular dependency in headers (#22557)

2026-05-02 21:28:50 +08:00

vulkan: fix spv shadowing (#22760)

2026-05-08 09:35:22 +02:00

ggml-webgpu: add layer norm ops (#22406)

2026-05-03 20:52:53 -07:00

vulkan: add get/set tensor 2d functions (#22514)

2026-04-30 17:37:13 +02:00

vulkan: add get/set tensor 2d functions (#22514)

2026-04-30 17:37:13 +02:00

CMakeLists.txt

ggml : revert to -lm linking instead of find_library (#22355)

2026-04-28 09:56:02 +03:00

ggml-alloc.c

ggml : remove ggml-ext.h (#21869)

2026-04-14 17:32:58 +03:00

ggml-backend-dl.cpp

hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)

2026-01-29 12:33:21 -08:00

ggml-backend-dl.h

hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)

2026-01-29 12:33:21 -08:00

ggml-backend-impl.h

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml-backend-meta.cpp

vulkan: add get/set tensor 2d functions (#22514)

2026-04-30 17:37:13 +02:00

ggml-backend-reg.cpp

ggml : skip already registered backends and devices (#22296)

2026-04-28 10:02:32 +03:00

ggml-backend.cpp

ggml: update SCHED_DEBUG output to use ggml_op_desc() (#22825)

2026-05-07 22:43:04 -07:00

ggml-common.h

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-impl.h

ggml: add graph_reused (#21764)

2026-04-16 17:21:28 +08:00

ggml-opt.cpp

fix: free ctx_copy in ggml_opt_free to plug per-training-session leak (#21592)

2026-04-08 17:40:15 +02:00

ggml-quants.c

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-quants.h

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-threading.cpp

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-threading.h

remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)

2024-12-12 19:02:49 +01:00

ggml.c

ggml : implement fast walsh-hadamard transform for kv rotation (#21352) (#22631)

2026-05-05 10:05:05 +08:00

ggml.cpp

ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)

2025-06-01 13:43:57 +03:00

gguf.cpp

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00