llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-25 22:00:21 +00:00

Oliver Simons 1ec44d178d CUDA: Various fixes to cpy.cu (#25000)

* Add failing test-case to test-backend-ops

Extracted from https://github.com/ggml-org/llama.cpp/issues/24072

* Minimize repro with help of AI

N = 8 * (65535 - 1) + 1 = 524273

* Port and adjust workaround from https://github.com/LostRuins/koboldcpp/commit/0ba798341e0c70517cb226cb63c966b086a3b5b3

Fall-back should share code, also relax y-z constraint to be inclusive

* Add test-case + fallback also for y dim

* Fix x-guards: the limit is 2^{31}-1, so inclusive of INT_MAX

* Fix overflow problems for transposed copy kernel

2026-06-25 17:29:23 +02:00
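
The arithmetic in the commit message above can be checked with a short sketch. It relies on the documented CUDA grid limits (65535 for the y and z dimensions, 2^31-1 for x on compute capability 3.0 and later). The names `ceil_div`, `fits_grid_yz`, `fits_grid_x` and `check_boundary` are hypothetical, and reading the factor 8 as a per-block element count is an assumption; none of this is taken from cpy.cu itself.

```cpp
#include <cassert>
#include <cstdint>

// Documented CUDA grid-dimension limits (compute capability >= 3.0).
constexpr int64_t MAX_GRID_X  = 2147483647; // 2^31 - 1, equal to INT_MAX
constexpr int64_t MAX_GRID_YZ = 65535;

// Number of blocks needed to cover n elements with d elements per block.
constexpr int64_t ceil_div(int64_t n, int64_t d) { return (n + d - 1) / d; }

// Inclusive guards: a block count equal to the limit is still launchable.
constexpr bool fits_grid_yz(int64_t blocks) { return blocks <= MAX_GRID_YZ; }
constexpr bool fits_grid_x(int64_t blocks)  { return blocks <= MAX_GRID_X; }

// Hypothetical helper that replays the commit message's repro size.
inline void check_boundary() {
    const int64_t per_block = 8; // ASSUMPTION: elements handled per block
    const int64_t n = per_block * (MAX_GRID_YZ - 1) + 1; // 524273

    // n is the smallest size that needs exactly 65535 blocks.
    assert(ceil_div(n, per_block) == MAX_GRID_YZ);
    assert(ceil_div(n - 1, per_block) == MAX_GRID_YZ - 1);

    // An exclusive guard (blocks < 65535) would reject this valid launch.
    assert(fits_grid_yz(ceil_div(n, per_block)));
    // One more block's worth of elements exceeds the limit and needs a fallback.
    assert(!fits_grid_yz(ceil_div(n + per_block, per_block)));
}
```

Under these assumptions, N = 524273 sits exactly on the y/z boundary, which is where an inclusive and an exclusive guard give different answers.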

cmake

ggml : Parallelize quant LUT init (#23595)

2026-05-25 10:15:46 +03:00

include

sycl : support --split-mode tensor (#24152)

2026-06-25 08:35:21 +03:00

src

CUDA: Various fixes to cpy.cu (#25000)

2026-06-25 17:29:23 +02:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

hexagon: MUL_MAT and MUL_MAT_ID rework : 32x32 tiled weight repack, kernel-params, cached graphs (#24954)

2026-06-24 12:14:25 -07:00