llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-29 15:50:22 +00:00

Aadeshveer Singh 24af22fc36 ggml : optimize cuda ssm_scan using warp-level reduction (#18505)

* ggml : optimize cuda ssm_scan using warp-level reduction

* ggml : apply code review suggestions (style, const, constexpr)

* ggml : add TODO regarding stride consistency

2026-01-07 02:24:34 +08:00

ggml-blas

sync : whisper.cpp (ggml/1359)

2025-09-29 17:43:58 +03:00

ggml-cann

CANN: Make valid_values variable static const (#18627)

2026-01-06 11:53:28 +08:00

ggml-cpu

kleidiai: add and integrate SVE 256-bit vector-length kernel (#18458)

2025-12-30 14:04:53 +02:00

ggml-cuda

ggml : optimize cuda ssm_scan using warp-level reduction (#18505)

2026-01-07 02:24:34 +08:00

ggml-hexagon

ggml-hexagon: optimize activation function (#18393)

2026-01-02 21:24:24 -08:00

ggml-hip

HIP: fix AMDGPU_TARGETS, update documentation (#16803)

2025-10-27 21:39:49 +01:00

ggml-metal

metal : adjust extra size for FA buffer to avoid reallocations (#18545)

2026-01-02 19:02:18 +02:00

ggml-musa

CUDA: faster tile FA, add oob checks, more HSs (#16492)

2025-10-11 20:54:32 +02:00

ggml-opencl

opencl: allow resizing transpose buffers (#18384)

2025-12-27 15:51:14 -08:00

ggml-rpc

rpc : use unordered_map::reserve and emplace (#18513)

2026-01-02 12:09:36 +02:00

ggml-sycl

sycl: add newline at the end of CMakeLists.txt (#18503)

2025-12-31 14:23:44 +08:00

ggml-vulkan

vulkan: support buffer_from_host_ptr (#18467)

2026-01-06 17:37:07 +01:00

ggml-webgpu

ggml webgpu: add CEIL operation support (#18605)

2026-01-05 11:38:57 -08:00

ggml-zdnn

zdnn: refactor codebase + add docs (#16178)

2025-09-23 14:53:05 +08:00

ggml-zendnn

ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690)

2025-12-07 00:13:33 +08:00

CMakeLists.txt

kleidiai: add and integrate SVE 256-bit vector-length kernel (#18458)

2025-12-30 14:04:53 +02:00

ggml-alloc.c

llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)

2025-12-15 09:24:59 +01:00

ggml-backend-impl.h

rpc : add support for multiple devices (#16276)

2025-10-04 12:49:16 +03:00

ggml-backend-reg.cpp

ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690)

2025-12-07 00:13:33 +08:00

ggml-backend.cpp

vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295)

2026-01-01 08:58:27 +01:00

ggml-common.h

llama : add gpt-oss (#15091)

2025-08-05 22:10:36 +03:00

ggml-impl.h

cmake: Added more x86_64 CPU backends when building with GGML_CPU_ALL_VARIANTS=On (#18186)

2025-12-28 09:33:29 +02:00

ggml-opt.cpp

finetune: SGD optimizer, more CLI args (#13873)

2025-08-14 12:03:57 +02:00

ggml-quants.c

ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15928)

2025-09-23 10:25:20 +02:00

ggml-quants.h

llama : add gpt-oss (#15091)

2025-08-05 22:10:36 +03:00

ggml-threading.cpp

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-threading.h

remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)

2024-12-12 19:02:49 +01:00

ggml.c

ggml : fix avx512bf16 build (#18623)

2026-01-06 08:54:10 +02:00

ggml.cpp

ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)

2025-06-01 13:43:57 +03:00

gguf.cpp

ggml, llama : use defaulted constructors/destructors (#17649)

2025-12-03 07:12:18 +01:00