llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-28 15:20:20 +00:00

Author	SHA1	Message	Date
Reese Levine	3ac3c20c96	ggml-webgpu: Add clang-format job (#24308 ) * Add clang-format job * try local formatting	2026-06-08 20:54:24 -07:00
Sigbjørn Skjæret	3f7c79d7b5	docker : bump cuda13 to 13.3.0 (#24228 )	2026-06-07 08:31:58 +02:00
Daniel Bevenius	46fa662b1f	ci : build-msys job slimming [no ci] (#24157 ) This PR attempts to slim down the dependencies for build-msys jobs making the same changes that we applied in whisper.cpp to reduce the size of the github actions cache, and should also improve the run time due to fewer dependencies that need to be installed. I realize this is a scheduled job but I think it would still make sense to apply these changes. Refs: https://github.com/ggml-org/whisper.cpp/pull/3858	2026-06-05 07:57:36 +02:00
Georgi Gerganov	4da6370d43	ci : disable ccache for msvc windows release jobs (#23911 )	2026-06-03 08:05:21 +03:00
Georgi Gerganov	a468b89018	ci : reduce self-hosted server workflow jobs (#24012 ) Reduce the number of parallel jobs in server-self-hosted.yml by stacking test configurations as sequential steps within a single job, following the pattern from #23927. - server-metal: 4 matrix jobs -> 1 job with 4 sequential test steps - server-cuda: 2 matrix jobs -> 1 job with 2 sequential test steps - server-kleidiai: removed unnecessary single-entry matrix - removed unused Setup Node.js step from server-metal Total: 7 parallel jobs -> 3 parallel jobs Assisted-by: llama.cpp:local pi	2026-06-02 13:17:59 +03:00
Georgi Gerganov	5dcb711666	speculative : fix n_outputs_max and remove draft-simple auto-enable (#23988 ) * speculative : add common_speculative_n_max helper function Extract the speculative max-draft-size logic from server_n_outputs_max into a reusable common_speculative_n_max() function in common/speculative. Assisted-by: llama.cpp:local pi * cont : draft context always has n_parallel outputs * llama : log n_outputs_max * speculative : remove draft-simple auto-enable * ci : enable server tests on PRs	2026-06-01 22:26:58 +03:00
Georgi Gerganov	e22b0de60d	ci : add missing Linux label to cpu-x64-high-perf runner (#23958 ) Fixes: https://github.com/ggml-org/llama.cpp/pull/23927#discussion_r3332213086 The cpu-x64-high-perf job was missing the Linux label in its runs-on specification, causing the runner to not be discovered. All other self-hosted Linux jobs include this label. Assisted-by: llama.cpp:local pi	2026-06-01 10:39:59 +03:00
Eve	af6528e6df	ci: remove redundant or duplicate jobs (#23927 ) * remove redundant apple job openvino gpu and cpu test can share the same build and machine Update build-rpc.yml Update build-openvino.yml cpu any doesnt make sense as we have an arm job already, so do high perf on both x86 and arm remove duplicate x86 vulkan combine backend sampling Update server.yml run server on arm as windows is x86 * emdawn on one machine only * fix openvino, remove cpu tag as we dont have many x64 machines with that tag	2026-06-01 06:32:17 +03:00
Georgi Gerganov	399739d5c5	ci : limit trigger paths for the CPU workflow (#23938 )	2026-05-31 19:02:47 +03:00
Georgi Gerganov	4c4e91b799	ci : update ios-xcode release job to macos-26 (#23906 ) * ci : disable libcommon build from xcframework * ocd : fix name * ci : ios-xcode change to macos-26 * cont : pin xcode * cont : pin xcode to minor version	2026-05-30 13:21:46 +03:00
Georgi Gerganov	337528571d	ci : fix s390x release job (#23898 ) * ci : fix s390x release job * ci : multi-thread build for `ios-xcode` * ocd : names	2026-05-30 09:21:38 +03:00
Georgi Gerganov	d4204b03a5	ci : clear cache instead of "no timestamp" keys + fix macos (#23895 ) * ci : ios use macos-15 again * ci : add and test ccache-clear * cont : fix * cont : set permission * cont : another permission * cont : token * cont : print key * cont : bring back perms * cont : test windows * cont : add token * cont : cleanup * ci : make release jobs clean-up their ccache	2026-05-30 08:52:30 +03:00
Georgi Gerganov	dc71236b6c	ci : update macos release to use macos-26 runner (#23878 )	2026-05-29 20:41:57 +03:00
Sigbjørn Skjæret	3ef2369551	ci : run ui publish on ubuntu-slim (#23818 ) * run ui publish on self-hosted fast * run on ubuntu-slim	2026-05-28 20:58:32 +03:00
Georgi Gerganov	445b7cef62	ci : releases use Github-hosted builds for the UI (#23823 ) * ci : releases use Github-hosted builds for the UI * cont : fix name	2026-05-28 17:50:32 +03:00
Georgi Gerganov	dd1557907a	ci : change Vulkan builds to Release to reduce ccache (#23820 ) * ci : disable all CPU variant builds for Vulkan workflow * cont : change cache key * cont : change build type	2026-05-28 17:29:11 +03:00
Georgi Gerganov	491c4d7d2e	ci : refactor (#23789 ) * ci : separate CUDA windows workflow + fix names * ci : rename workflow * ci : prefix cache names with workflow name * ci : rename build.yml -> build-cpu.yml * ci : cache keys * ci : fix windows cuda/hip concurrency of release workflow * ci : fix apple cache names * ci : add TODOs * cont : keep just the last cache * ci : update release concurrency to queue * ci : move the release trigger to ubuntu-slim * ci : hip add TODO * cont : improve words Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-05-28 09:44:25 +03:00
Georgi Gerganov	ba4dd0bc67	ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780 ) * ci : move ARM jobs to 3rd-party runners + disable kleidiai release * cont : fix deps + fix names * ocd : fix names * cont : fix PR links	2026-05-27 17:22:20 +03:00
Sigbjørn Skjæret	2d0656fbdd	ci : bump cuda release to 13.3 (#23749 )	2026-05-27 15:06:08 +03:00
Georgi Gerganov	6b4e4bd582	common : fix env names to all have LLAMA_ARG_ prefix (#23778 )	2026-05-27 14:52:47 +03:00
Georgi Gerganov	9f0e4b14d2	ci : fix windows ccaches (#23777 ) * ci : server windows set build type explicitly * cont : try windows-2025 * ci : use llvm * cont : use ninja * cont : fix shell * ci : set number of jobs correctly * ci : fix windows with vulkan ccache by using llvm * ci : server ccache only on master * ocd : fix job names [no release]	2026-05-27 13:54:21 +03:00
Sigbjørn Skjæret	b3a739c9b6	ci : remove wasm test (#23733 ) * run tests in correct build folder * remove wasm test	2026-05-27 13:11:37 +03:00
Georgi Gerganov	0d227ec358	ci : add ccache to server builds + fix undefined sanitizer build (#23763 ) * ci : fix undefined sanitizer build to use Debug build type only * ci : ccache the server builds * cont : remove ui dependency + reuse ccache for both ubuntu jobs * tmp : force ccache save * Revert "tmp : force ccache save" This reverts commit `a857b03a10`. * cont : no need for node.js	2026-05-27 11:45:12 +03:00
Georgi Gerganov	0d18aaa9d1	ci : do not allocate ccache for 3rd-party hosted runners (#23730 ) * ci : do not allocate ccache for 3rd-party hosted runners [no release] * cont : add prints [no ci] [no release]	2026-05-26 20:15:01 +03:00
Georgi Gerganov	08bc21b459	ci : move [no release] check to dedicated check_release job (#23734 ) * ci : move [no release] check to dedicated check_release job Move the workflow-level `if` condition that skips builds when the commit message contains `[no release]` into a lightweight `check_release` job. All build jobs now depend on it via `needs` and check its output. This ensures the skip logic is evaluated at the job level rather than at the workflow level, which is the recommended approach for conditional jobs. Assisted-by: llama.cpp:local pi * cont : use `fast` runner	2026-05-26 19:49:41 +03:00
Georgi Gerganov	35a74c8fb9	ci : add `[no release]` keyword + fix sanitizer builds (#23728 ) * ci : skip release workflow on master when commit message contains [no release] Assisted-by: llama.cpp:local pi * ci : restrict sanitizer builds to x86_64 + fix build type the spark is apparently too slow for some reason * tests : fix undefined warning [no ci]	2026-05-26 19:05:48 +03:00
Georgi Gerganov	5190c2ea8d	ci : move macos jobs to the apple workflow + fix names (#23721 )	2026-05-26 16:57:55 +03:00
Georgi Gerganov	3a3ed153d9	ci : remove vulkan SDK dep from webgpu job (#23718 ) * ci : remove vulkan dep from webgpu build * cont : add ccache to `ubuntu-24-webgpu-wasm` * ci : fix name + add wasm test	2026-05-26 16:40:30 +03:00
Georgi Gerganov	678d43d720	ci : move more CPU jobs to self-hosted runners (#23715 )	2026-05-26 15:37:40 +03:00
Georgi Gerganov	ef41a69179	ci : move sanitizer jobs to self-hosted runners (#23713 )	2026-05-26 15:22:09 +03:00
Georgi Gerganov	3dc7684f39	ci : reduce (disable SYCL and CANN builds/releases) (#23705 ) * ci : reduce [no ci] * cont : disable sycl, cann + rename caches [no ci] * cont : cann [no ci]	2026-05-26 15:21:21 +03:00
Max Krasnyansky	4bead4e30d	snapdragon: bump toolchain docker to v0.7 to fix ui build issues (#23680 )	2026-05-25 10:57:43 -07:00
Georgi Gerganov	302e2c2652	ci : reduce PR jobs by matching backend paths (#23675 ) * ci : disable SYCL f16 builds * ci : extract android and hip into separate workflows * ci : move webgpu to separate workflow * ci : move the rpc to a separate workflow * ci : extract s309x and ppcl jobs * ci : extract opencl job into a separate workflow	2026-05-25 20:54:54 +03:00
alex-spacemit	5fdf07e33b	ci : update spacemit toolchain url and enhance curl command (#23642 ) * fix(action): update SpacemiT toolchain URL and version Change-Id: If4cc1c738a855274103f8c3ad52daa33528acd0c * fix(action): add -L flag to curl command for URL redirection Change-Id: I9b6c37390f0c7a733a36308c8fb53d22d234ab06	2026-05-25 10:43:24 +02:00
Sigbjørn Skjæret	062d3115aa	ci : fix pre-tokenizer-hashes check (#23651 )	2026-05-25 10:41:25 +02:00
Aldehir Rojas	d55fb97174	ci : install host compiler on android-ndk build (#23630 )	2026-05-25 10:18:08 +03:00
Georgi Gerganov	28123a3937	ci : move most slim jobs to self-hosted runners (#23619 ) * ci : remove tag from build-self-hosted.yml * ci : slim -> self-hosted * ci : prevent heavy CPU jobs from running on fast runners * ci : prevent cmake pkg to run on dedicated fast runners * ci : try to bump 3.11 -> 3.13 * ci : move lint back to 3.11 * ci : back to 3.11 * ci : add comment about UI jobs * ci : move python requirements check to CPU runners this job is a bit slow for a dedicated "fast" runner * ci : add self-hosted ui workflow * ci : fix UI naming * tmp to check if arm64 fast is compatible with all jobs * revert last commit	2026-05-25 08:11:19 +03:00
Georgi Gerganov	549b9d8433	ci : update build-self-hosted.yml (#23616 )	2026-05-24 18:20:10 +03:00
Aldehir Rojas	b22ff4b7b4	cmake/ui : refactor the build (#23352 )	2026-05-23 17:08:22 -04:00
Georgi Gerganov	bbce619adb	cmake : add install() for impl libraries + fix apple builds (#23511 ) * pi : update * ci : fix ios build * ci : fix andoroid * ci : fix apple builds * cmake : add install() for impl libraries Add install(TARGETS <target> LIBRARY) for all -impl libraries that were changed from STATIC to shared (controlled by BUILD_SHARED_LIBS) in commit `bb28c1fe2`. Without this, cmake --install fails to copy the shared libraries, causing runtime errors like: llama-server: error while loading shared libraries: libllama-server-impl.so Ref: https://github.com/ggml-org/llama.cpp/issues/23494#issuecomment-4512912515 Assisted-by: llama.cpp:local pi * ci : fix xcframework build	2026-05-22 11:46:26 +03:00
Georgi Gerganov	bb28c1fe24	cmake : remove STATIC from impl libraries, enable LLAMA_BUILD_APP by default (#23462 ) * cmake : remove STATIC from impl libraries, allow BUILD_SHARED_LIBS control Remove explicit STATIC from all -impl libraries (server, cli, completion, bench, batched-bench, fit-params, quantize, perplexity) so BUILD_SHARED_LIBS controls shared vs static linkage. Add WINDOWS_EXPORT_ALL_SYMBOLS ON for proper DLL export on Windows. Assisted-by: llama.cpp:local pi * cmake : enable LLAMA_BUILD_APP by default Assisted-by: llama.cpp:local pi * ci : disable app in build-cmake-pkg.yml	2026-05-21 21:13:59 +03:00
Max Krasnyansky	871b0b70f8	snapdragon: update toolchain to v0.6 (#23369 ) * snapdragon: update compiler flags to enable all CPU features * snapdragon: update readme to point to toolchain v0.6 * snapdragon: bump toolchain docker to v0.6	2026-05-19 22:04:04 -07:00
Johannes Gäßler	a8078675a6	github: mention --log-file in issue templates (#23277 )	2026-05-19 21:35:10 +02:00
Aleksander Grygier	6db130445d	ui: Bump packages + address build warnings (#23300 ) * chore: Update vulnerable packages * chore: Formatting * refactor: Update Tailwind CSS imports * ci: Use `ubuntu-latest` for Unit/E2E UI tests * chore: Bump package * fix: Add missing tag * refactor: Enums files naming	2026-05-19 10:16:04 +02:00
Sigbjørn Skjæret	4b262ab662	ci : install libssl-dev (#23325 )	2026-05-19 11:11:04 +03:00
Sigbjørn Skjæret	00c461ce1a	ci : install server kleidiai runner dependencies (#23259 )	2026-05-19 09:06:56 +02:00
SamareshSingh	5cbaa5e69e	docker : add OCI image labels for version and build date (#21653 ) * docker: add OCI image labels to all published images * docker: propagate OCI labels as manifest and index annotations * docker: drop hardcoded org URL and revert accidental intel version bump The OCI image url and source are now driven by build args with a sensible default. The workflow passes the actual repository url so fork builds get labels pointing at the fork instead of upstream. Also restores the IGC, compute runtime, and IGDGMM versions in the intel Dockerfile labeled stage which I accidentally bumped in the first commit. * docker: add skip_s390x workflow_dispatch input for fast test runs Lets maintainers and PR authors trigger the docker workflow without the s390x build target, which depends on the IBM Z runner and is by far the slowest job in the matrix. The flag filters the s390x row out of the build matrix before merge_matrix is derived, so the merge job sees a consistent shape too. Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com> --------- Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>	2026-05-18 22:14:45 +02:00
Aleksander Grygier	3a9c1b854d	ui: Update KaTeX package and clean up logs from `sass` warnings (#23275 ) * ui: migrate katex imports to @use to resolve SCSS deprecation warnings * ci: Use `ubuntu-slim` for CI (UI) workflow	2026-05-18 16:26:01 +02:00
Martin Klacer	053e01dff6	ci : added kleidiai-server to server-self-hosted workflow (#22435 ) * kleidiai: added kleidiai-server to server-self-hosted workflow * Added KleidiAI-enabled Arm64 Linux llama-server CI/integration test workflow into the server-self-hosted.yml configuration file Signed-off-by: Martin Klacer <martin.klacer@arm.com> Change-Id: I032e33c525b7e26bc5d53719f638bee610cec1ee * Added self-hosted executor for KleidiAI server workflow Signed-off-by: Martin Klacer <martin.klacer@arm.com> * Update .github/workflows/server-self-hosted.yml Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Signed-off-by: Martin Klacer <martin.klacer@arm.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-18 11:14:57 +02:00
Aleksander Grygier	1d9f99aa75	fix: Add build step using build workflow to publish workflow (#23134 )	2026-05-16 11:22:59 +02:00

598 Commits