llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-06-26 14:20:21 +00:00

Author	SHA1	Message	Date
Alessandro de Oliveira Faria (A.K.A.CABELO)	0d2d9ccbf6	vendor : update cpp-httplib to 0.48.0 (#24787)	2026-06-19 22:16:35 +08:00
Georgi Gerganov	5fd2dc2c41	sync : ggml	2026-06-19 10:19:14 +03:00
o7si	32eddaf2ea	cmake : fix ui build with read-only source (#24752)	2026-06-18 18:59:18 +02:00
Max Krasnyansky	d2c67959b3	hexagon: support for op-trace (fine-grain tracing of HVX/HMX/DMA events) (#24592) * hex-optrace: add support for optrace and instrument matmul and flash-atten code * hex-trace: improve trace event and prefetto generator * hex-trace: add new script dedicated to handling traces, specifically perfetto traces * hex-trace: add --head/--tail options to profile and trace tools * hex-trace: fix whitespaces * hex-trace: fix flake8 warnings * hex-trace: fix flake8 warnings * hmx-fa: restore q_tiles clearing * hex-profile: remove circular dep in includes * hex-trace: simplify trace sizing check * hex-profile: sort events in the summary by name	2026-06-18 08:35:02 -07:00
Xuan-Son Nguyen	e8067a8b36	ui: build-time gzip compression (#24571) * ui: keep original file name and path * fix nocache * ui: build-time gzip compression	2026-06-13 16:57:27 +02:00
Xuan-Son Nguyen	597b6672e8	ui: keep original file name and path (#24568) * ui: keep original file name and path * fix nocache	2026-06-13 14:31:41 +02:00
Xuan-Son Nguyen	57fe1f07c3	server: clean up static assets handling (#24550) * server: clean up static assets handling * nits * simplify file name handling, use static file name everywhere * cmake/ui : bundle UI assets in an archive * ui : run prettier on post-build.js --------- Co-authored-by: Alde Rojas <hello@alde.dev>	2026-06-13 11:51:20 +02:00
Aleksander Grygier	f7ca93d12c	ui: PWA support (#23871) * feat: Add basic PWA support and service worker for offline caching * feat: Vite PWA implementation WIP * feat: Improve PWA icons generation * feat: Add PWA workbox to server routes * feat: Include `version.json` in static assets * feat: Add HTTP cache headers for PWA static assets * feat: Update app name for `apple-mobile-web-app-title` * feat: Implement PWA versioning and automatic update detection * chore: Update `.gitignore` files * feat: Splash Screens * feat: Add dark mode favicon support * refactor: Cleanup * fix: Use dark logo for dark splash screens * refactor: Simplify favicons SVG code * fix: Adjust caching and polling for reliable service worker updates * fix: Add missing favicon entry * fix: Align PWA service worker configuration with SvelteKit build structure * fix: Replace hashed bundle paths with versioned static paths * test: Add PWA tests * ci: Add build output for unit tests * refactor: Cleanup * fix: Server build & release versioning * chore: Update package-lock.json * chore: Increase PWA cache size * chore: Update packages * feat: Update favicons * refactor: Post-merge fix * feat: support explicit build version for PWA cache busting * fix: CI * feat: Improve PWA Refresh Alert UI * feat: Add toggleable build version display * refactor: Cleanup * feat: Add version mismatch detection and manual app reload * refactor: replace dynamic imports with static * refactor: Cleanup * feat: Add safe space for `pwa-<size>.png` rendered icons * fix: use relative paths for PWA assets to support base path deployment * feat: add PWA mode detection via URL query parameter * feat: Use ?cache=true for SW-cached PWA assets * refactor: Build process cleanup * refactor: Decouple PWA versioning and remove ?cache=true workaround * chore: Update README logo * feat: Include PWA Assets generation in build script * refactor: `usePwa` hook for core layout * fix: Relativize base vite plugin * fix: remove unnecessary backslash escapes in test regexes * test: update static asset paths for API Key test * refactor: Move SvelteKit PWA Options config to constants * ui: fix update notification never appearing Keep the PWA hook object intact instead of destructuring needRefreshByStorage, which freezes the reactive getter. Also exclude loading.html from PWA precache to prevent 404 errors and broken SW installation.	2026-06-12 15:53:26 +02:00
Georgi Gerganov	f532be8fac	sync : ggml	2026-06-12 15:55:35 +03:00
Adrien Gallouët	70b54e140c	vendor : update cpp-httplib to 0.47.0 (#24395) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-06-12 11:34:44 +02:00
Georgi Gerganov	263cc04a54	sync : ggml	2026-06-11 19:34:19 +03:00
Georgi Gerganov	c2b1518fd4	sync : ggml	2026-06-08 14:31:33 +03:00
Pascal	2016bf2b3b	ui: run npm install when package-lock.json is newer than node_modules (#24171)	2026-06-05 14:57:32 +02:00
Max Krasnyansky	5c394fdc8b	hexagon: profiler output fix and script updates (#24042) * hex-ops: fix profiler output (ie remove the redundant NONEs) * hex-prof: update profiling script to support tot.usec column	2026-06-02 14:08:29 -07:00
Adrien Gallouët	335abed17d	vendor : update cpp-httplib to 0.46.1 (#23980) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-06-01 19:40:10 +03:00
Ruixiang Wang	689a9a470e	server-bench : add speed-bench for speculative decoding benchmarking (#23869) * spec: add speed-bench support for benchmarking * speed-bench : add trailing newline to requirements.txt * speed-bench : bump datasets to 4.8.0 to fix ty check * server-bench : remove now-unused type: ignore after datasets bump	2026-05-29 23:09:47 +02:00
Georgi Gerganov	fe12e422ad	sync : ggml	2026-05-29 09:56:08 +03:00
Max Krasnyansky	19e92c33ef	hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23835) Updating infra to enable op fusion and using RMS_NORM+MUL as the use-case.	2026-05-28 14:05:54 -07:00
Max Krasnyansky	aa50b2c2ae	hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647) * hex-mm: add support for Q4_1 matmul/matvec, hvx-only for now * hmx-mm: add support for Q4_1 * hex-mm: use Q8_1 dynamic quantization to avoid having to compute sums in the vec_dot * hexagon: fix repack scratch buffer overflow * hex-mm: fix Q4_1 repack buffer sizing * hexagon: flip the build order for mm and fa (seems to help LTO) * hex-mm: add vec_dot 4x1s and minor HMX cleanup after adding Q4_1 * hex-mm: fix fp16 vec_dot fallback to 2x1 and another issue that could cause incorrect output * hexagon: resurrect early-wake and add support for polling for op-batch completions With Q4_1 ggml-hexagon now claims pretty much the entire graphs which gives the CPU more time to chilax. This is a good thing! But it does add extra latency for the pure benchmark runs. Early wakeup helps recover the latency a bit in the normals runs and op-batch polling is just for benchmarking. --------- Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>	2026-05-27 10:46:11 -07:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	617255d437	vendor : update cpp-httplib to 0.46.0 (#23650)	2026-05-27 21:36:24 +08:00
Georgi Gerganov	d161ea7071	sync : ggml	2026-05-25 12:43:27 +03:00
Georgi Gerganov	22307b3e8b	sync : ggml	2026-05-25 12:38:01 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	9627d0f540	vendor : update cpp-httplib to 0.45.1 (#23639)	2026-05-25 09:45:22 +03:00
Aparna M P	cec51c7a7d	snapdragon: update windows toolchain to use hsdk v6.6.0.0 (#23552)	2026-05-23 19:56:41 -07:00
Aldehir Rojas	b22ff4b7b4	cmake/ui : refactor the build (#23352)	2026-05-23 17:08:22 -04:00
Max Krasnyansky	c9872a2575	hexagon: HMX quantized matmul rework (#23368) * hmx-mm: update debug logging in hmx-mm * hmx-mm: update dequant logic to use HVX_vector_x2/4 * hmx-mm: remove non-pipelined version of the quantize matmul It seems that we don't reall need non-pipelined version * hmx-mm: use activation depth mode and update naming Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com> * hex-mm: minor hmx matmul naming updates * hmx-mm: remove unused vars * snapdragon: scripts bump default ubatch-size to 1K * hexagon: combine HMX and power and clock settings into a single set_power call * hmx-mm: remove leftover of the scale repl helper * hexagon: fix editconf error --------- Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>	2026-05-20 07:39:01 -07:00
Georgi Gerganov	c3f95c1f06	scripts : allow wc2wt with an existing branch (#23189)	2026-05-18 08:57:28 +03:00
Georgi Gerganov	3a92bc99db	sync : ggml	2026-05-16 16:11:29 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	18675b6bbc	vendor : update cpp-httplib to 0.45.0 (#23103)	2026-05-16 15:25:21 +03:00
Aleksander Grygier	59778f0196	ui: Restructure repo to use `tools/ui` folder and `ui` / `UI` / `llama-ui` / `LLAMA_UI` naming (#23064) * webui: Move static build output from `tools/server/public` to `build/ui` directory * refactor: Move to `tools/ui` * refactor: rename CMake variables and preprocessor defines - Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated) - Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated) - Backward compat: old vars auto-forward to new ones with DEPRECATION warning - Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc. - Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET - Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines - Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED * refactor: rename CLI flags (--webui -> --ui) with backward compat - Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases) - Add --ui-config (old --webui-config kept as deprecated alias) - Add --ui-config-file (old --webui-config-file kept as deprecated alias) - Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated) - Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY - C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields - Backward compat: old fields synced to new ones in g_params_to_internals * refactor: update C++ server internals with backward compat - Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta) - Rename params.webui usage -> params.ui (both synced, old still works) - JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys - Server routes use params.ui_mcp_proxy || params.webui_mcp_proxy - Preprocessor guards use #if defined(LLAMA_BUILD_UI) || defined(LLAMA_BUILD_WEBUI) * refactor: rename CI/CD workflows, artifacts, and build script - Rename webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build - Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT - Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks - Update server.yml: job/artifact refs webui-build -> ui-build - Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT - Update server-self-hosted.yml: webui-build -> ui-build - Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION - Rename webui-download.cmake -> ui-download.cmake (internal refs updated) - Update labeler.yml: server/webui -> server/ui path label * docs: update CODEOWNERS and server README docs - Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/ - Update server README.md: CLI tables show --ui flags with deprecated --webui aliases - Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/ * fix: Small fixes for UI build * fix: CMake.txt syntax * chore: Formatting * fix: `.editorconfig` for llama-ui * chore: Formatting * refactor: Use `APP_NAME` in Error route * refactor: Cleanup * refactor: Single migration service * make llama-ui a linkable target * fix: UI Build output * fix: Missing change * fix: separate llama-ui npm build output into build/tools/ui/dist subfolder + use cmake npm build instead of downloading ui-build.yml artifacts in CI * refactor: UI workflows cleanup --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-05-16 02:02:40 +02:00
Omer Ozarslan	1348f67c58	webui: Use lowercase hash for HF checksum check (#23107)	2026-05-15 19:38:16 +02:00
Zack Li	d81e63dcfd	CI : support IOT device (IQ9) (#22987) * update test scripts * align CI behavior between linux and android * remove automatically cancel in 15min * enable cancel-in-progress * fix ty check issue * update and fix pylint issue * update runner such that we are not restricted by the 15min limit rule * fix flake8 lint issue * update runner according to review feedback * code update according to review feedback * switch from llama-cli to llama-completion binary with -no-cnv flag	2026-05-14 13:58:34 -07:00
Aleksander Grygier	0c3e4fccca	fix: Propagate version tag to WebUI asset download in self-hosted CI (#23051) * fix: Propagate version tag to WebUI asset download in self-hosted CI * refactor: Apply suggestions from @CISC Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix: Skip npm build when Node.js is not installed Avoid 'no such file or directory' errors on CI runners that lack Node.js. Check if npm is available via find_program before attempting npm install + npm run build. Falls back to HF Bucket download. * fix: Use + separator for ASSETS list to fix Windows build Replace fragile \; escaping with a + separator when passing the WebUI asset list via -DASSETS to the download script. On Windows, the \; escaping was not reliably preserved through the CMake build system, causing all asset filenames to be concatenated into one (e.g., 'index.html;bundle.js;bundle.css;loading.html' as a single file), which broke the HF Bucket download and subsequent xxd.cmake step. + is safe because it is not special in cmd.exe (unlike | which is a pipe operator), not special in CMake's -D argument parser, and not a valid Windows filename character. CMakeLists.txt joins assets with + and webui-download.cmake splits them back via regex. * fix: Validate HF_WEBUI_VERSION environment variable with regex Add input validation for the HF_WEBUI_VERSION env var to prevent CMake list separator or path-traversal issues in stamp filenames and download URLs. Rejects non-conforming characters early. * fix: Remove 'latest' fallback for HF_WEBUI_VERSION When needs.determine-tag.outputs.tag_name is empty, let CMake's default resolution handle it (empty -> git-based version lookup) instead of falling back to 'latest'. This ensures the sentinel stamp file is consistent with CMake's resolution logic. * fix: Demote checksum verification failure to warning instead of hard gate * fix: End line character --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-14 17:57:20 +02:00
Aleksander Grygier	253ba110bc	webui: Move static build output from repo code to HF Bucket (#22937) * ci: add workflow to publish webui to Hugging Face bucket * ci: add webui release job to release workflow * ci: test webui release job * chore: Return to default minification strategy for build output files * ci: extract webui build into separate workflow and job * chore: Ignore webui static output + clean up references * chore: Delete legacy webui static output * chore: Ignore webui build static output * fix: Workflow * fix: Versioning naming * chore: Update package name * test: Test CI fix * refactor: Naming * server: implement webui build strategy with HF Bucket support * chore: Remove test workflow * chore: Use WebUI build workflow call in other workflows * server: HF Buckets fallback for WebUI build * refactor: App name variable * refactor: Naming * fix: Retrieve loading.html * fix: workflow syntax * fix: Rewrite malformed release.yml * fix: Req param * test: Re-add missing Playwright installation for CI tests * refactor: Logic & security improvements * refactor: Retrieve publishing jobs and DRY the workflows * fix: Test workflow syntax * fix: Upstream Release Tag for test workflow * chore: Remove test workflow * ci: Run WebUI jobs on `ubuntu-24.04-arm` * refactor: Post-CR cleanup Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * refactor: CI cleanup * refactor: Cleanup * test: Test workflow * refactor: use LLAMA_BUILD_NUMBER instead of LLAMA_BUILD_TAG for HF Bucket webui downloads * server: add fallback mechanism for HF Bucket webui downloads from latest directory * fix: Incorrect argument order in file(SHA256) calls for checksum verification * refactor: Use cmake script for handling the HF Bucket download on build time * feat: support local npm build for WebUI assets * refactor: add `HF_ENABLED` flag to control WebUI build/download provisioning * refactor: Cleanup * chore: Remove test workflow * fix: remove s390x from release workflow * fix: add webui-build dependency to ubuntu-22-rocm and windows-hip * Revert "fix: remove s390x from release workflow" This reverts commit debcfffa9bc1e3112eae41f2d29741b682e4eb19. * fix: Release workflow file * fix: Proper release tag used for HF Bucket upload * fix: Remove duplicate steps in release workflow --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-14 13:21:41 +02:00
Trivikram Reddy	856c3adac1	hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993) * hexagon: add hvx_vec_repl helpers and use those for splat-from-vtcm usecase * hmx-mm: optimize per-group scale handling * hmx-fa: optimize slope load from vtcm * hmx-fa: use aligned access where possible in hmx-utils * hexagon: add hvx_vec_repl_2x_f16 helper and consolidate repl helpers --------- Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>	2026-05-12 17:28:02 -07:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	838374375c	vendor : update cpp-httplib to 0.44.0 (#22919)	2026-05-11 08:47:13 +02:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	5d5d2e15d2	vendor : update cpp-httplib to 0.43.4 (#22888)	2026-05-10 18:46:54 +02:00
Georgi Gerganov	0b047287fe	sync : ggml	2026-05-10 17:00:11 +03:00
Georgi Gerganov	70a8309114	sync : ggml	2026-05-05 13:15:59 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	a09a00e502	vendor : update cpp-httplib to 0.43.3 (#22686)	2026-05-05 09:04:57 +02:00
Piotr Wilkin (ilintar)	a4701c98f7	common/autoparser: fixes for newline handling / forced tool calls (#22654) * chat/autoparser: the fixes * Move optspace() to chat-peg-parser, comment out server tests invalidated due to content now allowed with forced tool calls. * Trim whitespace on apply instead	2026-05-04 13:18:11 +02:00
Georgi Gerganov	228e836344	sync : ggml	2026-05-02 08:55:29 +03:00
Georgi Gerganov	457e2288c9	sync : ggml	2026-05-02 07:22:35 +03:00
Sigbjørn Skjæret	6118c043b1	ci : bump ty to 0.0.33 (#22535) * bump ty to 0.0.33 * update typings	2026-04-30 16:15:54 +03:00
Adrien Gallouët	5f0ab726f7	vendor : update cpp-httplib to 0.43.2 (#22548) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-04-30 15:04:39 +02:00
Georgi Gerganov	27aef3dd91	scripts : add wc2wt.sh - create worktree from current HEAD (#22513) * scripts : add wc2wt.sh - create worktree from current HEAD Add a script to create a git worktree on a new branch from the current HEAD. Similar to pr2wt.sh but for local development branches instead of PRs. Usage: ./scripts/wc2wt.sh gg/new-feature ./scripts/wc2wt.sh gg/new-feature "bash -l" Assisted-by: llama.cpp:local pi * cont : no need to try to delete the branch	2026-04-30 09:20:26 +03:00
Max Krasnyansky	41a63be28e	hexagon: make vmem and buffer-size configurable (#22487) * hexagon: allow host to set max vmem size We use a sane default but it's helpful to allow for an override if needed. * hexagon: add support for measuring vmem space and move pinned mmaping management to host * hexagon: update vmem checks to use uint64 * hexagon: bump op buffers to 16 (matches max mmaps) * hexagon: bump default vmem to 3.2GB * hexagon: add support for autodetecting vmem space and some logging cleanup in that area * hexagon: fix whitespace warnings * Update scripts/snapdragon/adb/run-cli.sh Co-authored-by: Pascal <admin@serveurperso.com> * hex-adb: fix run-completion script --------- Co-authored-by: Pascal <admin@serveurperso.com>	2026-04-29 11:51:21 -07:00
Georgi Gerganov	b1d5f5b449	sync : ggml	2026-04-29 16:43:47 +03:00
Georgi Gerganov	f535774325	pr2wt : symlink .pi (#22386)	2026-04-26 19:49:26 +03:00
Piotr Wilkin (ilintar)	0adede866d	parser: fix structured output bug (#22302) * fix very stupid structured output bug * Things just cannot be too easy.	2026-04-24 23:19:55 +02:00

409 Commits
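
Commit d2c67959b3 (#24592) adds fine-grain HVX/HMX/DMA event tracing and a Perfetto trace generator. The log does not describe the trace format, so what follows is only a sketch of one common route: converting timed events into the Chrome JSON trace-event format, which the Perfetto UI can open. The event tuple layout, the `UNIT_TID` track mapping and the clock rate are hypothetical.

```python
import json

# Hypothetical mapping from event source to a trace "thread", so each unit gets its own track.
UNIT_TID = {"HVX": 1, "HMX": 2, "DMA": 3}


def to_chrome_trace(events, clock_mhz):
    """Convert hypothetical (unit, name, start_cycle, end_cycle) tuples to Chrome trace JSON.

    Chrome trace timestamps are in microseconds; cycles divided by MHz gives microseconds.
    """
    trace_events = []
    for unit, name, start, end in events:
        trace_events.append({
            "name": name,
            "ph": "X",  # "complete" event: a start timestamp plus a duration
            "ts": start / clock_mhz,
            "dur": (end - start) / clock_mhz,
            "pid": 0,
            "tid": UNIT_TID[unit],
        })
    return json.dumps({"traceEvents": trace_events})


# Usage: write the result to a .json file and open it in the Perfetto UI.
sample = [("HMX", "matmul", 1000, 5000), ("DMA", "fetch", 0, 2000)]
trace_json = to_chrome_trace(sample, clock_mhz=1000.0)
```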
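
Commit e8067a8b36 (#24571) moves gzip compression of the UI assets to build time. A minimal Python sketch of the idea, assuming the built assets sit in one output directory and that a `.gz` sibling is written next to each compressible file; the extension list and function name are assumptions, and the actual llama.cpp build step may differ.

```python
import gzip
import os

# Assumed set of text-like assets worth compressing; binary images are skipped.
COMPRESSIBLE = (".html", ".js", ".css", ".json", ".svg")


def gzip_assets(dist_dir):
    """Write <file>.gz next to every compressible file under dist_dir; return the new paths."""
    written = []
    for root, _dirs, files in os.walk(dist_dir):
        for name in files:
            if not name.endswith(COMPRESSIBLE):
                continue
            src = os.path.join(root, name)
            with open(src, "rb") as f:
                data = f.read()
            dst = src + ".gz"
            with open(dst, "wb") as f:
                # mtime=0 keeps the gzip header, and so the output, identical across builds.
                f.write(gzip.compress(data, compresslevel=9, mtime=0))
            written.append(dst)
    return sorted(written)
```

Doing this once at build time means the server can hand out the precompressed bytes with `Content-Encoding: gzip` instead of compressing on every request.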
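
Commit 2016bf2b3b (#24171) reruns `npm install` when `package-lock.json` is newer than `node_modules`. The check reduces to a modification-time comparison; a Python sketch, assuming both entries live in the same UI directory (the helper name is hypothetical):

```python
import os


def needs_npm_install(ui_dir):
    """Return True if node_modules is missing or older than package-lock.json."""
    lock = os.path.join(ui_dir, "package-lock.json")
    modules = os.path.join(ui_dir, "node_modules")
    if not os.path.isdir(modules):
        return True  # dependencies were never installed
    # A lock file modified after the last install means the installed tree may be stale.
    return os.path.getmtime(lock) > os.path.getmtime(modules)
```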
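
Commit 19e92c33ef (#23835) uses RMS_NORM followed by MUL as the first fused op pair on Hexagon. The sketch below, in plain Python rather than HVX code, shows why the pair fuses cleanly: the normalization scale is a single reduction over the row, and the weight multiply can be applied in the same output pass, so the normalized intermediate is never materialized. The `eps` default is an assumption.

```python
import math


def rms_norm(x, eps=1e-6):
    # Unfused op 1: scale x by 1 / sqrt(mean(x^2) + eps).
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]


def mul(x, w):
    # Unfused op 2: elementwise multiply by a weight vector.
    return [a * b for a, b in zip(x, w)]


def rms_norm_mul_fused(x, w, eps=1e-6):
    # Fused: one reduction pass for the scale, then one output pass applying scale and weight.
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale * g for v, g in zip(x, w)]
```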
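
Commit aa50b2c2ae (#23647) switches to Q8_1 dynamic quantization so the vec_dot no longer has to compute sums. The reason follows from the block formats: a Q4_1 value is d_x*q_x + m_x, and Q8_1 stores s_y = d_y*sum(q_y) next to d_y, so the min term of the dot product collapses to m_x*s_y. A scalar Python sketch for one 32-element block pair follows; it mirrors ggml's reference quantizers in simplified form (plain floats instead of fp16 scales, no nibble packing) and is not the Hexagon kernel.

```python
import math

QK = 32  # elements per block, as in ggml's Q4_1 and Q8_1


def quantize_q4_1(x):
    """Q4_1: x ~ d*q + m with q in [0, 15]; per-block scale d and min m."""
    lo, hi = min(x), max(x)
    d = (hi - lo) / 15
    q = [min(15, int((v - lo) / d + 0.5)) if d else 0 for v in x]
    return d, lo, q


def quantize_q8_1(y):
    """Q8_1: y ~ d*q with q in [-127, 127], plus the precomputed s = d * sum(q)."""
    amax = max(abs(v) for v in y)
    d = amax / 127
    q = [round(v / d) if d else 0 for v in y]
    return d, d * sum(q), q


def vec_dot_q4_1_q8_1(xb, yb):
    dx, mx, qx = xb
    dy, sy, qy = yb
    sumi = sum(a * b for a, b in zip(qx, qy))  # integer-only inner loop
    # sum((dx*qx + mx) * dy*qy) = dx*dy*sumi + mx*(dy*sum(qy)) = dx*dy*sumi + mx*sy
    return dx * dy * sumi + mx * sy
```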
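
Commit 1348f67c58 (#23107) lowercases the hash used in the HF checksum check. Hex digests are case-insensitive, so a robust comparison normalizes the expected value; a Python sketch of the idea (the real check lives in the CMake download logic, which this does not reproduce, and the function name is hypothetical):

```python
import hashlib


def sha256_matches(data, expected_hex):
    """Compare data's SHA-256 to a hex string, ignoring case and surrounding whitespace."""
    # hexdigest() always returns lowercase, so only the expected value needs normalizing.
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()
```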
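
Commit 0c3e4fccca (#23051) passes the asset list to the download script joined with `+`, splits it back with a regex, and separately validates `HF_WEBUI_VERSION`. A Python sketch of both steps follows; the version pattern and helper names are assumptions, since the log does not give the real regex. Because `+` is in fact permitted in Windows file names, the sketch rejects asset names containing the separator rather than assuming they cannot occur.

```python
import re

SEP = "+"
# Hypothetical pattern: an alphanumeric first character, then letters, digits, '.', '_' or '-'.
VERSION_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def join_assets(names):
    for name in names:
        # ';' is CMake's list separator and '+' is ours; neither may appear in a name.
        if SEP in name or ";" in name:
            raise ValueError(f"asset name contains a separator: {name!r}")
    return SEP.join(names)


def split_assets(arg):
    return [name for name in re.split(re.escape(SEP), arg) if name]


def validate_version(version):
    # fullmatch rejects any trailing junk; the '..' check blocks path traversal.
    if not VERSION_RE.fullmatch(version) or ".." in version:
        raise ValueError(f"invalid version string: {version!r}")
    return version
```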
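
Commits 27aef3dd91 (#22513) and c3f95c1f06 (#23189) add `wc2wt.sh`, which creates a git worktree from the current HEAD and later also accepts an existing branch. The two cases map onto two forms of `git worktree add`; a Python sketch that picks between them, where the helper names and the worktree path argument are hypothetical (the log does not say where the script places worktrees):

```python
import subprocess


def branch_exists(branch):
    """True if refs/heads/<branch> exists in the current repository."""
    result = subprocess.run(
        ["git", "show-ref", "--verify", "--quiet", f"refs/heads/{branch}"],
        check=False,
    )
    return result.returncode == 0


def worktree_cmd(branch, path, exists):
    if exists:
        # Check out the existing branch in a new worktree.
        return ["git", "worktree", "add", path, branch]
    # Create a new branch at the current HEAD and check it out in a new worktree.
    return ["git", "worktree", "add", "-b", branch, path, "HEAD"]


# Usage (inside a git checkout):
# cmd = worktree_cmd("gg/new-feature", "../wt-new-feature", branch_exists("gg/new-feature"))
# subprocess.run(cmd, check=True)
```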