llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-26 14:20:21 +00:00

Author	SHA1	Message	Date
Aleksander Grygier	f7ca93d12c	ui: PWA support (#23871 ) * feat: Add basic PWA support and service worker for offline caching * feat: Vite PWA implementation WIP * feat: Improve PWA icons generation * feat: Add PWA workbox to server routes * feat: Include `version.json` in static assets * feat: Add HTTP cache headers for PWA static assets * feat: Update app name for `apple-mobile-web-app-title` * feat: Implement PWA versioning and automatic update detection * chore: Update `.gitignore` files * feat: Splash Screens * feat: Add dark mode favicon support * refactor: Cleanup * fix: Use dark logo for dark splash screens * refactor: Simplify favicons SVG code * fix: Adjust caching and polling for reliable service worker updates * fix: Add missing favicon entry * fix: Align PWA service worker configuration with SvelteKit build structure * fix: Replace hashed bundle paths with versioned static paths * test: Add PWA tests * ci: Add build output for unit tests * refactor: Cleanup * fix: Server build & release versioning * chore: Update package-lock.json * chore: Increase PWA cache size * chore: Update packages * feat: Update favicons * refactor: Post-merge fix * feat: support explicit build version for PWA cache busting * fix: CI * feat: Improve PWA Refresh Alert UI * feat: Add toggleable build version display * refactor: Cleanup * feat: Add version mismatch detection and manual app reload * refactor: replace dynamic imports with static * refactor: Cleanup * feat: Add safe space for `pwa-<size>.png` rendered icons * fix: use relative paths for PWA assets to support base path deployment * feat: add PWA mode detection via URL query parameter * feat: Use ?cache=true for SW-cached PWA assets * refactor: Build process cleanup * refactor: Decouple PWA versioning and remove ?cache=true workaround * chore: Update README logo * feat: Include PWA Assets generation in build script * refactor: `usePwa` hook for core layout * fix: Relativize base vite plugin * fix: remove unnecessary backslash escapes in test regexes * test: update static asset paths for API Key test * refactor: Move SvelteKit PWA Options config to constants * ui: fix update notification never appearing Keep the PWA hook object intact instead of destructuring needRefreshByStorage, which freezes the reactive getter. Also exclude loading.html from PWA precache to prevent 404 errors and broken SW installation.	2026-06-12 15:53:26 +02:00
Georgi Gerganov	f532be8fac	sync : ggml	2026-06-12 15:55:35 +03:00
Adrien Gallouët	70b54e140c	vendor : update cpp-httplib to 0.47.0 (#24395 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-06-12 11:34:44 +02:00
Georgi Gerganov	263cc04a54	sync : ggml	2026-06-11 19:34:19 +03:00
Georgi Gerganov	c2b1518fd4	sync : ggml	2026-06-08 14:31:33 +03:00
Pascal	2016bf2b3b	ui: run npm install when package-lock.json is newer than node_modules (#24171 )	2026-06-05 14:57:32 +02:00
Max Krasnyansky	5c394fdc8b	hexagon: profiler output fix and script updates (#24042 ) * hex-ops: fix profiler output (ie remove the redundant NONEs) * hex-prof: update profiling script to support tot.usec column	2026-06-02 14:08:29 -07:00
Adrien Gallouët	335abed17d	vendor : update cpp-httplib to 0.46.1 (#23980 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-06-01 19:40:10 +03:00
Ruixiang Wang	689a9a470e	server-bench : add speed-bench for speculative decoding benchmarking (#23869 ) * spec: add speed-bench support for benchmarking * speed-bench : add trailing newline to requirements.txt * speed-bench : bump datasets to 4.8.0 to fix ty check * server-bench : remove now-unused type: ignore after datasets bump	2026-05-29 23:09:47 +02:00
Georgi Gerganov	fe12e422ad	sync : ggml	2026-05-29 09:56:08 +03:00
Max Krasnyansky	19e92c33ef	hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23835 ) Updating infra to enable op fusion and using RMS_NORM+MUL as the use-case.	2026-05-28 14:05:54 -07:00
Max Krasnyansky	aa50b2c2ae	hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647 ) * hex-mm: add support for Q4_1 matmul/matvec, hvx-only for now * hmx-mm: add support for Q4_1 * hex-mm: use Q8_1 dynamic quantization to avoid having to compute sums in the vec_dot * hexagon: fix repack scratch buffer overflow * hex-mm: fix Q4_1 repack buffer sizing * hexagon: flip the build order for mm and fa (seems to help LTO) * hex-mm: add vec_dot 4x1s and minor HMX cleanup after adding Q4_1 * hex-mm: fix fp16 vec_dot fallback to 2x1 and another issue that could cause incorrect output * hexagon: resurrect early-wake and add support for polling for op-batch completions With Q4_1 ggml-hexagon now claims pretty much the entire graphs which gives the CPU more time to chilax. This is a good thing! But it does add extra latency for the pure benchmark runs. Early wakeup helps recover the latency a bit in the normals runs and op-batch polling is just for benchmarking. --------- Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>	2026-05-27 10:46:11 -07:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	617255d437	vendor : update cpp-httplib to 0.46.0 (#23650 )	2026-05-27 21:36:24 +08:00
Georgi Gerganov	d161ea7071	sync : ggml	2026-05-25 12:43:27 +03:00
Georgi Gerganov	22307b3e8b	sync : ggml	2026-05-25 12:38:01 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	9627d0f540	vendor : update cpp-httplib to 0.45.1 (#23639 )	2026-05-25 09:45:22 +03:00
Aparna M P	cec51c7a7d	snapdragon: update windows toolchain to use hsdk v6.6.0.0 (#23552 )	2026-05-23 19:56:41 -07:00
Aldehir Rojas	b22ff4b7b4	cmake/ui : refactor the build (#23352 )	2026-05-23 17:08:22 -04:00
Max Krasnyansky	c9872a2575	hexagon: HMX quantized matmul rework (#23368 ) * hmx-mm: update debug logging in hmx-mm * hmx-mm: update dequant logic to use HVX_vector_x2/4 * hmx-mm: remove non-pipelined version of the quantize matmul It seems that we don't reall need non-pipelined version * hmx-mm: use activation depth mode and update naming Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com> * hex-mm: minor hmx matmul naming updates * hmx-mm: remove unused vars * snapdragon: scripts bump default ubatch-size to 1K * hexagon: combine HMX and power and clock settings into a single set_power call * hmx-mm: remove leftover of the scale repl helper * hexagon: fix editconf error --------- Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>	2026-05-20 07:39:01 -07:00
Georgi Gerganov	c3f95c1f06	scripts : allow wc2wt with an existing branch (#23189 )	2026-05-18 08:57:28 +03:00
Georgi Gerganov	3a92bc99db	sync : ggml	2026-05-16 16:11:29 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	18675b6bbc	vendor : update cpp-httplib to 0.45.0 (#23103 )	2026-05-16 15:25:21 +03:00
Aleksander Grygier	59778f0196	ui: Restructure repo to use `tools/ui` folder and `ui` / `UI` / `llama-ui` / `LLAMA_UI` naming (#23064 ) * webui: Move static build output from `tools/server/public` to `build/ui` directory * refactor: Move to `tools/ui` * refactor: rename CMake variables and preprocessor defines - Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated) - Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated) - Backward compat: old vars auto-forward to new ones with DEPRECATION warning - Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc. - Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET - Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines - Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED * refactor: rename CLI flags (--webui -> --ui) with backward compat - Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases) - Add --ui-config (old --webui-config kept as deprecated alias) - Add --ui-config-file (old --webui-config-file kept as deprecated alias) - Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated) - Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY - C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields - Backward compat: old fields synced to new ones in g_params_to_internals * refactor: update C++ server internals with backward compat - Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta) - Rename params.webui usage -> params.ui (both synced, old still works) - JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys - Server routes use params.ui_mcp_proxy \|\| params.webui_mcp_proxy - Preprocessor guards use #if defined(LLAMA_BUILD_UI) \|\| defined(LLAMA_BUILD_WEBUI) * refactor: rename CI/CD workflows, artifacts, and build script - Rename webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build - Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT - Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks - Update server.yml: job/artifact refs webui-build -> ui-build - Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT - Update server-self-hosted.yml: webui-build -> ui-build - Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION - Rename webui-download.cmake -> ui-download.cmake (internal refs updated) - Update labeler.yml: server/webui -> server/ui path label * docs: update CODEOWNERS and server README docs - Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/ - Update server README.md: CLI tables show --ui flags with deprecated --webui aliases - Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/ * fix: Small fixes for UI build * fix: CMake.txt syntax * chore: Formatting * fix: `.editorconfig` for llama-ui * chore: Formatting * refactor: Use `APP_NAME` in Error route * refactor: Cleanup * refactor: Single migration service * make llama-ui a linkable target * fix: UI Build output * fix: Missing change * fix: separate llama-ui npm build output into build/tools/ui/dist subfolder + use cmake npm build instead of downloading ui-build.yml artifacts in CI * refactor: UI workflows cleanup --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-05-16 02:02:40 +02:00
Omer Ozarslan	1348f67c58	webui: Use lowercase hash for HF checksum check (#23107 )	2026-05-15 19:38:16 +02:00
Zack Li	d81e63dcfd	CI : support IOT device (IQ9) (#22987 ) * update test scripts * align CI behavior between linux and android * remove automatically cancel in 15min * enable cancel-in-progress * fix ty check issue * update and fix pylint issue * update runner such that we are not restricted by the 15min limit rule * fix flake8 lint issue * update runner according to review feedback * code update according to review feedback * switch from llama-cli to llama-completion binary with -no-cnv flag	2026-05-14 13:58:34 -07:00
Aleksander Grygier	0c3e4fccca	fix: Propagate version tag to WebUI asset download in self-hosted CI (#23051 ) * fix: Propagate version tag to WebUI asset download in self-hosted CI * refactor: Apply suggestions from @CISC Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix: Skip npm build when Node.js is not installed Avoid 'no such file or directory' errors on CI runners that lack Node.js. Check if npm is available via find_program before attempting npm install + npm run build. Falls back to HF Bucket download. * fix: Use + separator for ASSETS list to fix Windows build Replace fragile \; escaping with a + separator when passing the WebUI asset list via -DASSETS to the download script. On Windows, the \; escaping was not reliably preserved through the CMake build system, causing all asset filenames to be concatenated into one (e.g., 'index.html;bundle.js;bundle.css;loading.html' as a single file), which broke the HF Bucket download and subsequent xxd.cmake step. + is safe because it is not special in cmd.exe (unlike \| which is a pipe operator), not special in CMake's -D argument parser, and not a valid Windows filename character. CMakeLists.txt joins assets with + and webui-download.cmake splits them back via regex. * fix: Validate HF_WEBUI_VERSION environment variable with regex Add input validation for the HF_WEBUI_VERSION env var to prevent CMake list separator or path-traversal issues in stamp filenames and download URLs. Rejects non-conforming characters early. * fix: Remove 'latest' fallback for HF_WEBUI_VERSION When needs.determine-tag.outputs.tag_name is empty, let CMake's default resolution handle it (empty -> git-based version lookup) instead of falling back to 'latest'. This ensures the sentinel stamp file is consistent with CMake's resolution logic. * fix: Demote checksum verification failure to warning instead of hard gate * fix: End line character --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-14 17:57:20 +02:00
Aleksander Grygier	253ba110bc	webui: Move static build output from repo code to HF Bucket (#22937 ) * ci: add workflow to publish webui to Hugging Face bucket * ci: add webui release job to release workflow * ci: test webui release job * chore: Return to default minification strategy for build output files * ci: extract webui build into separate workflow and job * chore: Ignore webui static output + clean up references * chore: Delete legacy webui static output * chore: Ignore webui build static output * fix: Workflow * fix: Versioning naming * chore: Update package name * test: Test CI fix * refactor: Naming * server: implement webui build strategy with HF Bucket support * chore: Remove test workflow * chore: Use WebUI build workflow call in other workflows * server: HF Buckets fallback for WebUI build * refactor: App name variable * refactor: Naming * fix: Retrieve loading.html * fix: workflow syntax * fix: Rewrite malformed release.yml * fix: Req param * test: Re-add missing Playwright installation for CI tests * refactor: Logic & security improvements * refactor: Retrieve publishing jobs and DRY the workflows * fix: Test workflow syntax * fix: Upstream Release Tag for test workflow * chore: Remove test workflow * ci: Run WebUI jobs on `ubuntu-24.04-arm` * refactor: Post-CR cleanup Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * refactor: CI cleanup * refactor: Cleanup * test: Test workflow * refactor: use LLAMA_BUILD_NUMBER instead of LLAMA_BUILD_TAG for HF Bucket webui downloads * server: add fallback mechanism for HF Bucket webui downloads from latest directory * fix: Incorrect argument order in file(SHA256) calls for checksum verification * refactor: Use cmake script for handling the HF Bucket download on build time * feat: support local npm build for WebUI assets * refactor: add `HF_ENABLED` flag to control WebUI build/download provisioning * refactor: Cleanup * chore: Remove test workflow * fix: remove s390x from release workflow * fix: add webui-build dependency to ubuntu-22-rocm and windows-hip * Revert "fix: remove s390x from release workflow" This reverts commit debcfffa9bc1e3112eae41f2d29741b682e4eb19. * fix: Release workflow file * fix: Proper release tag used for HF Bucket upload * fix: Remove duplicate steps in release workflow --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-14 13:21:41 +02:00
Trivikram Reddy	856c3adac1	hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993 ) * hexagon: add hvx_vec_repl helpers and use those for splat-from-vtcm usecase * hmx-mm: optimize per-group scale handling * hmx-fa: optimize slope load from vtcm * hmx-fa: use aligned access where possible in hmx-utils * hexagon: add hvx_vec_repl_2x_f16 helper and consolidate repl helpers --------- Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>	2026-05-12 17:28:02 -07:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	838374375c	vendor : update cpp-httplib to 0.44.0 (#22919 )	2026-05-11 08:47:13 +02:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	5d5d2e15d2	vendor : update cpp-httplib to 0.43.4 (#22888 )	2026-05-10 18:46:54 +02:00
Georgi Gerganov	0b047287fe	sync : ggml	2026-05-10 17:00:11 +03:00
Georgi Gerganov	70a8309114	sync : ggml	2026-05-05 13:15:59 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	a09a00e502	vendor : update cpp-httplib to 0.43.3 (#22686 )	2026-05-05 09:04:57 +02:00
Piotr Wilkin (ilintar)	a4701c98f7	common/autoparser: fixes for newline handling / forced tool calls (#22654 ) * chat/autoparser: the fixes * Move optspace() to chat-peg-parser, comment out server tests invalidated due to content now allowed with forced tool calls. * Trim whitespace on apply instead	2026-05-04 13:18:11 +02:00
Georgi Gerganov	228e836344	sync : ggml	2026-05-02 08:55:29 +03:00
Georgi Gerganov	457e2288c9	sync : ggml	2026-05-02 07:22:35 +03:00
Sigbjørn Skjæret	6118c043b1	ci : bump ty to 0.0.33 (#22535 ) * bump ty to 0.0.33 * update typings	2026-04-30 16:15:54 +03:00
Adrien Gallouët	5f0ab726f7	vendor : update cpp-httplib to 0.43.2 (#22548 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-04-30 15:04:39 +02:00
Georgi Gerganov	27aef3dd91	scripts : add wc2wt.sh - create worktree from current HEAD (#22513 ) * scripts : add wc2wt.sh - create worktree from current HEAD Add a script to create a git worktree on a new branch from the current HEAD. Similar to pr2wt.sh but for local development branches instead of PRs. Usage: ./scripts/wc2wt.sh gg/new-feature ./scripts/wc2wt.sh gg/new-feature "bash -l" Assisted-by: llama.cpp:local pi * cont : no need to try to delete the branch	2026-04-30 09:20:26 +03:00
Max Krasnyansky	41a63be28e	hexagon: make vmem and buffer-size configurable (#22487 ) * hexagon: allow host to set max vmem size We use a sane default but it's helpful to allow for an override if needed. * hexagon: add support for measuring vmem space and move pinned mmaping management to host * hexagon: update vmem checks to use uint64 * hexagon: bump op buffers to 16 (matches max mmaps) * hexagon: bump default vmem to 3.2GB * hexagon: add support for autodetecting vmem space and some logging cleanup in that area * hexagon: fix whitespace warnings * Update scripts/snapdragon/adb/run-cli.sh Co-authored-by: Pascal <admin@serveurperso.com> * hex-adb: fix run-completion script --------- Co-authored-by: Pascal <admin@serveurperso.com>	2026-04-29 11:51:21 -07:00
Georgi Gerganov	b1d5f5b449	sync : ggml	2026-04-29 16:43:47 +03:00
Georgi Gerganov	f535774325	pr2wt : symlink .pi (#22386 )	2026-04-26 19:49:26 +03:00
Piotr Wilkin (ilintar)	0adede866d	parser: fix structured output bug (#22302 ) * fix very stupid structured output bug * Things just cannot be too easy.	2026-04-24 23:19:55 +02:00
Max Krasnyansky	5d2b52d80d	hexagon: add support for basic and extended Op profiling (#22269 ) * hexagon: restore HTP_OPMASK_QUEUE * hexagon: honor OPMASK_SKIP_COMPUTE in hmx-matmul * hex-prof: restore op profiling * hex-prof: enable PMU * hexagon: simplify and improve op-queuing with full profiling support Add separate profile descriptors. * hexagon: remove opsync and rename opmask into opstage opsync is no longer needed since the profiler is fully async now. opmask name was confusing and opstage is more accurate. * hexagon: refactor opbatch queue handling * hexagon: add iface hooks for enabling profiler from the host Also move all the PMU setup stuff out of the hex-utils since it's not inteded for normal use. * hexagon: make profiler mode configurable On older devices getting PMU counters is expensive so it's now optional. * hexagon: add support for setting profiler pmu events from env * hexagon: simplify profiler output (no need to print buffs, etc) * hexagon: simplify pmu counter formating * hexagon: add a simple profile post-proc tool * hex-prof: add support for reading logs from stdin * hexagon: document GGML_HEXAGON_PROFILE * hex-prof: update default width for dims field * hex-prof: fix linter warnings and errors * Update ggml/src/ggml-hexagon/htp/htp-ops.h Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update scripts/snapdragon/ggml-hexagon-profile.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Trivikram Reddy <tamarnat@qti.qualcomm.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-23 14:17:21 -07:00
Shreya Jain	187a456370	Enable testing on Snapdragon devices (#21051 ) * Add the tests that we want to run on external CI * remove extra files * Fixes python issues, reove the deadlock on CI * remove unecessary changes * use override to ty.toml * fix pre-commit and try tests with secret in external repo not upstream * skip if key is unavailable * Fix feedback * switch hexagon to snapdragon * cleanup * fix secrets * remove the copyrights at the top of the files	2026-04-23 13:08:10 -07:00
Piotr Wilkin (ilintar)	8bccdbbff9	chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs (#22217 ) * chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs * Fix ty errors. * Fix flake8 err	2026-04-22 18:10:56 +02:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	606fa42f5d	vendor : update cpp-httplib to 0.43.1 (#22143 ) * vendor : update cpp-httplib to 0.43.0 * vendor : update cpp-httplib to 0.43.0	2026-04-21 22:45:48 +08:00
Georgi Gerganov	4889afba5f	sync : ggml	2026-04-21 11:04:21 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	e365e658f0	vendor : update cpp-httplib to 0.42.0 (#21781 )	2026-04-20 06:41:43 +08:00
Max Krasnyansky	9aa2807769	hexagon: improved Op queuing, buffer and cache management (#21705 ) * hexagon: introduce op request batching and rewrite buffer managment The host now prepares batches of requests and dispatches them via a single dspqueue message. Buffers are mapped explicitly by NPU while processing batches. * hex-dma: disable l2 bypass since to work around new issue due to no flushes between Ops * hex-utils: add explicit l2flush and l2clear helpers * hex-opreq: use fine-grain per tensor l2 management * hex-opreq: avoid redundant invalidates for tensors we already flushed * hex-opreq: update debug messages * htp-opreq: reuse ops_context * hex-opreq: do not flush or invalidate cache lines beyond buffer boundry * hex-opreq: fix errors in log message * Revert "hex-opreq: do not flush or invalidate cache lines beyond buffer boundry" This reverts commit 8b7f0a55a750a6430ce4eb1874c7feb3d720056d. * hexagon: limit l2 flushes to 1MB which covers l2 cache * hex-opreq: limit cache flush to 4MB Looks like 4MB cont. vitual space should cover the 1MB cache. * hexagon: drop cache flush size to 2MB * hex-opreq: start reworking opreq packing * hex-opreq: introduce new way of packing opbatch where tensors are stored separately * hex-opreq: add a simple fastrpc call to force unmap all buffers * hex-l2flush: somehow 2MB does not seem robust, also cleanup step size to use line-size * hex-opreq: bump opreq batch size to 256 * hex-mm: place src1 spad at the top of vtcm for easy reuse * hex-ops: introduce internal types and disable src1 reuse for now Nothing new just formalizing the repack / qyn.quant types we've been using. * htp-opreq: use tensor pointers instead of copies * hex-opreq: introduce more robust way for tracking vtcm/spad reuse This removes the SKIP_QUANTIZE flag that became fragile with the addition of HMX and other ops. * hex-cumsum: fix error post opreq merge * hex-opreq: move request batch handling into the session Prepping everything for using dspqueue buffers and doing that inside the session is much cleaner. * hex-mm: yet another fix for src1 reuse when we're mixing hmx/hvx * hex-bufs: introduce pinned mmapings and use non-pinned ones for model buffers * hex-buf: add support for allocating shared/pinned buffer for opreqs * hex-opbatch: make opbatches configurable * hex-naming: better name for ggml_hexagon_shared_buffer * hex-naming: add session->c_name() helper * hex-opbatch: start using shm but still copy for now * hex-opbatch: use shared buffer for packing opbatch * hex-opbatch: beter naming for opbatch related classes and code * hex-opbatch: reuse batched tensors with same data/dims/strides * hex-opbatch: update logging * hex-opbatch: add support for vmem limit for op batching * hex-opbatch: update htp side to properly support dynamic mmap/unmap * hex-opbatch: add OB and OQ params for run-completion script and fix the asserts in batch processing * hex-opbatch: fixed src1 handling in act ops * hex-act: fix empty src1 handling in swiglu and friends Simplify preamble macro while at it * hex-mm: minor fix vtcm and dma handling in matmul cleaning up some left-overs from merges * hex-opbatch: allocate extra 1KB for dspqueue overhead * hexagon: fix softmax for non-aligned tensors and cleanup vtcm alloc * hex-mm: properly handle hmx_disabled flag * hex-ops: update comments * hex-ops: add debug output for get/set-rows * hex-mmap: optimize un/mapping of buffers * hex-opreq: global cache flush and invalidate beyond 128KB threshold * hex-ops: add super simple opfilter regex for debugging If an Op matches the regex hex backend will reject it. * hex-opbatch: wireup newer ops missed in merge and update main switch to detect this in future * hexagon: improved vtcm acquision to remove inter-op overhead Fully compatible with QNN-HTP coex * hex-mm: fixed hvx fallback path * hex-mm: lower the vmem threshold a bit further to ~3GB * hexagon: update debug & error logs This also fixes an issue with newer llvm merging repack and non-repack functions. We use those pointer to distinguish between buffer types. * hexagon: move ops context into main context Just a cleanup. We don't need separate contexts at this point. * hex-opbatch: cleanup naming and headers for opbatch and related descriptors * hex-fa: it's now better to enable FA during TG to reduce graph splits * hexagon: remove GGML_HEXAGON_EXPERIMENTAL env var It's no longer useful. Please use more flexible GGML_HEXAGON_OPFILTER to disable Ops if needed for debugging or validation. * hexagon: fixed editorconfig check * Update ggml/src/ggml-hexagon/ggml-hexagon.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Trivikram Reddy <tamarnat@qti.qualcomm.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-10 15:47:43 -07:00

1 2 3 4 5 ...

402 Commits