* server: clean up static assets handling
* nits
* simplify file name handling, use static file name everywhere
* cmake/ui : bundle UI assets in an archive
* ui : run prettier on post-build.js
---------
Co-authored-by: Alde Rojas <hello@alde.dev>
* hex-mm: add support for Q4_1 matmul/matvec, hvx-only for now
* hmx-mm: add support for Q4_1
* hex-mm: use Q8_1 dynamic quantization to avoid having to compute sums in the vec_dot
* hexagon: fix repack scratch buffer overflow
* hex-mm: fix Q4_1 repack buffer sizing
* hexagon: flip the build order for mm and fa (seems to help LTO)
* hex-mm: add vec_dot 4x1s and minor HMX cleanup after adding Q4_1
* hex-mm: fix fp16 vec_dot fallback to 2x1 and another issue that could cause incorrect output
* hexagon: resurrect early-wake and add support for polling for op-batch completions
With Q4_1, ggml-hexagon now claims pretty much the entire graph, which gives the CPU more time to chillax.
This is a good thing! But it does add extra latency for the pure benchmark runs.
Early wakeup helps recover the latency a bit in the normal runs, and op-batch polling is just for benchmarking.
---------
Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>
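The point of pairing Q4_1 weights with Q8_1 activations can be illustrated with a small sketch. It assumes the usual ggml block layouts (Q4_1: scale d and minimum m with 4-bit codes; Q8_1: scale d plus a precomputed s = d * sum(q)) and uses plain Python floats and lists. It does no nibble packing, fp16 rounding, or HVX/HMX work, and the function names are hypothetical, not the ones used in ggml-hexagon.

```python
import random

QK = 32  # block size used by ggml's Q4_1 and Q8_1 formats


def quantize_q4_1(x):
    """Quantize one block so that x[i] ~= d * q[i] + m, with q[i] in 0..15."""
    lo, hi = min(x), max(x)
    d = (hi - lo) / 15.0
    inv = 1.0 / d if d else 0.0
    q = [min(15, int((v - lo) * inv + 0.5)) for v in x]
    return d, lo, q


def quantize_q8_1(y):
    """Quantize one block so that y[i] ~= d * q[i]; also return s = d * sum(q)."""
    amax = max(abs(v) for v in y)
    d = amax / 127.0
    inv = 1.0 / d if d else 0.0
    q = [round(v * inv) for v in y]
    return d, d * sum(q), q


def dot_q4_1_q8_1(xb, yb):
    """Dot product of one Q4_1 block with one Q8_1 block.

    sum((dx*qx + mx) * (dy*qy)) = dx*dy*sum(qx*qy) + mx * (dy*sum(qy)),
    and the last factor is the s stored at quantization time, so the
    inner loop only needs the integer products.
    """
    dx, mx, qx = xb
    dy, sy, qy = yb
    sumi = sum(a * b for a, b in zip(qx, qy))
    return dx * dy * sumi + mx * sy
```

Because s is computed once when the activations are quantized, the per-block cost of the minimum term is a single multiply-add rather than a second reduction over the block.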
* hmx-mm: update debug logging in hmx-mm
* hmx-mm: update dequant logic to use HVX_vector_x2/4
* hmx-mm: remove non-pipelined version of the quantize matmul
It seems that we don't really need the non-pipelined version
* hmx-mm: use activation depth mode and update naming
Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>
* hex-mm: minor hmx matmul naming updates
* hmx-mm: remove unused vars
* snapdragon: scripts bump default ubatch-size to 1K
* hexagon: combine HMX and power and clock settings into a single set_power call
* hmx-mm: remove leftover of the scale repl helper
* hexagon: fix editorconfig error
---------
Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>
* update test scripts
* align CI behavior between linux and android
* remove automatic cancel after 15min
* enable cancel-in-progress
* fix ty check issue
* update and fix pylint issue
* update runner such that we are not restricted by the 15min limit rule
* fix flake8 lint issue
* update runner according to review feedback
* code update according to review feedback
* switch from llama-cli to llama-completion binary with -no-cnv flag
* fix: Propagate version tag to WebUI asset download in self-hosted CI
* refactor: Apply suggestions from @CISC
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix: Skip npm build when Node.js is not installed
Avoid 'no such file or directory' errors on CI runners that lack
Node.js. Check if npm is available via find_program before attempting
npm install + npm run build. Falls back to HF Bucket download.
* fix: Use + separator for ASSETS list to fix Windows build
Replace fragile \; escaping with a + separator when passing the
WebUI asset list via -DASSETS to the download script. On Windows,
the \; escaping was not reliably preserved through the CMake build
system, causing all asset filenames to be concatenated into one
(e.g., 'index.html;bundle.js;bundle.css;loading.html' as a single
file), which broke the HF Bucket download and subsequent xxd.cmake
step.
+ is safe because it is not special in cmd.exe (unlike | which is a
pipe operator), not special in CMake's -D argument parser, and not
used in any of the WebUI asset filenames. CMakeLists.txt joins assets
with + and webui-download.cmake splits them back via regex.
* fix: Validate HF_WEBUI_VERSION environment variable with regex
Add input validation for the HF_WEBUI_VERSION env var to prevent
CMake list separator or path-traversal issues in stamp filenames
and download URLs. Rejects non-conforming characters early.
* fix: Remove 'latest' fallback for HF_WEBUI_VERSION
When needs.determine-tag.outputs.tag_name is empty, let CMake's
default resolution handle it (empty -> git-based version lookup)
instead of falling back to 'latest'. This ensures the sentinel
stamp file is consistent with CMake's resolution logic.
* fix: Demote checksum verification failure to warning instead of hard gate
* fix: End line character
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
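The ASSETS separator change and the HF_WEBUI_VERSION validation above can be sketched roughly as follows. This is a Python illustration of the idea only: the real logic lives in CMakeLists.txt and webui-download.cmake, and the function names and the version pattern here are assumptions, not the ones used in the build scripts.

```python
import re

SEP = "+"  # separator used instead of the escaped ';' list separator

# Hypothetical allow-list; the actual pattern in the CMake code may differ.
VERSION_RE = re.compile(r"[A-Za-z0-9._-]+")


def join_assets(names):
    """Join asset file names into one argument, refusing ambiguous names."""
    for name in names:
        if not name or SEP in name or ";" in name:
            raise ValueError(f"asset name cannot be passed safely: {name!r}")
    return SEP.join(names)


def split_assets(joined):
    """Split the joined argument back into individual asset names."""
    return [part for part in re.split(r"\+", joined) if part]


def validate_version(value):
    """Return a safe version string, or None when the variable is empty.

    None stands for "let the default (git-based) resolution decide";
    anything with list separators or path traversal is rejected early.
    """
    if value == "":
        return None
    if not VERSION_RE.fullmatch(value) or ".." in value:
        raise ValueError(f"invalid HF_WEBUI_VERSION: {value!r}")
    return value
```

The round trip only works if no asset name contains the separator, which is why the sketch checks for it when joining rather than relying on the names staying well-behaved.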
* hexagon: add hvx_vec_repl helpers and use those for the splat-from-vtcm use case
* hmx-mm: optimize per-group scale handling
* hmx-fa: optimize slope load from vtcm
* hmx-fa: use aligned access where possible in hmx-utils
* hexagon: add hvx_vec_repl_2x_f16 helper and consolidate repl helpers
---------
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
* chat/autoparser: the fixes
* Move optspace() to chat-peg-parser, comment out server tests invalidated because content is now allowed with forced tool calls.
* Trim whitespace on apply instead
* scripts : add wc2wt.sh - create worktree from current HEAD
Add a script to create a git worktree on a new branch from the current
HEAD. Similar to pr2wt.sh but for local development branches instead of
PRs.
Usage:
./scripts/wc2wt.sh gg/new-feature
./scripts/wc2wt.sh gg/new-feature "bash -l"
Assisted-by: llama.cpp:local pi
* cont : no need to try to delete the branch
* hexagon: allow host to set max vmem size
We use a sane default but it's helpful to allow for an override if needed.
* hexagon: add support for measuring vmem space and move pinned mmap management to host
* hexagon: update vmem checks to use uint64
* hexagon: bump op buffers to 16 (matches max mmaps)
* hexagon: bump default vmem to 3.2GB
* hexagon: add support for autodetecting vmem space and some logging cleanup in that area
* hexagon: fix whitespace warnings
* Update scripts/snapdragon/adb/run-cli.sh
Co-authored-by: Pascal <admin@serveurperso.com>
* hex-adb: fix run-completion script
---------
Co-authored-by: Pascal <admin@serveurperso.com>