Commit Graph

409 Commits

Author SHA1 Message Date
Alessandro de Oliveira Faria (A.K.A.CABELO) 0d2d9ccbf6 vendor : update cpp-httplib to 0.48.0 (#24787) 2026-06-19 22:16:35 +08:00
Georgi Gerganov 5fd2dc2c41 sync : ggml 2026-06-19 10:19:14 +03:00
o7si 32eddaf2ea cmake : fix ui build with read-only source (#24752) 2026-06-18 18:59:18 +02:00
Max Krasnyansky d2c67959b3 hexagon: support for op-trace (fine-grain tracing of HVX/HMX/DMA events) (#24592)
* hex-optrace: add support for optrace and instrument matmul and flash-atten code

* hex-trace: improve trace event and prefetto generator

* hex-trace: add new script dedicated to handling traces, specifically perfetto traces

* hex-trace: add --head/--tail options to profile and trace tools

* hex-trace: fix whitespaces

* hex-trace: fix flake8 warnings

* hex-trace: fix flake8 warnings

* hmx-fa: restore q_tiles clearing

* hex-profile: remove circular dep in includes

* hex-trace: simplify trace sizing check

* hex-profile: sort events in the summary by name
2026-06-18 08:35:02 -07:00
Xuan-Son Nguyen e8067a8b36 ui: build-time gzip compression (#24571)
* ui: keep original file name and path

* fix nocache

* ui: build-time gzip compression
2026-06-13 16:57:27 +02:00
Xuan-Son Nguyen 597b6672e8 ui: keep original file name and path (#24568)
* ui: keep original file name and path

* fix nocache
2026-06-13 14:31:41 +02:00
Xuan-Son Nguyen 57fe1f07c3 server: clean up static assets handling (#24550)
* server: clean up static assets handling

* nits

* simplify file name handling, use static file name everywhere

* cmake/ui : bundle UI assets in an archive

* ui : run prettier on post-build.js

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
2026-06-13 11:51:20 +02:00
Aleksander Grygier f7ca93d12c ui: PWA support (#23871)
* feat: Add basic PWA support and service worker for offline caching

* feat: Vite PWA implementation WIP

* feat: Improve PWA icons generation

* feat: Add PWA workbox to server routes

* feat: Include `version.json` in static assets

* feat: Add HTTP cache headers for PWA static assets

* feat: Update app name for `apple-mobile-web-app-title`

* feat: Implement PWA versioning and automatic update detection

* chore: Update `.gitignore` files

* feat: Splash Screens

* feat: Add dark mode favicon support

* refactor: Cleanup

* fix: Use dark logo for dark splash screens

* refactor: Simplify favicons SVG code

* fix: Adjust caching and polling for reliable service worker updates

* fix: Add missing favicon entry

* fix: Align PWA service worker configuration with SvelteKit build structure

* fix: Replace hashed bundle paths with versioned static paths

* test: Add PWA tests

* ci: Add build output for unit tests

* refactor: Cleanup

* fix: Server build & release versioning

* chore: Update package-lock.json

* chore: Increase PWA cache size

* chore: Update packages

* feat: Update favicons

* refactor: Post-merge fix

* feat: support explicit build version for PWA cache busting

* fix: CI

* feat: Improve PWA Refresh Alert UI

* feat: Add toggleable build version display

* refactor: Cleanup

* feat: Add version mismatch detection and manual app reload

* refactor: replace dynamic imports with static

* refactor: Cleanup

* feat: Add safe space for `pwa-<size>.png` rendered icons

* fix: use relative paths for PWA assets to support base path deployment

* feat: add PWA mode detection via URL query parameter

* feat: Use ?cache=true for SW-cached PWA assets

* refactor: Build process cleanup

* refactor: Decouple PWA versioning and remove ?cache=true workaround

* chore: Update README logo

* feat: Include PWA Assets generation in build script

* refactor: `usePwa` hook for core layout

* fix: Relativize base vite plugin

* fix: remove unnecessary backslash escapes in test regexes

* test: update static asset paths for API Key test

* refactor: Move SvelteKit PWA Options config to constants

* ui: fix update notification never appearing

Keep the PWA hook object intact instead of destructuring needRefreshByStorage,
which freezes the reactive getter. Also exclude loading.html from PWA
precache to prevent 404 errors and broken SW installation.
2026-06-12 15:53:26 +02:00
Georgi Gerganov f532be8fac sync : ggml 2026-06-12 15:55:35 +03:00
Adrien Gallouët 70b54e140c vendor : update cpp-httplib to 0.47.0 (#24395)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-06-12 11:34:44 +02:00
Georgi Gerganov 263cc04a54 sync : ggml 2026-06-11 19:34:19 +03:00
Georgi Gerganov c2b1518fd4 sync : ggml 2026-06-08 14:31:33 +03:00
Pascal 2016bf2b3b ui: run npm install when package-lock.json is newer than node_modules (#24171) 2026-06-05 14:57:32 +02:00
Max Krasnyansky 5c394fdc8b hexagon: profiler output fix and script updates (#24042)
* hex-ops: fix profiler output (ie remove the redundant NONEs)

* hex-prof: update profiling script to support tot.usec column
2026-06-02 14:08:29 -07:00
Adrien Gallouët 335abed17d vendor : update cpp-httplib to 0.46.1 (#23980)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-06-01 19:40:10 +03:00
Ruixiang Wang 689a9a470e server-bench : add speed-bench for speculative decoding benchmarking (#23869)
* spec: add speed-bench support for benchmarking

* speed-bench : add trailing newline to requirements.txt

* speed-bench : bump datasets to 4.8.0 to fix ty check

* server-bench : remove now-unused type: ignore after datasets bump
2026-05-29 23:09:47 +02:00
Georgi Gerganov fe12e422ad sync : ggml 2026-05-29 09:56:08 +03:00
Max Krasnyansky 19e92c33ef hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23835)
Updating infra to enable op fusion and using RMS_NORM+MUL as the use-case.
2026-05-28 14:05:54 -07:00
Max Krasnyansky aa50b2c2ae hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647)
* hex-mm: add support for Q4_1 matmul/matvec, hvx-only for now

* hmx-mm: add support for Q4_1

* hex-mm: use Q8_1 dynamic quantization to avoid having to compute sums in the vec_dot

* hexagon: fix repack scratch buffer overflow

* hex-mm: fix Q4_1 repack buffer sizing

* hexagon: flip the build order for mm and fa (seems to help LTO)

* hex-mm: add vec_dot 4x1s and minor HMX cleanup after adding Q4_1

* hex-mm: fix fp16 vec_dot fallback to 2x1 and another issue that could cause incorrect output

* hexagon: resurrect early-wake and add support for polling for op-batch completions

With Q4_1 ggml-hexagon now claims pretty much the entire graphs which gives the CPU more time to chilax.
This is a good thing! But it does add extra latency for the pure benchmark runs.
Early wakeup helps recover the latency a bit in the normals runs and op-batch polling is just for benchmarking.

---------

Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>
2026-05-27 10:46:11 -07:00
Alessandro de Oliveira Faria (A.K.A.CABELO) 617255d437 vendor : update cpp-httplib to 0.46.0 (#23650) 2026-05-27 21:36:24 +08:00
Georgi Gerganov d161ea7071 sync : ggml 2026-05-25 12:43:27 +03:00
Georgi Gerganov 22307b3e8b sync : ggml 2026-05-25 12:38:01 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO) 9627d0f540 vendor : update cpp-httplib to 0.45.1 (#23639) 2026-05-25 09:45:22 +03:00
Aparna M P cec51c7a7d snapdragon: update windows toolchain to use hsdk v6.6.0.0 (#23552) 2026-05-23 19:56:41 -07:00
Aldehir Rojas b22ff4b7b4 cmake/ui : refactor the build (#23352) 2026-05-23 17:08:22 -04:00
Max Krasnyansky c9872a2575 hexagon: HMX quantized matmul rework (#23368)
* hmx-mm: update debug logging in hmx-mm

* hmx-mm: update dequant logic to use HVX_vector_x2/4

* hmx-mm: remove non-pipelined version of the quantize matmul

It seems that we don't reall need non-pipelined version

* hmx-mm: use activation depth mode and update naming

Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>

* hex-mm: minor hmx matmul naming updates

* hmx-mm: remove unused vars

* snapdragon: scripts bump default ubatch-size to 1K

* hexagon: combine HMX and power and clock settings into a single set_power call

* hmx-mm: remove leftover of the scale repl helper

* hexagon: fix editconf error

---------

Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>
2026-05-20 07:39:01 -07:00
Georgi Gerganov c3f95c1f06 scripts : allow wc2wt with an existing branch (#23189) 2026-05-18 08:57:28 +03:00
Georgi Gerganov 3a92bc99db sync : ggml 2026-05-16 16:11:29 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO) 18675b6bbc vendor : update cpp-httplib to 0.45.0 (#23103) 2026-05-16 15:25:21 +03:00
Aleksander Grygier 59778f0196 ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming (#23064)
* webui: Move static build output from `tools/server/public` to `build/ui` directory

* refactor: Move to `tools/ui`

* refactor: rename CMake variables and preprocessor defines

- Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated)
- Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated)
- Backward compat: old vars auto-forward to new ones with DEPRECATION warning
- Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc.
- Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET
- Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines
- Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED

* refactor: rename CLI flags (--webui -> --ui) with backward compat

- Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases)
- Add --ui-config (old --webui-config kept as deprecated alias)
- Add --ui-config-file (old --webui-config-file kept as deprecated alias)
- Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated)
- Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY
- C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields
- Backward compat: old fields synced to new ones in g_params_to_internals

* refactor: update C++ server internals with backward compat

- Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta)
- Rename params.webui usage -> params.ui (both synced, old still works)
- JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys
- Server routes use params.ui_mcp_proxy || params.webui_mcp_proxy
- Preprocessor guards use #if defined(LLAMA_BUILD_UI) || defined(LLAMA_BUILD_WEBUI)

* refactor: rename CI/CD workflows, artifacts, and build script

- Rename webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build
- Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT
- Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks
- Update server.yml: job/artifact refs webui-build -> ui-build
- Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT
- Update server-self-hosted.yml: webui-build -> ui-build
- Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION
- Rename webui-download.cmake -> ui-download.cmake (internal refs updated)
- Update labeler.yml: server/webui -> server/ui path label

* docs: update CODEOWNERS and server README docs

- Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/
- Update server README.md: CLI tables show --ui flags with deprecated --webui aliases
- Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/

* fix: Small fixes for UI build

* fix: CMake.txt syntax

* chore: Formatting

* fix: `.editorconfig` for llama-ui

* chore: Formatting

* refactor: Use `APP_NAME` in Error route

* refactor: Cleanup

* refactor: Single migration service

* make llama-ui a linkable target

* fix: UI Build output

* fix: Missing change

* fix: separate llama-ui npm build output into build/tools/ui/dist subfolder + use cmake npm build instead of downloading ui-build.yml artifacts in CI

* refactor: UI workflows cleanup

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-05-16 02:02:40 +02:00
Omer Ozarslan 1348f67c58 webui: Use lowercase hash for HF checksum check (#23107) 2026-05-15 19:38:16 +02:00
Zack Li d81e63dcfd CI : support IOT device (IQ9) (#22987)
* update test scripts

* align CI behavior between linux and android

* remove automatically cancel in 15min

* enable cancel-in-progress

* fix ty check issue

* update and fix pylint issue

* update runner such that we are not restricted by the 15min limit rule

* fix flake8 lint issue

* update runner according to review feedback

* code update according to review feedback

* switch from llama-cli to llama-completion binary with -no-cnv flag
2026-05-14 13:58:34 -07:00
Aleksander Grygier 0c3e4fccca fix: Propagate version tag to WebUI asset download in self-hosted CI (#23051)
* fix: Propagate version tag to WebUI asset download in self-hosted CI

* refactor: Apply suggestions from @CISC

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix: Skip npm build when Node.js is not installed

Avoid 'no such file or directory' errors on CI runners that lack
Node.js. Check if npm is available via find_program before attempting
npm install + npm run build. Falls back to HF Bucket download.

* fix: Use + separator for ASSETS list to fix Windows build

Replace fragile \; escaping with a + separator when passing the
WebUI asset list via -DASSETS to the download script. On Windows,
the \; escaping was not reliably preserved through the CMake build
system, causing all asset filenames to be concatenated into one
(e.g., 'index.html;bundle.js;bundle.css;loading.html' as a single
file), which broke the HF Bucket download and subsequent xxd.cmake
step.

+ is safe because it is not special in cmd.exe (unlike | which is a
pipe operator), not special in CMake's -D argument parser, and not
a valid Windows filename character. CMakeLists.txt joins assets
with + and webui-download.cmake splits them back via regex.

* fix: Validate HF_WEBUI_VERSION environment variable with regex

Add input validation for the HF_WEBUI_VERSION env var to prevent
CMake list separator or path-traversal issues in stamp filenames
and download URLs. Rejects non-conforming characters early.

* fix: Remove 'latest' fallback for HF_WEBUI_VERSION

When needs.determine-tag.outputs.tag_name is empty, let CMake's
default resolution handle it (empty -> git-based version lookup)
instead of falling back to 'latest'. This ensures the sentinel
stamp file is consistent with CMake's resolution logic.

* fix: Demote checksum verification failure to warning instead of hard gate

* fix: End line character

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-14 17:57:20 +02:00
Aleksander Grygier 253ba110bc webui: Move static build output from repo code to HF Bucket (#22937)
* ci: add workflow to publish webui to Hugging Face bucket

* ci: add webui release job to release workflow

* ci: test webui release job

* chore: Return to default minification strategy for build output files

* ci: extract webui build into separate workflow and job

* chore: Ignore webui static output + clean up references

* chore: Delete legacy webui static output

* chore: Ignore webui build static output

* fix: Workflow

* fix: Versioning naming

* chore: Update package name

* test: Test CI fix

* refactor: Naming

* server: implement webui build strategy with HF Bucket support

* chore: Remove test workflow

* chore: Use WebUI build workflow call in other workflows

* server: HF Buckets fallback for WebUI build

* refactor: App name variable

* refactor: Naming

* fix: Retrieve loading.html

* fix: workflow syntax

* fix: Rewrite malformed release.yml

* fix: Req param

* test: Re-add missing Playwright installation for CI tests

* refactor: Logic & security improvements

* refactor: Retrieve publishing jobs and DRY the workflows

* fix: Test workflow syntax

* fix: Upstream Release Tag for test workflow

* chore: Remove test workflow

* ci: Run WebUI jobs on `ubuntu-24.04-arm`

* refactor: Post-CR cleanup

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* refactor: CI cleanup

* refactor: Cleanup

* test: Test workflow

* refactor: use LLAMA_BUILD_NUMBER instead of LLAMA_BUILD_TAG for HF Bucket webui downloads

* server: add fallback mechanism for HF Bucket webui downloads from latest directory

* fix: Incorrect argument order in file(SHA256) calls for checksum verification

* refactor: Use cmake script for handling the HF Bucket download on build time

* feat: support local npm build for WebUI assets

* refactor: add `HF_ENABLED` flag to control WebUI build/download provisioning

* refactor: Cleanup

* chore: Remove test workflow

* fix: remove s390x from release workflow

* fix: add webui-build dependency to ubuntu-22-rocm and windows-hip

* Revert "fix: remove s390x from release workflow"

This reverts commit debcfffa9bc1e3112eae41f2d29741b682e4eb19.

* fix: Release workflow file

* fix: Proper release tag used for HF Bucket upload

* fix: Remove duplicate steps in release workflow

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-14 13:21:41 +02:00
Trivikram Reddy 856c3adac1 hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993)
* hexagon: add hvx_vec_repl helpers and use those for splat-from-vtcm usecase

* hmx-mm: optimize per-group scale handling

* hmx-fa: optimize slope load from vtcm

* hmx-fa: use aligned access where possible in hmx-utils

* hexagon: add hvx_vec_repl_2x_f16 helper and consolidate repl helpers

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
2026-05-12 17:28:02 -07:00
Alessandro de Oliveira Faria (A.K.A.CABELO) 838374375c vendor : update cpp-httplib to 0.44.0 (#22919) 2026-05-11 08:47:13 +02:00
Alessandro de Oliveira Faria (A.K.A.CABELO) 5d5d2e15d2 vendor : update cpp-httplib to 0.43.4 (#22888) 2026-05-10 18:46:54 +02:00
Georgi Gerganov 0b047287fe sync : ggml 2026-05-10 17:00:11 +03:00
Georgi Gerganov 70a8309114 sync : ggml 2026-05-05 13:15:59 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO) a09a00e502 vendor : update cpp-httplib to 0.43.3 (#22686) 2026-05-05 09:04:57 +02:00
Piotr Wilkin (ilintar) a4701c98f7 common/autoparser: fixes for newline handling / forced tool calls (#22654)
* chat/autoparser: the fixes

* Move optspace() to chat-peg-parser, comment out server tests invalidated due to content now allowed with forced tool calls.

* Trim whitespace on apply instead
2026-05-04 13:18:11 +02:00
Georgi Gerganov 228e836344 sync : ggml 2026-05-02 08:55:29 +03:00
Georgi Gerganov 457e2288c9 sync : ggml 2026-05-02 07:22:35 +03:00
Sigbjørn Skjæret 6118c043b1 ci : bump ty to 0.0.33 (#22535)
* bump ty to 0.0.33

* update typings
2026-04-30 16:15:54 +03:00
Adrien Gallouët 5f0ab726f7 vendor : update cpp-httplib to 0.43.2 (#22548)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-04-30 15:04:39 +02:00
Georgi Gerganov 27aef3dd91 scripts : add wc2wt.sh - create worktree from current HEAD (#22513)
* scripts : add wc2wt.sh - create worktree from current HEAD

Add a script to create a git worktree on a new branch from the current
HEAD. Similar to pr2wt.sh but for local development branches instead of
PRs.

Usage:
  ./scripts/wc2wt.sh gg/new-feature
  ./scripts/wc2wt.sh gg/new-feature "bash -l"

Assisted-by: llama.cpp:local pi

* cont : no need to try to delete the branch
2026-04-30 09:20:26 +03:00
Max Krasnyansky 41a63be28e hexagon: make vmem and buffer-size configurable (#22487)
* hexagon: allow host to set max vmem size

We use a sane default but it's helpful to allow for an override if needed.

* hexagon: add support for measuring vmem space and move pinned mmaping management to host

* hexagon: update vmem checks to use uint64

* hexagon: bump op buffers to 16 (matches max mmaps)

* hexagon: bump default vmem to 3.2GB

* hexagon: add support for autodetecting vmem space and some logging cleanup in that area

* hexagon: fix whitespace warnings

* Update scripts/snapdragon/adb/run-cli.sh

Co-authored-by: Pascal <admin@serveurperso.com>

* hex-adb: fix run-completion script

---------

Co-authored-by: Pascal <admin@serveurperso.com>
2026-04-29 11:51:21 -07:00
Georgi Gerganov b1d5f5b449 sync : ggml 2026-04-29 16:43:47 +03:00
Georgi Gerganov f535774325 pr2wt : symlink .pi (#22386) 2026-04-26 19:49:26 +03:00
Piotr Wilkin (ilintar) 0adede866d parser: fix structured output bug (#22302)
* fix very stupid structured output bug

* Things just cannot be too easy.
2026-04-24 23:19:55 +02:00