Xuan-Son Nguyen
4b4d13ae72
server: (router) add model management API ( #23976 )
...
* wip
* server: (router) add SSE realtime updates API
* nits
* wip
* add download API
* add download api
* update docs
* add delete endpoint
* fix std::terminate
* fix crash
* fix 2
* add tests
* nits
2026-06-17 18:04:58 +02:00
Xuan-Son Nguyen
e8067a8b36
ui: build-time gzip compression ( #24571 )
...
* ui: keep original file name and path
* fix nocache
* ui: build-time gzip compression
2026-06-13 16:57:27 +02:00
Xuan-Son Nguyen
597b6672e8
ui: keep original file name and path ( #24568 )
...
* ui: keep original file name and path
* fix nocache
2026-06-13 14:31:41 +02:00
Xuan-Son Nguyen
57fe1f07c3
server: clean up static assets handling ( #24550 )
...
* server: clean up static assets handling
* nits
* simplify file name handling, use static file name everywhere
* cmake/ui : bundle UI assets in an archive
* ui : run prettier on post-build.js
---------
Co-authored-by: Alde Rojas <hello@alde.dev>
2026-06-13 11:51:20 +02:00
Xuan-Son Nguyen
e37abd6b5f
mtmd: add batching API ( #24384 )
...
* mtmd: add batching API
* wip
* first working version (gemma4v)
* add arg
* nits
* wire up support_batch()
* fix 0.0 output embd
* fix audio
* nits
* refactor a bit
* nits
* fix non-batching case
* fix comment
2026-06-13 00:10:29 +02:00
Aleksander Grygier
f7ca93d12c
ui: PWA support ( #23871 )
...
* feat: Add basic PWA support and service worker for offline caching
* feat: Vite PWA implementation WIP
* feat: Improve PWA icons generation
* feat: Add PWA workbox to server routes
* feat: Include `version.json` in static assets
* feat: Add HTTP cache headers for PWA static assets
* feat: Update app name for `apple-mobile-web-app-title`
* feat: Implement PWA versioning and automatic update detection
* chore: Update `.gitignore` files
* feat: Splash Screens
* feat: Add dark mode favicon support
* refactor: Cleanup
* fix: Use dark logo for dark splash screens
* refactor: Simplify favicons SVG code
* fix: Adjust caching and polling for reliable service worker updates
* fix: Add missing favicon entry
* fix: Align PWA service worker configuration with SvelteKit build structure
* fix: Replace hashed bundle paths with versioned static paths
* test: Add PWA tests
* ci: Add build output for unit tests
* refactor: Cleanup
* fix: Server build & release versioning
* chore: Update package-lock.json
* chore: Increase PWA cache size
* chore: Update packages
* feat: Update favicons
* refactor: Post-merge fix
* feat: support explicit build version for PWA cache busting
* fix: CI
* feat: Improve PWA Refresh Alert UI
* feat: Add toggleable build version display
* refactor: Cleanup
* feat: Add version mismatch detection and manual app reload
* refactor: replace dynamic imports with static
* refactor: Cleanup
* feat: Add safe space for `pwa-<size>.png` rendered icons
* fix: use relative paths for PWA assets to support base path deployment
* feat: add PWA mode detection via URL query parameter
* feat: Use ?cache=true for SW-cached PWA assets
* refactor: Build process cleanup
* refactor: Decouple PWA versioning and remove ?cache=true workaround
* chore: Update README logo
* feat: Include PWA Assets generation in build script
* refactor: `usePwa` hook for core layout
* fix: Relativize base vite plugin
* fix: remove unnecessary backslash escapes in test regexes
* test: update static asset paths for API Key test
* refactor: Move SvelteKit PWA Options config to constants
* ui: fix update notification never appearing
Keep the PWA hook object intact instead of destructuring needRefreshByStorage,
which freezes the reactive getter. Also exclude loading.html from PWA
precache to prevent 404 errors and broken SW installation.
2026-06-12 15:53:26 +02:00
Eric Zhang
6f165c1c64
server : handle If-None-Match weak ETags ( #23916 )
2026-05-31 16:21:08 -05:00
Funtowicz Morgan
0b246862b9
server: minor tweaks to use more cpp features ( #23785 )
...
* misc(server): add default port to impl RAII
* misc(server): register_gcp_compat() can be const
* misc(server): use proper cpp const/auto methods
* misc(server): do not reset a unique_ptr, use make_unique instead to be exception safe
2026-05-28 14:00:25 +02:00
Markus Tavenrath
d205df6812
server, ui : Add support for HTTP ETags in llama-server ( #23701 )
...
* allow caching of ui elements in llama-server
* use fnv_hash
* Update tools/server/server-http.cpp
etag has to be set always
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-05-28 12:21:24 +02:00
Radoslav Gerganov
7085492c6f
server : fix the log message when using SSL ( #23393 )
...
When llama-server is started with SSL key and cert, the log says that it
listens on http instead of https. This patch fixes this.
2026-05-27 08:06:30 +03:00
Aldehir Rojas
b22ff4b7b4
cmake/ui : refactor the build ( #23352 )
2026-05-23 17:08:22 -04:00
Aldehir Rojas
87589042ca
cmake : fix LLAMA_BUILD_UI logic ( #23190 )
2026-05-17 14:42:26 -04:00
Aleksander Grygier
59778f0196
ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming ( #23064 )
...
* webui: Move static build output from `tools/server/public` to `build/ui` directory
* refactor: Move to `tools/ui`
* refactor: rename CMake variables and preprocessor defines
- Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated)
- Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated)
- Backward compat: old vars auto-forward to new ones with DEPRECATION warning
- Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc.
- Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET
- Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines
- Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED
* refactor: rename CLI flags (--webui -> --ui) with backward compat
- Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases)
- Add --ui-config (old --webui-config kept as deprecated alias)
- Add --ui-config-file (old --webui-config-file kept as deprecated alias)
- Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated)
- Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY
- C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields
- Backward compat: old fields synced to new ones in g_params_to_internals
* refactor: update C++ server internals with backward compat
- Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta)
- Rename params.webui usage -> params.ui (both synced, old still works)
- JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys
- Server routes use params.ui_mcp_proxy || params.webui_mcp_proxy
- Preprocessor guards use #if defined(LLAMA_BUILD_UI) || defined(LLAMA_BUILD_WEBUI)
* refactor: rename CI/CD workflows, artifacts, and build script
- Rename webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build
- Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT
- Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks
- Update server.yml: job/artifact refs webui-build -> ui-build
- Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT
- Update server-self-hosted.yml: webui-build -> ui-build
- Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION
- Rename webui-download.cmake -> ui-download.cmake (internal refs updated)
- Update labeler.yml: server/webui -> server/ui path label
* docs: update CODEOWNERS and server README docs
- Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/
- Update server README.md: CLI tables show --ui flags with deprecated --webui aliases
- Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/
* fix: Small fixes for UI build
* fix: CMake.txt syntax
* chore: Formatting
* fix: `.editorconfig` for llama-ui
* chore: Formatting
* refactor: Use `APP_NAME` in Error route
* refactor: Cleanup
* refactor: Single migration service
* make llama-ui a linkable target
* fix: UI Build output
* fix: Missing change
* fix: separate llama-ui npm build output into build/tools/ui/dist subfolder + use cmake npm build instead of downloading ui-build.yml artifacts in CI
* refactor: UI workflows cleanup
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-05-16 02:02:40 +02:00
Aleksander Grygier
253ba110bc
webui: Move static build output from repo code to HF Bucket ( #22937 )
...
* ci: add workflow to publish webui to Hugging Face bucket
* ci: add webui release job to release workflow
* ci: test webui release job
* chore: Return to default minification strategy for build output files
* ci: extract webui build into separate workflow and job
* chore: Ignore webui static output + clean up references
* chore: Delete legacy webui static output
* chore: Ignore webui build static output
* fix: Workflow
* fix: Versioning naming
* chore: Update package name
* test: Test CI fix
* refactor: Naming
* server: implement webui build strategy with HF Bucket support
* chore: Remove test workflow
* chore: Use WebUI build workflow call in other workflows
* server: HF Buckets fallback for WebUI build
* refactor: App name variable
* refactor: Naming
* fix: Retrieve loading.html
* fix: workflow syntax
* fix: Rewrite malformed release.yml
* fix: Req param
* test: Re-add missing Playwright installation for CI tests
* refactor: Logic & security improvements
* refactor: Retrieve publishing jobs and DRY the workflows
* fix: Test workflow syntax
* fix: Upstream Release Tag for test workflow
* chore: Remove test workflow
* ci: Run WebUI jobs on `ubuntu-24.04-arm`
* refactor: Post-CR cleanup
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* refactor: CI cleanup
* refactor: Cleanup
* test: Test workflow
* refactor: use LLAMA_BUILD_NUMBER instead of LLAMA_BUILD_TAG for HF Bucket webui downloads
* server: add fallback mechanism for HF Bucket webui downloads from latest directory
* fix: Incorrect argument order in file(SHA256) calls for checksum verification
* refactor: Use cmake script for handling the HF Bucket download on build time
* feat: support local npm build for WebUI assets
* refactor: add `HF_ENABLED` flag to control WebUI build/download provisioning
* refactor: Cleanup
* chore: Remove test workflow
* fix: remove s390x from release workflow
* fix: add webui-build dependency to ubuntu-22-rocm and windows-hip
* Revert "fix: remove s390x from release workflow"
This reverts commit debcfffa9bc1e3112eae41f2d29741b682e4eb19.
* fix: Release workflow file
* fix: Proper release tag used for HF Bucket upload
* fix: Remove duplicate steps in release workflow
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-14 13:21:41 +02:00
Georgi Gerganov
67b2b7f2f2
logs : reduce ( #23021 )
...
* logs : reduce
* args : fix envs
* server : fix build
* common : print verbosity level at start
* server : clean-up logs
* server : print prompt processing timings + sampling params
* minor : whitespaces
2026-05-14 13:05:52 +03:00
Xuan-Son Nguyen
29debb3a6a
server: support Vertex AI compatible API ( #22545 )
...
* server: support Vertex AI compatible API
* a bit safer
* support other AIP_* env var
* various fixes
* if AIP_MODE is unset, do nothing
* fix test case
* fix windows build
2026-05-08 15:23:04 +02:00
tha80
983ca8992e
server: (router) Forward form-data to model server ( Fixes #22044 ) ( #22118 )
...
* This commit enables the router to forward form-data to model server.
Fixes #22044 (enables use of /v1/audio/transcriptions in router mode)
* * Applied the suggestion from Copilot's first comment: using the non-throwing json::parse overload.
* Addressed Copilot's third comment by extending the files representation to also include filename and content-type
* Addressed Copilot's fourth comment by making the RNG thread_local
* Changed variable body from std::string to std::ostringstream in build_multipart_body
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127099053
* Added sanitize_field lambda in build_multipart_body for key, filename and content_type
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127104647
* explicitly checking if value/item is string before calling value/item.get<std::string>()
as requested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127111279
* Added double quote to the sanitize lambda and throw on json parse failure
---------
Co-authored-by: Ralph Paßgang <ralph@trust-it.de>
2026-04-27 23:55:00 +02:00
Georgi Gerganov
cf8b0dbda9
server : remove /api endpoints ( #22165 )
...
* server : remove /api endpoints
* cont : remove /api/tags
2026-04-20 20:41:19 +03:00
Xuan-Son Nguyen
e489a5ca0e
server: support OAI /v1/audio/transcriptions API ( #21863 )
...
* server: support OAI /v1/audio/transcriptions API
* address autoreview comments
* correct default response_format value
2026-04-14 11:09:52 +02:00
lainon1
482d862bcb
server : handle unsuccessful sink.write in chunked stream provider ( #21478 )
...
Check the return value of sink.write() in the chunked content provider
and return false when the write fails, matching cpp-httplib's own
streaming contract. This prevents logging chunks as sent when the sink
rejected them and properly aborts the stream on connection failure.
2026-04-06 14:03:02 +02:00
Aleksander Grygier
12dbf1da95
server: Bypass API Key validation for WebUI static bundle assets ( #21269 )
...
* fix: Bypass API Key validation for static bundle assets
* refactor: All bypassed routes in `public_endpoints`
* test: Update static assets API Key test
2026-04-01 21:32:15 +02:00
Xuan-Son Nguyen
4a00bbfed6
server: (webui) no more gzip compression ( #21073 )
...
* webui: no more gzip
* try changing a small line
* Revert "try changing a small line"
This reverts commit 0d7a353159.
* fix lint
* fix test
* rebuild
* split into html/css/js
* lint
* chore: update webui build output
* chore: Update git hooks script
* server: update webui build output
* chore: Update pre-commit hook
* refactor: Cleanup
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-31 15:44:26 +02:00
Adrien Gallouët
b0f0dd3e51
vendor : update cpp-httplib to 0.40.0 ( #21100 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 08:59:44 +01:00
Adrien Gallouët
5c1a7b8355
server : add custom socket options to disable SO_REUSEPORT ( #21056 )
...
* server : add custom socket options to disable SO_REUSEPORT
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add --reuse-port
$ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2 --reuse-port
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
$ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Update tools/server/README.md (llama-gen-docs)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Fix windows
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 01:12:43 +01:00
Kusha Gharahi
ff934e29bc
server: Introduce LLAMA_BUILD_WEBUI build flag to allow disabling the embedded web ui ( #20158 )
...
* introduce LLAMA_SERVER_NO_WEBUI
* LLAMA_SERVER_NO_WEBUI → LLAMA_BUILD_WEBUI
* LLAMA_BUILD_WEBUI ON by default not based on LLAMA_STANDALONE
* Missed this
* Add useWebUi to package.nix
2026-03-27 17:25:55 +01:00
Xuan-Son Nguyen
31a5cf4c3f
server: use httplib dynamic threads ( #20817 )
...
* server: use httplib dynamic threads
* change to n_threads_http + 1024
2026-03-23 12:22:46 +01:00
Pascal
47eb12b953
server: fix query params lost when proxying requests in multi-model router mode ( #19854 )
...
* server: fix query params lost when proxying requests in multi-model router mode
* server: re-encode query params using httplib::encode_query_component in proxy
2026-02-24 21:46:06 +01:00
Xuan-Son Nguyen
4e595b250a
server: do not log certain endpoints (avoid log spam) ( #19028 )
2026-01-22 19:24:37 +01:00
Xuan-Son Nguyen
6ce863c803
server: prevent data race from HTTP threads ( #18263 )
...
* server: prevent data race from HTTP threads
* fix params
* fix default_generation_settings
* nits: make handle_completions_impl look less strange
* stricter const
* fix GGML_ASSERT(idx < states.size())
* move index to be managed by server_response_reader
* http: make sure req & res lifecycle are tied together
* fix compile
* fix buggy index handling
* fix data race for lora endpoint
* nits: fix shadow variable
* nits: revert redundant changes
* nits: correct naming for json_webui_settings
2025-12-22 14:23:34 +01:00
Fredrik Hultin
ddf9f94389
server : add Anthropic Messages API support ( #17570 )
...
* server : add Anthropic Messages API support
* remove -@pytest.mark.slow from tool calling/jinja tests
* server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py
* server : removed redundant n field logic in anthropic_params_from_json
* server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream()
* server : refactor Anthropic API to use OAI conversion
* make sure basic test always go first
* clean up
* clean up api key check, add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-28 12:57:04 +01:00
Xuan-Son Nguyen
b8372eecd9
server: split server.cpp code into server/common/task/queue ( #17362 )
...
* add server-task, server-common
* add server-queue
* rm redundant includes
* move enum stop_type to server-task
* server : headers cleanup
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-24 14:41:53 +01:00
o7si
97cb3fd5ae
fix: resolve undefined variable 'svr' compilation error ( #17348 )
2025-11-18 10:10:47 +02:00
Xuan-Son Nguyen
0de8878c96
server: split HTTP into its own interface ( #17216 )
...
* server: split HTTP into its own interface
* move server-http and httplib to its own file
* add the remaining endpoints
* fix exception/error handling
* renaming
* missing header
* fix missing windows header
* fix error responses from http layer
* fix slot save/restore handler
* fix case where only one stream chunk is returned
* add NOMINMAX
* do not call sink.write on empty data
* use safe_json_to_str for SSE
* clean up
* add some comments
* improve usage of next()
* bring back the "server is listening on" message
* more generic handler
* add req.headers
* move the chat template print to init()
* add req.path
* cont : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 22:05:44 +01:00