Absorb get_slot_by_id logic into get_available_slot so slot selection
is handled by a single function call. When a specific slot id is
requested, the LCP similarity check still runs to enable proper
prompt cache updates.
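
As a rough illustration of what an LCP (longest common prefix)
similarity check computes, here is a minimal TypeScript sketch. The
function names and the choice to normalize by the prompt length are
assumptions made for this example, not the server's actual code.

```typescript
// Hypothetical helper: number of leading tokens shared by a slot's cached
// tokens and the incoming prompt tokens.
function commonPrefixLength(cached: number[], prompt: number[]): number {
  const limit = Math.min(cached.length, prompt.length);
  let n = 0;
  while (n < limit && cached[n] === prompt[n]) {
    n++;
  }
  return n;
}

// Hypothetical similarity score in [0, 1]: the fraction of the prompt that is
// already covered by the slot's cache. Normalizing by the prompt length is an
// assumption of this sketch.
function lcpSimilarity(cached: number[], prompt: number[]): number {
  if (prompt.length === 0) {
    return 0;
  }
  return commonPrefixLength(cached, prompt) / prompt.length;
}
```

A slot whose cache shares a long prefix with the prompt scores close to
1, so reusing it avoids most of the prompt processing.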
Assisted-by: pi:llama.cpp/Qwen3.6-27B
* server: add "X-Accel-Buffering": "no" header to streaming endpoints
This header tells Nginx, when it acts as a reverse proxy, not to buffer responses. It is only added to streaming endpoints.
Without it, Nginx buffering breaks streaming for certain applications (notably the Pi coding harness).
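
For illustration only, a TypeScript sketch of the header set a
streaming (SSE) response might carry. The helper name is hypothetical
and the server itself is not written in TypeScript; only the
"X-Accel-Buffering": "no" entry comes from this commit, the other two
headers are common SSE defaults assumed for the example.

```typescript
// Hypothetical helper: headers for a Server-Sent Events response.
function streamingHeaders(): Record<string, string> {
  return {
    // Common SSE headers (assumed for this sketch).
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    // Ask an Nginx reverse proxy not to buffer this response.
    "X-Accel-Buffering": "no",
  };
}
```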
* ui : add model selector storybook stories
Covers list, favorites, single-model, all status states
(loading/loaded/sleeping/failed/idle), and selection states.
* ui : improve model selector mobile UX with hover media queries
Use @media (hover:none) to show action buttons directly on touch
devices and color-code them by model status (amber=sleeping,
green=loaded, muted=idle). Status dots hidden on touch. Desktop
hover behavior unchanged.
Throw on grammar parse failure so the server returns HTTP 400
instead of silently dropping the constraint.
Add a regression test for the invalid-grammar response.
Fixes #24144
* webui: export conversations as jsonl
each session is one jsonl file: a session header line followed by one line per message
exporting multiple conversations bundles them into a zip, one jsonl file each
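
a minimal sketch of the layout described above, one header line then one
line per message. the interfaces and field names are hypothetical, the
real export schema is not shown in this commit

```typescript
// Hypothetical record shapes, assumed only for this example.
interface SessionHeader {
  type: "session";
  id: string;
  name: string;
}

interface MessageLine {
  type: "message";
  role: string;
  content: string;
}

// Serialize one conversation: the session header first, then each message,
// one JSON document per line. JSON.stringify escapes embedded newlines, so
// every record stays on a single line.
function toJsonl(header: SessionHeader, messages: MessageLine[]): string {
  const lines = [header, ...messages].map((entry) => JSON.stringify(entry));
  return lines.join("\n") + "\n";
}
```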
* webui: import jsonl and zip conversation exports
parse the new jsonl session format and zip archives on import
keep supporting the legacy json format
* ui : fix SSE transport detection and routing through CORS proxy
Assisted-by: Antigravity
* ui : replace magic strings with constants in MCP transport handling
* ui: add source toggle to mermaid and svg blocks
Add a toggle button next to copy and preview that switches a rendered
mermaid or svg block to its source code and back. The button is shared by
both block types and the rendered view stays the default.
The source view reuses the code block scroll container and the highlighted
code element captured at transform time, so it matches the app code blocks
without highlighting again.
Make tall diagrams scroll like text code blocks: safe centering keeps the
diagram centered when it fits and falls back to start alignment when it
overflows, so the top stays reachable instead of clipping above.
Keep the block header opaque and layered above the scrolled diagram, and
ignore header clicks in the zoom handler, so a button click never falls
through to the zoom dialog.
* ui: transparent diagram block header, address review from @allozaur
* spec: add spec metrics mean acceptance length and acceptance per pos
* fix as suggestion
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix as suggestion
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix as suggestion
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix as suggestions
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* ui: add svg block visualizer based on allozaur's mermaid PR
* ui: rationalise diagram block styling and pre transforms shared by mermaid and svg
* ui: live render streaming svg blocks
* ui: also render svg authored in xml code fences
* ui: refactor svg block rendering, address review from allozaur
- Move the svg size ceiling and DOMPurify config out of sanitize-svg.ts into /constants.
- Rename the svg-diagram class to svg-block so the name no longer implies diagrams only.
- Replace the svg, xml and svg tag magic strings in the markdown pipeline with shared constants.
- Promote the data-svg-rendered marker and its sibling data attributes to constants.
* ui: render svg blocks in a shadow root for animation and live zoom
Mount each sanitized svg inside an open shadow root so author <style> and
keyframe or smil animations run while staying scoped to the host element.
Relax the sanitizer to forbid only foreignObject and script, which lets
animation, href and external resource refs through for wider compatibility.
Render the inline block and the zoom dialog from the same reactive source,
so a streaming svg keeps drawing live inside the open zoom popup.
* Add boilerplate for file types
* Add heic-to and implement conversion
* Load heic library from CDN
* Use jpg instead of png for conversion
* Move const to constants file
* ui: make mobile layout keyboard-aware via interactive-widget and dvh shell anchor
* ui: fix duplicate PWA refresh popup by scoping the storage check to non-PWA pages
* server: clean up static assets handling
* nits
* simplify file name handling, use static file name everywhere
* cmake/ui : bundle UI assets in an archive
* ui : run prettier on post-build.js
---------
Co-authored-by: Alde Rojas <hello@alde.dev>
When reasoning-budget is set in model.ini, the per-request
thinking_budget_tokens from the WebUI was ignored because the
model.ini value took unconditional precedence.
Swap the precedence so the WebUI per-request value is checked
first, with the model.ini value serving as a fallback default.
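
The new precedence can be sketched as follows. This is a TypeScript
illustration with hypothetical names, and it assumes an unset value is
represented as undefined; the actual server code may differ.

```typescript
// Hypothetical resolver: the per-request value wins when present, and the
// model.ini value is only a fallback default.
function resolveThinkingBudget(
  requestBudget: number | undefined,
  modelIniBudget: number | undefined,
): number | undefined {
  // Compare against undefined explicitly so a request value of 0 is honored.
  if (requestBudget !== undefined) {
    return requestBudget;
  }
  return modelIniBudget;
}
```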
Assisted-by: pi:llama.cpp/Qwen3.6-27B
* ui: bake jpeg exif orientation into uploaded images
stb_image in mtmd ignores exif metadata, so rotated smartphone photos
reach the model with raw pixel orientation. The webui now reads the
exif orientation tag at send time and feeds it into the existing
capImageDataURLSize canvas pass: the browser applies the rotation when
decoding, so capped images come out upright for free, and images under
the cap threshold get a single plain redraw when orientation > 1.
At most one re-encode ever happens per image. Upright jpegs with
capping disabled pass through untouched, bit perfect.
Adds jpeg-orientation.ts with a minimal exif parser working on a
bounded base64 prefix (both endianness, returns 1 on any malformed
input) and unit tests against handcrafted jpeg byte streams.
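
A minimal sketch of such a parser, for illustration. It is not the
contents of jpeg-orientation.ts: the function names are hypothetical,
and it assumes the bytes were already decoded into a Uint8Array rather
than read from a base64 prefix.

```typescript
const EXIF_ORIENTATION_TAG = 0x0112;

// Read the orientation from the TIFF block of an Exif APP1 payload.
// `start` points just after the segment length, `end` is one past the
// last readable byte. Returns 1 on anything unexpected.
function parseExifOrientation(view: DataView, start: number, end: number): number {
  // Need "Exif\0\0" (6 bytes) plus the 8-byte TIFF header.
  if (start + 14 > end) return 1;
  if (view.getUint32(start) !== 0x45786966 || view.getUint16(start + 4) !== 0) return 1;
  const tiff = start + 6;
  const byteOrder = view.getUint16(tiff);
  const little = byteOrder === 0x4949; // "II" = little endian
  if (!little && byteOrder !== 0x4d4d) return 1; // "MM" = big endian
  if (view.getUint16(tiff + 2, little) !== 0x002a) return 1;
  const ifd = tiff + view.getUint32(tiff + 4, little);
  if (ifd + 2 > end) return 1;
  const count = view.getUint16(ifd, little);
  for (let i = 0; i < count; i++) {
    const entry = ifd + 2 + i * 12; // each IFD entry is 12 bytes
    if (entry + 12 > end) return 1;
    if (view.getUint16(entry, little) === EXIF_ORIENTATION_TAG) {
      const value = view.getUint16(entry + 8, little);
      return value >= 1 && value <= 8 ? value : 1;
    }
  }
  return 1;
}

// Walk the JPEG segments until an Exif APP1 segment or the scan data.
function readJpegOrientation(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const len = view.byteLength;
  if (len < 4 || view.getUint16(0) !== 0xffd8) return 1; // missing SOI marker
  let pos = 2;
  while (pos + 4 <= len) {
    if (view.getUint8(pos) !== 0xff) return 1;
    const marker = view.getUint8(pos + 1);
    const segLen = view.getUint16(pos + 2); // includes the two length bytes
    if (segLen < 2 || marker === 0xda) return 1; // malformed, or start of scan
    if (marker === 0xe1) {
      const start = pos + 4;
      const end = Math.min(pos + 2 + segLen, len);
      // APP1 may also hold XMP, so only parse when the Exif header is present.
      if (start + 6 <= end && view.getUint32(start) === 0x45786966) {
        return parseExifOrientation(view, start, end);
      }
    }
    pos += 2 + segLen;
  }
  return 1;
}
```

Returning 1 (upright) on every failure path means a malformed or
truncated file is simply treated as needing no rotation.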
* ui: move jpeg exif constants into lib/constants
* ui: add browser test for jpeg orientation and capping
Covers capImageDataURLSize end to end in chromium with real Pillow
generated jpeg fixtures across exif orientations 1/3/5/6/8: upright
quadrant colors checked pixel-wise, expected dimensions with and
without capping, no orientation tag left in the output, and strict
passthrough when nothing needs rewriting.
* webui: implement pinned conversations support
* webui: linter/prettier pass
* Remove the unused handleMobileSidebarItemClick from the component.
* the search should find pinned conversations as well
Co-authored-by: Pascal <admin@serveurperso.com>
---------
Co-authored-by: Pascal <admin@serveurperso.com>
* llama-graph : apply embedding scale when deepstack is not used
* nits: remove non-existent hunyuan-vl from the tests
* apply suggestion from @gabe-l-hart
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* ui: add opt-in run_javascript frontend tool
Expose a run_javascript tool to the model, executed entirely in the
browser through the existing agentic loop. Code runs in a Web Worker
inside a sandboxed iframe with an opaque origin, isolated from the
WebUI and its API. Console output, errors and the return value are
fed back as the tool result. The parent enforces a hard timeout by
removing the iframe, which terminates the worker.
Disabled by default, toggle in Settings > Developer.
* ui: address review feedback from allozaur
Use the JsonSchemaType enum for the tool definition parameter types
instead of raw string literals, extending it with STRING and NUMBER.
Move the worker shim and the iframe harness html into their own files
so the service no longer carries inline source blobs.
Replace the remaining magic strings with constants: SANDBOX_EMPTY_OUTPUT
and SANDBOX_TRUNCATION_NOTICE, and reuse NEWLINE_SEPARATOR for joins.
* ui: move sandbox worker shim to a raw imported file
Replace the inline worker template string with a real sandbox-worker.js
imported as raw text, and build the iframe harness from it in
sandbox-harness.ts. The raw worker ships as a string, not a module, so
it is excluded from eslint and the typecheck program.
* server: log prompts to directory
Add `--log-prompts-dir` to write each prompt to a separate text file in
the specified directory.
* Apply suggestion from @ngxson
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Always export idle slots to RAM
Without this, a slot's VRAM cache may not be written to RAM. If that
slot later happens to be busy, this triggers needless preprocessing in
another slot.
* cont : clean-up
---------
Co-authored-by: Christoph Weiss <weiss@wsoptics.de>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>