llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-28 15:20:20 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	80452d65b9	server : consolidate slot selection into get_available_slot (#24755 ) Absorb get_slot_by_id logic into get_available_slot so slot selection is handled by a single function call. When a specific slot id is requested, the LCP similarity check still runs to enable proper prompt cache updates. Assisted-by: pi:llama.cpp/Qwen3.6-27B	2026-06-19 09:22:34 +03:00
Xuan-Son Nguyen	db52540f73	mtmd: add batching support for internvl (#24775 )	2026-06-19 01:16:16 +02:00
Reguna	40f3aafc45	server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774 ) * server: add "X-Accel-Buffering": "no" header to streaming endpoints This header tells Nginx (as a reverse proxy) to NOT buffer responses. (only affects streaming endpoints) Without it, Nginx will break streaming with certain applications (notably the Pi coding harness).	2026-06-18 22:01:24 +02:00
Xuan-Son Nguyen	a6b3260a42	mtmd: add batching for mtmd-cli, add video tests (#24778 )	2026-06-18 21:55:04 +02:00
Xuan-Son Nguyen	060ce1bf72	mtmd: refactor llava-uhd overview image handling (always use ov_img_first) (#24769 ) * add dedicated "overview" for mtmd_image_preproc_out * corrections * correct (again) * nits * nits (2)	2026-06-18 18:53:49 +02:00
Kangjia Gao	7b6c5a2aed	docs: fix export-lora --lora-scaled syntax [no release] (#24703 ) Assisted-by: Codex	2026-06-18 16:46:17 +02:00
Xuan-Son Nguyen	fe7c8b2414	server: (router) fix stopping_thread potentially hang (#24728 ) * server: (router) fix stopping_thread potentially hang * fix windows build	2026-06-18 15:41:09 +02:00
Xuan-Son Nguyen	e1efd0991d	server: add "schema" and validation (#24150 ) * wip * working * correct some limits * add field name to error message	2026-06-18 15:40:58 +02:00
Aarni Koskela	08023072ef	server : add last-5-seconds generation speed display (#24291 ) * server : add last-5-seconds generation speed display * cont : clean-up --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-06-18 14:02:20 +02:00
Amos Wong	20832179e2	ui: provide touch accessible model selection UI (#24604 ) * ui : add model selector storybook stories Covers list, favorites, single-model, all status states (loading/loaded/sleeping/failed/idle), and selection states. * ui : improve model selector mobile UX with hover media queries Use @media (hover:none) to show action buttons directly on touch devices and color-code them by model status (amber=sleeping, green=loaded, muted=idle). Status dots hidden on touch. Desktop hover behavior unchanged.	2026-06-18 13:14:20 +02:00
Anuj Attri	10786217e9	server : return HTTP 400 on invalid grammar (#24144 ) (#24154 ) Throw on grammar parse failure so the server returns HTTP 400 instead of silently dropping the constraint. Add a regression test for the invalid-grammar response. Fixes #24144	2026-06-18 12:49:14 +02:00
Xuan-Son Nguyen	552258c535	server: (router) rework -hf preset repo (#24739 ) * server: temporary remove HF remote preset * rework remove preset.ini support * rm unused get_remote_preset_whitelist() * print warning * add docs * rm stray file	2026-06-18 12:45:23 +02:00
Xuan-Son Nguyen	968c43891a	server: fix router args not being forwarded to child instances (#24760 )	2026-06-18 12:15:46 +02:00
Xuan-Son Nguyen	24bba7b98e	mtmd: refactor preprocessor, add mtmd_image_preproc_out (#24736 ) * add mtmd_image_preproc_out * add dev docs * remove unused clip API * rm unused clip_image_f32_batch::grid * change preprocess() call signature	2026-06-18 12:04:39 +02:00
Aleksander Grygier	0b73fc79fe	ui: Update code formatting command in pre-commit hook (#24685 )	2026-06-18 08:33:50 +02:00
Xuan-Son Nguyen	f3e1828164	mtmd: llava_uhd should no longer use batch dim (#24732 )	2026-06-17 22:40:50 +02:00
Xuan-Son Nguyen	4b4d13ae72	server: (router) add model management API (#23976 ) * wip * server: (router) add SSE realtime updates API * nits * wip * add download API * add download api * update docs * add delete endpoint * fix std::terminate * fix crash * fix 2 * add tests * nits	2026-06-17 18:04:58 +02:00
Julien Chaumond	8086439a4c	webui: export conversations as jsonl (#24688 ) * webui: export conversations as jsonl each session is one jsonl file, a session header line followed by one line per message exporting multiple conversations bundles them into a zip, one jsonl file each * webui: import jsonl and zip conversation exports parse the new jsonl session format and zip archives on import keep supporting the legacy json format	2026-06-17 13:25:47 +02:00
Harapan Rachman	bae36efa30	UI : fix SSE transport detection and routing through CORS proxy. Assi… (#24500 ) * UI : fix SSE transport detection and routing through CORS proxy. Assisted-by: Antigravity * ui : replace magic strings with constants in MCP transport handling	2026-06-17 08:26:30 +02:00
Pascal	c1304d7b28	ui: add source toggle to mermaid and svg blocks (#24652 ) * ui: add source toggle to mermaid and svg blocks Add a toggle button next to copy and preview that switches a rendered mermaid or svg block to its source code and back. The button is shared by both block types and the rendered view stays the default. The source view reuses the code block scroll container and the highlighted code element captured at transform time, so it matches the app code blocks without highlighting again. Make tall diagrams scroll like text code blocks: safe centering keeps the diagram centered when it fits and falls back to start alignment when it overflows, so the top stays reachable instead of clipping above. Keep the block header opaque and layered above the scrolled diagram, and ignore header clicks in the zoom handler, so a button click never falls through to the zoom dialog. * ui: transparent diagram block header, address review from @allozaur	2026-06-16 14:14:22 +02:00
Ruixiang Wang	635b65ad7a	spec: add spec metrics mean acceptance length and acceptance rate per position (#24536 ) * spec: add spec metrics mean acceptance length and acceptance per pos * fix as suggestion Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix as suggestion Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix as suggestion Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix as suggestions --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-06-16 10:23:09 +03:00
Adrien Gallouët	e3a74b2990	bench : add --offline (#24511 ) * bench : add --offline Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add default Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-06-16 08:26:05 +02:00
Xuan-Son Nguyen	e36a602ba3	mtmd: fix miscounting n_tokens (#24656 )	2026-06-15 18:07:14 +02:00
Georgi Gerganov	e3cab403bf	mtmd : add post-decode callback (#24645 ) Assisted-by: pi:llama.cpp/Qwen3.6-27B	2026-06-15 16:02:05 +03:00
Pascal	2a6c391a5e	UI/svg block rendering (#24080 ) * ui: add svg block visualizer based on allozaur's mermaid PR * ui: rationalise diagram block styling and pre transforms shared by mermaid and svg * ui: live render streaming svg blocks * ui: also render svg authored in xml code fences * ui: refactor svg block rendering, address review from allozaur - Move the svg size ceiling and DOMPurify config out of sanitize-svg.ts into /constants. - Rename the svg-diagram class to svg-block so the name no longer implies diagrams only. - Replace the svg, xml and svg tag magic strings in the markdown pipeline with shared constants. - Promote the data-svg-rendered marker and its sibling data attributes to constants. * ui: render svg blocks in a shadow root for animation and live zoom Mount each sanitized svg inside an open shadow root so author <style> and keyframe or smil animations run while staying scoped to the host element. Relax the sanitizer to forbid only foreignObject and script, which lets animation, href and external resource refs through for wider compatibility. Render the inline block and the zoom dialog from the same reactive source, so a streaming svg keeps drawing live inside the open zoom popup.	2026-06-15 08:11:36 +02:00
franitel	ef8268feee	fix(ui): render thinking/reasoning block content as markdown (#24611 ) * fix(ui): render thinking/reasoning block content as markdown * feat(ui): add toggle setting for thinking block markdown rendering	2026-06-14 22:56:56 +02:00
Nicolas Mowen	5f04dc7ac3	ui: Add HEIC/HEIF image support (#24137 ) * Add boilerplate for file types * Add heic-to and implement conversion * Load heic library from CDN * Use jpg instead of png for conversion * Move const to constants file	2026-06-14 20:42:16 +02:00
Pascal	fd5869fb62	UI/mobile keyboard and pwa popup fixes (#24610 ) * ui: make mobile layout keyboard-aware via interactive-widget and dvh shell anchor * ui: fix duplicate PWA refresh popup by scoping the storage check to non-PWA pages	2026-06-14 18:35:00 +02:00
Amos Wong	1fd6dfe9f3	ui : fix ui clipping in mobile due to incorrect height setup (#24605 )	2026-06-14 16:15:51 +02:00
Michael Wand	6e14286eda	cli : fix not copying preserved tokens (#24258 )	2026-06-14 11:52:15 +02:00
Aldehir Rojas	53bd47ea5b	ui : fix llama-ui-embed crash when no asset dir is given (#24597 )	2026-06-13 17:53:30 -05:00
Xuan-Son Nguyen	e8067a8b36	ui: build-time gzip compression (#24571 ) * ui: keep original file name and path * fix nocache * ui: build-time gzip compression	2026-06-13 16:57:27 +02:00
Xuan-Son Nguyen	597b6672e8	ui: keep original file name and path (#24568 ) * ui: keep original file name and path * fix nocache	2026-06-13 14:31:41 +02:00
Xuan-Son Nguyen	57fe1f07c3	server: clean up static assets handling (#24550 ) * server: clean up static assets handling * nits * simplify file name handling, use static file name everywhere * cmake/ui : bundle UI assets in an archive * ui : run prettier on post-build.js --------- Co-authored-by: Alde Rojas <hello@alde.dev>	2026-06-13 11:51:20 +02:00
Georgi Gerganov	d8a24ccee2	fit : wrap llama_device_memory_data (#24522 )	2026-06-13 08:09:52 +03:00
Xuan-Son Nguyen	e37abd6b5f	mtmd: add batching API (#24384 ) * mtmd: add batching API * wip * first working version (gemma4v) * add arg * nits * wire up support_batch() * fix 0.0 output embd * fix audio * nits * refactor a bit * nits * fix non-batching case * fix comment	2026-06-13 00:10:29 +02:00
Georgi Gerganov	ebc10770ac	server : fix reasoning budget WebUI precedence over model.ini (#24517 ) When reasoning-budget is set in model.ini, the per-request thinking_budget_tokens from the WebUI was ignored because the model.ini value took unconditional precedence. Swap the precedence so the WebUI per-request value is checked first, with the model.ini value serving as a fallback default. Assisted-by: pi:llama.cpp/Qwen3.6-27B	2026-06-12 17:59:56 +03:00
Aleksander Grygier	f7ca93d12c	ui: PWA support (#23871 ) * feat: Add basic PWA support and service worker for offline caching * feat: Vite PWA implementation WIP * feat: Improve PWA icons generation * feat: Add PWA workbox to server routes * feat: Include `version.json` in static assets * feat: Add HTTP cache headers for PWA static assets * feat: Update app name for `apple-mobile-web-app-title` * feat: Implement PWA versioning and automatic update detection * chore: Update `.gitignore` files * feat: Splash Screens * feat: Add dark mode favicon support * refactor: Cleanup * fix: Use dark logo for dark splash screens * refactor: Simplify favicons SVG code * fix: Adjust caching and polling for reliable service worker updates * fix: Add missing favicon entry * fix: Align PWA service worker configuration with SvelteKit build structure * fix: Replace hashed bundle paths with versioned static paths * test: Add PWA tests * ci: Add build output for unit tests * refactor: Cleanup * fix: Server build & release versioning * chore: Update package-lock.json * chore: Increase PWA cache size * chore: Update packages * feat: Update favicons * refactor: Post-merge fix * feat: support explicit build version for PWA cache busting * fix: CI * feat: Improve PWA Refresh Alert UI * feat: Add toggleable build version display * refactor: Cleanup * feat: Add version mismatch detection and manual app reload * refactor: replace dynamic imports with static * refactor: Cleanup * feat: Add safe space for `pwa-<size>.png` rendered icons * fix: use relative paths for PWA assets to support base path deployment * feat: add PWA mode detection via URL query parameter * feat: Use ?cache=true for SW-cached PWA assets * refactor: Build process cleanup * refactor: Decouple PWA versioning and remove ?cache=true workaround * chore: Update README logo * feat: Include PWA Assets generation in build script * refactor: `usePwa` hook for core layout * fix: Relativize base vite plugin * fix: remove unnecessary backslash escapes in test regexes * test: update static asset paths for API Key test * refactor: Move SvelteKit PWA Options config to constants * ui: fix update notification never appearing Keep the PWA hook object intact instead of destructuring needRefreshByStorage, which freezes the reactive getter. Also exclude loading.html from PWA precache to prevent 404 errors and broken SW installation.	2026-06-12 15:53:26 +02:00
Pascal	6471e3c090	UI/jpeg exif orientation (#24196 ) * ui: bake jpeg exif orientation into uploaded images stb_image in mtmd ignores exif metadata, so rotated smartphone photos reach the model with raw pixel orientation. The webui now reads the exif orientation tag at send time and feeds it into the existing capImageDataURLSize canvas pass: the browser applies the rotation when decoding, so capped images come out upright for free, and images under the cap threshold get a single plain redraw when orientation > 1. At most one re-encode ever happens per image. Upright jpegs with capping disabled pass through untouched, bit perfect. Adds jpeg-orientation.ts with a minimal exif parser working on a bounded base64 prefix (both endianness, returns 1 on any malformed input) and unit tests against handcrafted jpeg byte streams. * ui: move jpeg exif constants into lib/constants * ui: add browser test for jpeg orientation and capping Covers capImageDataURLSize end to end in chromium with real Pillow generated jpeg fixtures across exif orientations 1/3/5/6/8: upright quadrant colors checked pixel-wise, expected dimensions with and without capping, no orientation tag left in the output, and strict passthrough when nothing needs rewriting.	2026-06-12 10:20:27 +02:00
Xuan-Son Nguyen	18ef86ecec	server: skip unused log lines on router mode (#24463 )	2026-06-11 11:36:35 +02:00
Aldehir Rojas	db94854ff5	server : skip checkpoints beyond pos_next (#24411 ) * server : skip checkpoints beyond pos_next * cont : update comment + TODO + ref --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-06-11 10:18:12 +03:00
Rémy Mathieu	76da2450a4	webui: implement pinned conversations support (#21387 ) * webui: implement pinned conversations support * webui: linter/prettier pass * Fix the unused handleMobileSidebarItemClick from the component. * the search should find pinned conversations as well Co-authored-by: Pascal <admin@serveurperso.com> --------- Co-authored-by: Pascal <admin@serveurperso.com>	2026-06-09 21:33:22 +02:00
Aarnav Pai	d73cd07674	graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (#24357 ) * llama-graph : apply embedding scale when deepstack is not used * nits: remove non-existant hunyuan-vl from the tests * apply suggestion from @gabe-l-hart --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-06-09 19:46:27 +02:00
Pascal	483609509d	ui: add opt-in run_javascript frontend tool (#24244 ) * ui: add opt-in run_javascript frontend tool Expose a run_javascript tool to the model, executed entirely in the browser through the existing agentic loop. Code runs in a Web Worker inside a sandboxed iframe with an opaque origin, isolated from the WebUI and its API. Console output, errors and the return value are fed back as the tool result. The parent enforces a hard timeout by removing the iframe, which terminates the worker. Disabled by default, toggle in Settings > Developer. * ui: address review feedback from allozaur Use the JsonSchemaType enum for the tool definition parameter types instead of raw string literals, extending it with STRING and NUMBER. Move the worker shim and the iframe harness html into their own files so the service no longer carries inline source blobs. Replace the remaining magic strings with constants: SANDBOX_EMPTY_OUTPUT and SANDBOX_TRUNCATION_NOTICE, and reuse NEWLINE_SEPARATOR for joins. * ui: move sandbox worker shim to a raw imported file Replace the inline worker template string with a real sandbox-worker.js imported as raw text, and build the iframe harness from it in sandbox-harness.ts. The raw worker ships as a string, not a module, so it is excluded from eslint and the typecheck program.	2026-06-09 18:02:31 +02:00
Saba Fallah	49f3542190	mtmd: build_vit batching (#24352 )	2026-06-09 16:32:08 +02:00
Nick Towle	ae735b1314	ui: Fix excessive style recalculation on hover (#24243 )	2026-06-09 12:52:20 +02:00
Xuan-Son Nguyen	9682e351b8	mtmd: refactor video subproc handling (#24316 ) * mtmd: refactor video subproc handling * Update tools/mtmd/mtmd-helper.cpp Co-authored-by: Mikko Juola <mikjuo@gmail.com> --------- Co-authored-by: Mikko Juola <mikjuo@gmail.com>	2026-06-09 13:15:12 +03:00
jacekpoplawski	1e912561dd	server: log prompts to directory (#22031 ) * server: log prompts to directory Add `--log-prompts-dir` to write each prompt to a separate text file in the specified directory. * Apply suggestion from @ngxson --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-06-09 12:09:07 +02:00
Pascal	efbacf8d21	ui: fix mobile chat form overflow and bust stale bundle cache (#24158 )	2026-06-09 11:12:58 +02:00
fiesh	961e9a3e46	server : do not clear slots without unified KV cache (#24190 ) * Always export idle slots to RAM Without this, a slot's VRAM cache may not be written to RAM. If this slot happens to be busy then later on, this triggers needless preprocessing in another slot. * cont : clean-up --------- Co-authored-by: Christoph Weiss <weiss@wsoptics.de> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-06-09 10:45:16 +03:00
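
Note on ebc10770ac (reasoning budget precedence): the commit message describes checking the per-request thinking_budget_tokens first and using the model.ini reasoning-budget only as a fallback default. The Python sketch below illustrates that ordering only; it is not the server's code. The function name, the use of None for "not set", and the -1 fallback are all assumptions made for illustration.

```python
from typing import Optional


def resolve_reasoning_budget(
    request_budget: Optional[int],
    ini_budget: Optional[int],
    fallback: int = -1,
) -> int:
    """Pick the reasoning budget for one request.

    Hypothetical helper: the per-request value wins when present, the
    model.ini value is used only when the request does not set one, and
    `fallback` (an assumed sentinel) applies when neither is set.
    """
    # Per-request value is checked first, so the WebUI can override model.ini.
    if request_budget is not None:
        return request_budget
    # model.ini value serves as the default when the request is silent.
    if ini_budget is not None:
        return ini_budget
    return fallback
```

With this ordering, resolve_reasoning_budget(512, 2048) yields the request's 512, while resolve_reasoning_budget(None, 2048) falls back to the model.ini value of 2048. Comparing against None rather than truthiness keeps an explicit per-request 0 from being mistaken for "not set".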

968 Commits