llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-06-29 15:50:22 +00:00

Author	SHA1	Message	Date
Xuan-Son Nguyen	cb47092b00	server: bump timeout to 3600s (#23842) * server: bump timeout to 3600s * nits: change wording	2026-05-29 10:23:17 +02:00
Adrien Gallouët	98e480a32e	app : move licences to llama-app (#23824) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-05-29 07:46:11 +02:00
Xuan-Son Nguyen	751ebd17a5	mtmd-debug: add color and rainbow mode (#23829) * mtmd-debug: add color and rainbow mode * fix M_PI * max_dist	2026-05-28 20:59:14 +02:00
Xuan-Son Nguyen	c8914ad4f4	mtmd: fix gemma 4 projector pre_norm (#23822)	2026-05-28 20:58:55 +02:00
ValdikSS	2f6c815dc4	ui: fix audio and video modality detection (#23756) When model props are fetched asynchronously from the server, modelPropsVersion is incremented to trigger reactivity, but only the vision effect was listening to it.	2026-05-28 17:36:10 +02:00
Saba Fallah	0b56d283bf	mtmd: n_head_kv defaults to n_head (#23782) removed AI-generated comment	2026-05-28 16:44:36 +02:00
Xuan-Son Nguyen	d6be3158e1	mtmd: fix gemma 4 audio rms norm eps (#23815) * mtmd: fix gemma 4 audio rms norm eps * Update tools/mtmd/clip.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-28 16:31:37 +02:00
Mikolaj Kucharski	7fb1e70b59	arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (#23167)	2026-05-28 16:25:40 +02:00
Funtowicz Morgan	0b246862b9	server: minor tweaks to use more cpp features (#23785) * misc(server): add default port to impl RAII * misc(server): register_gcp_compat() can be const * misc(server): use proper cpp const/auto methods * misc(server): do not reset a unique_ptr, use make_unique instead to be exception safe	2026-05-28 14:00:25 +02:00
Markus Tavenrath	d205df6812	server, ui : Add support for HTTP ETags in llama-server (#23701) * allow caching of ui elements in llama-server * use fnv_hash * Update tools/server/server-http.cpp etag has to be set always Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-05-28 12:21:24 +02:00
Adrien Gallouët	48e7eae41c	perplexity : fix format specifier in LOG_ERR (#23788) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-05-28 10:34:58 +03:00
Georgi Gerganov	6b4e4bd582	common : fix env names to all have LLAMA_ARG_ prefix (#23778)	2026-05-27 14:52:47 +03:00
Radoslav Gerganov	7085492c6f	server : fix the log message when using SSL (#23393) When llama-server is started with SSL key and cert, the log says that it listens on http instead of https. This patch fixes this.	2026-05-27 08:06:30 +03:00
Pascal	5a4126adc1	ui: fix stop/continue during an agentic loop (#23356)	2026-05-25 14:18:59 +02:00
Aman Gupta	6c4cbdc70b	server: MTP layer kv-cache should respect draft type ctk (#23646)	2026-05-25 16:46:23 +08:00
Saba Fallah	b96487645c	ui: media attachments before text (#23467) * ui: media attachments before text * fix prettier formatting	2026-05-25 08:50:41 +02:00
jacekpoplawski	e2ef8fe42c	server: fix checkpoints creation (#22929) * common : add common_chat_split_by_role * cont : fix spans to reach end of message * server: fix checkpoints creation - extract message_spans from chat templates - find the prompt token position before the latest user message - split prompt batching at that position - create a context checkpoint before the latest user input - avoid periodic mid-prompt checkpoints when that position is known - handle multimodal prompts when mapping text/template positions to server prompt tokens - add --checkpoint-min-step to control minimum spacing between checkpoints * cont : clean-up * Support autoparser detection for message barriers * server: fix message span delimiter and update docs --------- Co-authored-by: Alde Rojas <hello@alde.dev> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>	2026-05-25 08:56:18 +03:00
fairydreaming	6d57c26ef8	perplexity : fix even more integer overflows (#23623) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2026-05-25 08:12:39 +03:00
Aldehir Rojas	63248fc3e3	cmake : fix ui build (#23592) * cmake/ui : add -fPIC to llama-ui static lib * cmake : rename host compiled embed helper	2026-05-24 02:37:28 -05:00
Aman Gupta	83eebe9d08	server: add margin for draft model for `fit` (#23485)	2026-05-24 14:43:08 +08:00
Aldehir Rojas	b22ff4b7b4	cmake/ui : refactor the build (#23352)	2026-05-23 17:08:22 -04:00
Aditya Singh	c0c7e147e7	requirements : bump torch to 2.11.0 (#23503) * requirements: relax torch~=2.6.0 to torch>=2.6.0 for convert_hf_to_gguf The ~=2.6.0 operator resolves to >=2.6.0, <2.7.0, which fails on PyPI for platform/CPython combinations where 2.6.x is not present. The accompanying comment already says 'PyTorch 2.6.0 or later', so the looser >=2.6.0 matches the documented intent and unblocks pip install -r requirements/requirements-convert_hf_to_gguf.txt. Fixes #23408 * requirements: bump torch floor to 2.11.0 per maintainer * requirements: pin torch to ==2.11.0 per project policy * requirements: pin mtmd torch and torchvision to 2.11.0/0.26.0 per project policy * requirements: suppress check_requirements pin warning on mtmd The check_requirements script flags '==' on lines in files matched by //requirements.txt. Append the documented suppression comment to the pinned torch and torchvision lines (and to the s390x platform marker lines) so the check passes while keeping the pins required by project policy. * ty: silence Tensor/Module union check on model[0].auto_model With torch 2.11.0 stubs, nn.Sequential.__getitem__ now returns Tensor | Module rather than Module, so model[0].auto_model fails ty on the SentenceTransformer code path. The runtime behavior is unchanged because SentenceTransformer always wraps a Module at index 0. Adding a targeted unresolved-attribute ignore keeps the type-check green without altering behavior. A follow-up issue tracks typing the variable explicitly.	2026-05-23 18:24:39 +02:00
Aldehir Rojas	1acee6bf89	server: only parse empty msg if continuing an assistant msg (#23506)	2026-05-22 11:58:15 -04:00
fairydreaming	ef570f6308	perplexity : fix integer overflow (#23496) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2026-05-22 15:50:44 +03:00
Georgi Gerganov	bbce619adb	cmake : add install() for impl libraries + fix apple builds (#23511) * pi : update * ci : fix ios build * ci : fix android * ci : fix apple builds * cmake : add install() for impl libraries Add install(TARGETS <target> LIBRARY) for all -impl libraries that were changed from STATIC to shared (controlled by BUILD_SHARED_LIBS) in commit `bb28c1fe2`. Without this, cmake --install fails to copy the shared libraries, causing runtime errors like: llama-server: error while loading shared libraries: libllama-server-impl.so Ref: https://github.com/ggml-org/llama.cpp/issues/23494#issuecomment-4512912515 Assisted-by: llama.cpp:local pi * ci : fix xcframework build	2026-05-22 11:46:26 +03:00
Georgi Gerganov	bb28c1fe24	cmake : remove STATIC from impl libraries, enable LLAMA_BUILD_APP by default (#23462) * cmake : remove STATIC from impl libraries, allow BUILD_SHARED_LIBS control Remove explicit STATIC from all -impl libraries (server, cli, completion, bench, batched-bench, fit-params, quantize, perplexity) so BUILD_SHARED_LIBS controls shared vs static linkage. Add WINDOWS_EXPORT_ALL_SYMBOLS ON for proper DLL export on Windows. Assisted-by: llama.cpp:local pi * cmake : enable LLAMA_BUILD_APP by default Assisted-by: llama.cpp:local pi * ci : disable app in build-cmake-pkg.yml	2026-05-21 21:13:59 +03:00
ScrewTSW	b65bb4baae	server: expose prompt token counts in /slots endpoint (#23454) Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor prompt evaluation progress during processing.	2026-05-21 13:29:13 +02:00
Aman Gupta	52fb93a2bd	server : free draft/MTP resources on sleep to fix VRAM leak (#23461) The destroy() function in server_context_impl only cleaned up the main model and context (via llama_init.reset()) but did not free the speculative decoder (spec), draft context (ctx_dft), or draft model (model_dft). For MTP (Multi-Token Prediction) models, ctx_dft holds GPU-allocated resources (KV cache, compute buffers) that are not freed when entering the sleeping state. On each sleep/resume cycle, new resources are allocated without the old ones being freed, leading to a VRAM leak that eventually crashes the server with out-of-memory errors. Fix by explicitly resetting spec, ctx_dft, and model_dft in destroy() before resetting llama_init, ensuring proper cleanup order to avoid use-after-free. ref: https://github.com/ggml-org/llama.cpp/issues/23395 Assisted-by: llama.cpp:local pi	2026-05-21 16:11:11 +08:00
Pascal	c9021714e8	server: re-inject subcommand when router spawns children under unified binary (#23442)	2026-05-21 10:09:19 +02:00
Adrien Gallouët	1d7ab2b947	app : add batched-bench, fit-params, quantize & perplexity (#23459) * app : add batched-bench, fit-params, quantize & perplexity Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add missing main.cpp Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add EOL Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-05-21 10:29:44 +03:00
Aleksander Grygier	5e932a1c8d	ui: Improve Git Hooks for UI development (#23403) * refactor: Improve Git Hooks for UI development * fix: Address review comments * fix: Use absolute git path for `/hooks` Co-authored-by: Pascal <admin@serveurperso.com> --------- Co-authored-by: Pascal <admin@serveurperso.com>	2026-05-21 08:27:50 +02:00
wendadawen	6a257d4463	mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329) - HunyuanOCR shares the same HF arch and vision layout as HunyuanVL but was split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference. - Collapse OCR into the HUNYUANVL projector + HUNYUAN_VL text arch	2026-05-21 00:35:37 +02:00
stduhpf	3a479c9132	ui: Add max image size option (#22849) * webui: Add max image size option * remove magic numbers * support all image formats * use const * Move regex to match b64 images to constants * use SETTINGS_KEYS to get max image resolution setting * Do not touch the image if already under the size threshold	2026-05-21 00:00:09 +02:00
Saba Fallah	a8681a0ed2	mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor (#23345) * mtmd : deepseek-ocr fixes, improvements and refactoring - image processing changes to achieve full parity with Pillow (reference impl) - SAM mask casting only when flash-attn is on - SAM refactor (build_sam() extracted so deepseek-ocr-2 can reuse it) - llama-chat changes to fix server/WebUI issue (new media_markers_first()) - adapted test-chat-template and added test cases for deepseek-ocr - changed regression test for deepseek-ocr to use CER+chrF scores for ground-truth comparison; removed embedding-model - ty.toml ignore unresolved-import for tools/mtmd/tests/** * image-text reordering fix removed * refactor bool add_padding + pad_rounding enum into a single pad_style enum	2026-05-20 17:37:10 +02:00
Aleksander Grygier	6ce96713de	feat: Add WAV MIME type variants and improve audio format detection (#23396)	2026-05-20 16:55:24 +02:00
Adrien Gallouët	29f1482221	app : introduce the llama unified executable (#23296) * app : introduce the llama unified executable Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Use serve for server Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Hide completion and bench, add help command Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Remove STATIC Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Use -impl targets instead of -lib Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Revert "Remove STATIC" This reverts commit `cc44caccb9`. --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-05-20 13:22:22 +02:00
Aleksander Grygier	e6b4acfe86	refactor: Move text attachments up before the message content in chat completions payload (#23406)	2026-05-20 13:04:01 +02:00
Xuan-Son Nguyen	e2b129e1bf	mtmd: fit_params now take into account mmproj (#21489) * mtmd: fit_params now take into account mmproj * rename alloc_compute_meta to reserve_compute_meta * rm unused functions * add ggml_backend_dev_t support * add debug log	2026-05-20 11:27:44 +02:00
Aleksander Grygier	5028447384	ui: Refactor `isMobile` as reactive value in `viewport` store (#23330) * refactor: `isMobile` as reactive value in `viewport` store * refactor: Use Svelte media query for the viewport store	2026-05-20 10:52:00 +02:00
Aleksander Grygier	585080d310	fix: Div wrapper no pointer events on hidden (#23390)	2026-05-20 09:46:31 +02:00
Aleksander Grygier	67ace021da	refactor: Chat Screen UI rendering (#23333)	2026-05-19 22:38:42 +02:00
Johannes Gäßler	7256fce047	common: fix --fit verbosity with --verbosity 4 (#23282)	2026-05-19 21:33:23 +02:00
Georgi Gerganov	d14ce3dab4	llama : MTP clean-up (#23269) * llama : disable equal splits for recurrent memory with partial rollback * spec : re-enable p-min with MTP drafts * spec : re-enable ngram spec in combination with RS rollback * spec : fix ngram-map-* params * spec : fix acceptance logic in combined ngram + draft configs * graph : fix reuse for combined `token` + `embd` batches * spec : log parameters for each speculative implementation - add LOG_INF in each constructor with implementation type and parameters - extract device string logic into common_speculative_get_devices_str() - move 'adding speculative implementation' log from init into constructors Assisted-by: llama.cpp:local pi * spec : extend --spec-default with ngram-map-k4v Assisted-by: llama.cpp:local pi * minor : fix n_embd log * args : update draft.n_max == 3 + regen docs * spec : relax ngram-mod rejection thold to 0.25 @ 5 low * logs : improve * docs : update speculative decoding CLI argument documentation - Add missing draft model CPU scheduling and tensor override parameters - Update --spec-type to include all available types (excluding draft-eagle3 WIP) - Fix default values to match implementation (n_max=3, n_min=0, p_min=0.0) - Remove deprecated options (spec-draft-ctx-size, spec-draft-replace) - Add environment variables for new parameters Assisted-by: llama.cpp:local pi * arg : step-back on adding k4v to the default spec config * cont : fix name	2026-05-19 15:32:58 +03:00
Aleksander Grygier	6db130445d	ui: Bump packages + address build warnings (#23300) * chore: Update vulnerable packages * chore: Formatting * refactor: Update Tailwind CSS imports * ci: Use `ubuntu-latest` for Unit/E2E UI tests * chore: Bump package * fix: Add missing tag * refactor: Enums files naming	2026-05-19 10:16:04 +02:00
Pascal	ccee426426	server-context: guarantee there is at least 1 token to decode (#23280)	2026-05-19 09:49:01 +03:00
Georgi Gerganov	3c81c8deea	server : print graphs reused in slot timings (#23279) Add graphs reused counter to the per-slot timing output, printed via llama_perf_context(). Assisted-by: llama.cpp:local pi Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>	2026-05-19 09:46:58 +03:00
Aleksander Grygier	3a9c1b854d	ui: Update KaTeX package and clean up logs from `sass` warnings (#23275) * ui: migrate katex imports to @use to resolve SCSS deprecation warnings * ci: Use `ubuntu-slim` for CI (UI) workflow	2026-05-18 16:26:01 +02:00
Aleksander Grygier	b9a2170fce	feat: add scroll-to-bottom button to chat + prevent forced scroll down (#23270)	2026-05-18 16:17:21 +02:00
Aleksander Grygier	1ff0fc1384	ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG (#23236) * refactor: Scope console logs to `DEV` + `VITE_DEBUG` env vars * refactor: skip MCP proxy probe when no server requires it * refactor: suppress expected disconnect errors during MCP client shutdown * refactor: Deduplicate requests * refactor: deduplicate model fetching across ROUTER and MODEL modes * refactor: Clean up models logic * chore: Add `.env.example` file * refactor: replace client-side CORS proxy probe with server status flag * refactor: Post-review fixes * test: add vitest client setup with API fetch mocks	2026-05-18 16:09:40 +02:00
Aleksander Grygier	a135ec0baa	ui: Centralize monospace font styles in app.css (#23272)	2026-05-18 15:10:14 +02:00


868 Commits