llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-28 15:20:20 +00:00

Author	SHA1	Message	Date
Alexey Kopytko	c035ff4902	[SYCL]: Remove per-allocation Level Zero runtime checks (#23399 ) * [SYCL] Centralize Level Zero detection in ggml_sycl_init * use the same wording * get back the warning * [SYCL] Remove per-allocation getenv() for GGML_SYCL_ENABLE_LEVEL_ZERO * bring back the comment * move it up to make sure devices call the shots * move the env detection early * replace g_ggml_sycl_enable_level_zero with a direct call to .ext_oneapi_level_zero * update the comment * switch back to g_ggml_sycl_enable_level_zero with a sentinel * remove the check * Reduce the diff * reword, move lower * move things around * remove forward declaration in favor of a full replace * pre-cache results of zeDeviceGetProperties * put ggml_sycl_get_env back * replace get_sycl_env with ggml_sycl_get_env * add whitespace back * Apply suggestion from @sanmai b9646	2026-06-15 09:58:42 +03:00
Georgi Gerganov	272088b9f2	metal : add repeat bf16 (#24638 ) b9645	2026-06-15 09:57:16 +03:00
Piotr Wilkin (ilintar)	a6dff71270	chat: fix whitespace problems once and for all (#24624 ) * chat: fix whitespace problems once and for all * Purge trailing spaces from grammar generation * Revert "Purge trailing spaces from grammar generation" This reverts commit `b0827ecb7d`. b9644	2026-06-15 08:27:10 +02:00
Pascal	2a6c391a5e	UI/svg block rendering (#24080 ) * ui: add svg block visualizer based on allozaur's mermaid PR * ui: rationalise diagram block styling and pre transforms shared by mermaid and svg * ui: live render streaming svg blocks * ui: also render svg authored in xml code fences * ui: refactor svg block rendering, address review from allozaur - Move the svg size ceiling and DOMPurify config out of sanitize-svg.ts into /constants. - Rename the svg-diagram class to svg-block so the name no longer implies diagrams only. - Replace the svg, xml and svg tag magic strings in the markdown pipeline with shared constants. - Promote the data-svg-rendered marker and its sibling data attributes to constants. * ui: render svg blocks in a shadow root for animation and live zoom Mount each sanitized svg inside an open shadow root so author <style> and keyframe or smil animations run while staying scoped to the host element. Relax the sanitizer to forbid only foreignObject and script, which lets animation, href and external resource refs through for wider compatibility. Render the inline block and the zoom dialog from the same reactive source, so a streaming svg keeps drawing live inside the open zoom popup.	2026-06-15 08:11:36 +02:00
leonardHONG	3686e9d643	CUDA: only support F32/F16 for GGML_OP_REPEAT (#24533 ) b9642	2026-06-15 09:11:00 +03:00
Masashi Yoshimura	6e9007ae61	ggml-webgpu: improve i-quants mul_mat performance and speed up prefill (#24530 ) * Improve prefill speeds for i-quants * Fix #if defined() usage in preprocessor guards. b9641	2026-06-14 18:15:30 -07:00
Sigbjørn Skjæret	dd4623a74f	convert : fix lora base model arch retrieval (#24621 )	2026-06-15 00:55:26 +02:00
franitel	ef8268feee	fix(ui): render thinking/reasoning block content as markdown (#24611 ) * fix(ui): render thinking/reasoning block content as markdown * feat(ui): add toggle setting for thinking block markdown rendering	2026-06-14 22:56:56 +02:00
Nicolas Mowen	5f04dc7ac3	ui: Add HEIC/HEIF image support (#24137 ) * Add boilerplate for file types * Add heic-to and implement conversion * Load heic library from CDN * Use jpg instead of png for conversion * Move const to constants file	2026-06-14 20:42:16 +02:00
Piotr Wilkin (ilintar)	aedb2a5e9c	chat: add dedicated Cohere2MoE (North Code) parser (#24615 ) * chat: add dedicated Cohere2MoE (North Code) parser * Some renames to make @CISC happy :> b9637	2026-06-14 20:17:40 +02:00
Mohammad Athar	8edaca9034	docs : fix typos in CUDA-FEDORA.md and grammars/README.md (#24459 )	2026-06-15 01:33:38 +08:00
Alexander Batischev	20c5266f8a	docker: specify registry to simplify Podman builds (#24607 )	2026-06-15 01:27:20 +08:00
Pascal	fd5869fb62	UI/mobile keyboard and pwa popup fixes (#24610 ) * ui: make mobile layout keyboard-aware via interactive-widget and dvh shell anchor * ui: fix duplicate PWA refresh popup by scoping the storage check to non-PWA pages	2026-06-14 18:35:00 +02:00
Amos Wong	1fd6dfe9f3	ui : fix ui clipping in mobile due to incorrect height setup (#24605 )	2026-06-14 16:15:51 +02:00
Sigbjørn Skjæret	acd79d603c	jinja : add count/d/e filter aliases (#24606 ) b9632	2026-06-14 15:07:31 +02:00
Michael Wand	6e14286eda	cli : fix not copying preserved tokens (#24258 ) b9631	2026-06-14 11:52:15 +02:00
Bartowski	8ed274ef46	Add cohere2moe to llama-vocab for TINY_AYA (#24601 ) b9630	2026-06-14 09:04:46 +02:00
Sigbjørn Skjæret	46722116b9	ci : use CUDA label for cuda backend (#24594 )	2026-06-14 08:27:52 +02:00
Sigbjørn Skjæret	c2ba3e47a2	add sycl to check-release (#24583 ) b9628	2026-06-14 09:42:26 +08:00
Aldehir Rojas	53bd47ea5b	ui : fix llama-ui-embed crash when no asset dir is given (#24597 ) b9627	2026-06-13 17:53:30 -05:00
Michael Wand	4988f6e866	Add arch support for cohere2-MoE (#24260 ) * Add arch support for cohere2-MoE * Removed redundant gating_func checks * Changed ffn lookup to prefer prefix_dense_intermediate_size * Renamed arch to cohere2moe * Removed redundant lmhead check and chat template changes * Removed lm_head.weight check from modify tensors, load output tensor not required, fallback to token_embd.weight * Changed to (routed+shared)0.5 for shared expert combined avg fixed sliding_window_pattern issue and pattern * Fixed transformers crash 'first_k_dense_replace' error * Remove comment * Removed cohere2-moe as a tokenizer type and kept as tiny_aya. Renamed North-Mini-Code-1.0. * Fixed MTP fail, changed to use iSWA * Fixed remaining todos: cohere2moe renamed, changed swa parsing to use get_key_or_arr, removed extra get_arr use * Force metadata usage Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Remove Cohere2 checkpoint comment Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Remove MTP comment Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Regenerate cohere2moe tokenizer hash * Add cohere2moe to Llama Model Saver supported list * Check for zerobios tensors and add support for Command to use LayerNorm * Map expert_selection_fn to sigmoid in base.py instead of command.py * use bools for foundnorm/foundnormrms Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> b9626	2026-06-13 19:49:00 +02:00
Sigbjørn Skjæret	f05cf4676a	jinja : fix negative step slice with start/stop values (#24580 ) b9625	2026-06-13 18:28:40 +02:00
Xuan-Son Nguyen	e8067a8b36	ui: build-time gzip compression (#24571 ) * ui: keep original file name and path * fix nocache * ui: build-time gzip compression b9624	2026-06-13 16:57:27 +02:00
Sigbjørn Skjæret	341babcf73	jinja : fix split and replace with empty first arg (#24574 ) * fix split and replace with empty first arg * fix reserve size b9623	2026-06-13 16:56:59 +02:00
Jeff Bolz	1a7718b4c5	vulkan: support non-contig unary/glu ops (#24215 ) * vulkan: support non-contig unary/glu ops Change unary/glu ops to pass in all strides and use fastdiv for the index calculation. Put all unary ops in one file, similar to glu, to share the code. codex went ahead and added expm1 without me asking, but I had to make it do a real precision analysis rather than just making stuff up. unary.comp initially couldn't use generic_unary_head because there wasn't space for xielu's additional constants. Fixing this required packing the fastdiv 'L' values. * attempt to workaround compiler bug * resolve conflict from #23991 * use expm1 b9622	2026-06-13 08:44:15 -05:00
Xuan-Son Nguyen	597b6672e8	ui: keep original file name and path (#24568 ) * ui: keep original file name and path * fix nocache b9621	2026-06-13 14:31:41 +02:00
Xuan-Son Nguyen	57fe1f07c3	server: clean up static assets handling (#24550 ) * server: clean up static assets handling * nits * simplify file name handling, use static file name everywhere * cmake/ui : bundle UI assets in an archive * ui : run prettier on post-build.js --------- Co-authored-by: Alde Rojas <hello@alde.dev> b9620	2026-06-13 11:51:20 +02:00
Georgi Gerganov	d8a24ccee2	fit : wrap llama_device_memory_data (#24522 ) b9619	2026-06-13 08:09:52 +03:00
Muhammad Salem	c34b92235b	fix sycl links in release notes (#24527 ) * fix sycl links in release notes * remove extra line	2026-06-13 08:37:55 +08:00
Xuan-Son Nguyen	e37abd6b5f	mtmd: add batching API (#24384 ) * mtmd: add batching API * wip * first working version (gemma4v) * add arg * nits * wire up support_batch() * fix 0.0 output embd * fix audio * nits * refactor a bit * nits * fix non-batching case * fix comment	2026-06-13 00:10:29 +02:00
Sigbjørn Skjæret	f58bad4137	ci : unbreak release harder (#24545 ) * unbreak release harder * missed one * remove missing test for now b9616	2026-06-12 23:49:36 +02:00
Sigbjørn Skjæret	cd5044661c	ci : unbreak release (#24544 )	2026-06-12 23:29:49 +03:00
Georgi Gerganov	ebc10770ac	server : fix reasoning budget WebUI precedence over model.ini (#24517 ) When reasoning-budget is set in model.ini, the per-request thinking_budget_tokens from the WebUI was ignored because the model.ini value took unconditional precedence. Swap the precedence so the WebUI per-request value is checked first, with the model.ini value serving as a fallback default. Assisted-by: pi:llama.cpp/Qwen3.6-27B	2026-06-12 17:59:56 +03:00
Ruben Ortlam	3e7bd4f39a	vulkan: add pipeline barriers for memcpy read operations (#23770 ) * vulkan: add pipeline barriers for memcpy read/write operations * remove unnecessary host write pipeline barriers	2026-06-12 16:43:50 +02:00
Aleksander Grygier	f7ca93d12c	ui: PWA support (#23871 ) * feat: Add basic PWA support and service worker for offline caching * feat: Vite PWA implementation WIP * feat: Improve PWA icons generation * feat: Add PWA workbox to server routes * feat: Include `version.json` in static assets * feat: Add HTTP cache headers for PWA static assets * feat: Update app name for `apple-mobile-web-app-title` * feat: Implement PWA versioning and automatic update detection * chore: Update `.gitignore` files * feat: Splash Screens * feat: Add dark mode favicon support * refactor: Cleanup * fix: Use dark logo for dark splash screens * refactor: Simplify favicons SVG code * fix: Adjust caching and polling for reliable service worker updates * fix: Add missing favicon entry * fix: Align PWA service worker configuration with SvelteKit build structure * fix: Replace hashed bundle paths with versioned static paths * test: Add PWA tests * ci: Add build output for unit tests * refactor: Cleanup * fix: Server build & release versioning * chore: Update package-lock.json * chore: Increase PWA cache size * chore: Update packages * feat: Update favicons * refactor: Post-merge fix * feat: support explicit build version for PWA cache busting * fix: CI * feat: Improve PWA Refresh Alert UI * feat: Add toggleable build version display * refactor: Cleanup * feat: Add version mismatch detection and manual app reload * refactor: replace dynamic imports with static * refactor: Cleanup * feat: Add safe space for `pwa-<size>.png` rendered icons * fix: use relative paths for PWA assets to support base path deployment * feat: add PWA mode detection via URL query parameter * feat: Use ?cache=true for SW-cached PWA assets * refactor: Build process cleanup * refactor: Decouple PWA versioning and remove ?cache=true workaround * chore: Update README logo * feat: Include PWA Assets generation in build script * refactor: `usePwa` hook for core layout * fix: Relativize base vite plugin * fix: remove unnecessary backslash escapes in test regexes * test: update static asset paths for API Key test * refactor: Move SvelteKit PWA Options config to constants * ui: fix update notification never appearing Keep the PWA hook object intact instead of destructuring needRefreshByStorage, which freezes the reactive getter. Also exclude loading.html from PWA precache to prevent 404 errors and broken SW installation.	2026-06-12 15:53:26 +02:00
Georgi Gerganov	02182fc5b9	fit : avoid including llama-ext.h in fit.h (#24506 ) b9611	2026-06-12 15:57:05 +03:00
Georgi Gerganov	f532be8fac	sync : ggml b9610	2026-06-12 15:55:35 +03:00
Georgi Gerganov	e08c226a2c	ggml : bump version to 0.15.1 (ggml/1541)	2026-06-12 15:55:35 +03:00
Adrien Gallouët	70b54e140c	vendor : update cpp-httplib to 0.47.0 (#24395 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co> b9608	2026-06-12 11:34:44 +02:00
Pascal	6471e3c090	UI/jpeg exif orientation (#24196 ) * ui: bake jpeg exif orientation into uploaded images stb_image in mtmd ignores exif metadata, so rotated smartphone photos reach the model with raw pixel orientation. The webui now reads the exif orientation tag at send time and feeds it into the existing capImageDataURLSize canvas pass: the browser applies the rotation when decoding, so capped images come out upright for free, and images under the cap threshold get a single plain redraw when orientation > 1. At most one re-encode ever happens per image. Upright jpegs with capping disabled pass through untouched, bit perfect. Adds jpeg-orientation.ts with a minimal exif parser working on a bounded base64 prefix (both endianness, returns 1 on any malformed input) and unit tests against handcrafted jpeg byte streams. * ui: move jpeg exif constants into lib/constants * ui: add browser test for jpeg orientation and capping Covers capImageDataURLSize end to end in chromium with real Pillow generated jpeg fixtures across exif orientations 1/3/5/6/8: upright quadrant colors checked pixel-wise, expected dimensions with and without capping, no orientation tag left in the output, and strict passthrough when nothing needs rewriting.	2026-06-12 10:20:27 +02:00
Ruixiang Wang	88a39274ec	spec: add EAGLE3 speculative decoding support (#18039 ) * llama : enable layer input extraction * spec: support eagle3 * eagle3: fix params bug * eagle3: support Gemma4 eagle3 from RedHatAI * eagle3: set sync when get features from target Co-authored-by: tnhnyzc <115956684+tnhnyzc@users.noreply.github.com> * eagle3 : fix ubatch handling in embd_layer_inp extraction and encoder Co-authored-by: Doğaç Eldenk <dogacel@gmail.com> * eagle3: adapt to upstream changes * eagle3: fix rebase issues and adapt to upstream changes * eagle3:exclude the eagle3 arch from test-llama-archs * eagle3: fix editorconfig check failures * eagle3: fix multi-seq issue in d2t vocab mapping * cont : minor style / clean-up * spec : remove `common_speculative_setup_draft_model()` * llama : clean-up unused API * eagle3: set d2t vocab mapping in decode graph * cont : assert layer inputs are configured * hparams : use n_embd_inp instead of n_embd_target_features * eagle3: make output.weight optional and inherit from target model when needed * haparams : generic norm-before-residual param * llama-ext : consistent names * cont : fix * hparams : remove target_hidden_size * cparams : rename output_layer_inp -> embeddings_layer_inp * arch : reuse ATTN_NORM_2 instead of adding new hidden norm * llama : clean-up names * cont : add assert + comment * Update conversion/llama.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: tnhnyzc <115956684+tnhnyzc@users.noreply.github.com> Co-authored-by: Doğaç Eldenk <dogacel@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> b9606	2026-06-12 10:21:06 +03:00
ZihaoMu	85f99dca8b	ggml: support concat for scalar types at cuda backend (#24011 ) * cuda: support concat for scalar types * Update concat.cu * fix metal ci issue b9605	2026-06-12 09:32:44 +03:00
Neo Zhang	099ea76fb4	[SYCL] Fix CI build & release for SYCL backend (#24387 ) * restore SYCL build and release, remove github cache * modify for test only * verify the ccache is used * remove debug code change * rm duplicate action, update key in ccache * add action ccache-clear after building in both ubuntu and windows * set %NUMBER_OF_PROCESSORS% in windows build b9604	2026-06-12 09:30:24 +03:00
shaofeiqi	ba1df050f3	opencl: add q5_0/q5_1 gemm and gemv kernels for Adreno (#24319 ) * opencl: add q5_0 adreno support * opencl: add q5_1 adreno support * opencl: cosmetic fix --------- Co-authored-by: Li He <lih@qti.qualcomm.com> b9603	2026-06-11 21:43:09 -07:00
wencan	1593d5684d	docker : support specifying the GCC version for CUDA (#24447 )	2026-06-11 23:12:09 +02:00
Jeff Bolz	4c6595503f	vulkan: ifdef eMesaHoneykrisp (build fix) (#24479 ) Fixes build/CI after #24306. b9601	2026-06-11 13:22:17 -05:00
Georgi Gerganov	263cc04a54	sync : ggml	2026-06-11 19:34:19 +03:00
Georgi Gerganov	17e59d6209	ggml : bump version to 0.15.0 (ggml/1539)	2026-06-11 19:34:19 +03:00
Winston Ma	fdc3db9b65	vulkan: add fast path for contiguous buffer transfers (#23973 )	2026-06-11 15:46:25 +02:00
Kevin Liu	1af154a76f	vulkan: use medium matmul tile on Asahi Linux (#24306 ) * vulkan: use medium matmul tile on Asahi Linux * vulkan: switch Apple detection to Honeykrisp driver id	2026-06-11 15:43:04 +02:00


9646 Commits