Commit Graph

  • c34b92235b fix sycl links in release notes (#24527) Muhammad Salem 2026-06-13 03:37:55 +03:00
  • e37abd6b5f mtmd: add batching API (#24384) Xuan-Son Nguyen 2026-06-13 00:10:29 +02:00
  • f58bad4137 ci : unbreak release harder (#24545) b9616 Sigbjørn Skjæret 2026-06-12 23:49:36 +02:00
  • cd5044661c ci : unbreak release (#24544) Sigbjørn Skjæret 2026-06-12 22:29:49 +02:00
  • 3518061868 fit : wrap llama_device_memory_data gg/fit-wrap-dmd Georgi Gerganov 2026-06-12 18:12:24 +03:00
  • ebc10770ac server : fix reasoning budget WebUI precedence over model.ini (#24517) Georgi Gerganov 2026-06-12 17:59:56 +03:00
  • 3e7bd4f39a vulkan: add pipeline barriers for memcpy read operations (#23770) Ruben Ortlam 2026-06-12 16:43:50 +02:00
  • 9c1d7406b6 Revert "submit only twice for graph reuse" Ruben Ortlam 2026-06-12 16:43:11 +02:00
  • e218a39018 submit only twice for graph reuse Ruben Ortlam 2026-06-10 14:23:35 +02:00
  • ccceabc031 vulkan: capture and replay command buffers where possible Ruben Ortlam 2026-05-07 11:18:23 +02:00
  • f7ca93d12c ui: PWA support (#23871) Aleksander Grygier 2026-06-12 15:53:26 +02:00
  • 02182fc5b9 fit : avoid including llama-ext.h in fit.h (#24506) b9611 Georgi Gerganov 2026-06-12 15:57:05 +03:00
  • f532be8fac sync : ggml b9610 Georgi Gerganov 2026-06-12 15:55:01 +03:00
  • e08c226a2c ggml : bump version to 0.15.1 (ggml/1541) Georgi Gerganov 2026-06-12 15:32:00 +03:00
  • 70b54e140c vendor : update cpp-httplib to 0.47.0 (#24395) b9608 Adrien Gallouët 2026-06-12 11:34:44 +02:00
  • 6471e3c090 UI/jpeg exif orientation (#24196) Pascal 2026-06-12 10:20:27 +02:00
  • 88a39274ec spec: add EAGLE3 speculative decoding support (#18039) b9606 Ruixiang Wang 2026-06-12 09:21:06 +02:00
  • 85f99dca8b ggml: support concat for scalar types at cuda backend (#24011) b9605 ZihaoMu 2026-06-12 14:32:44 +08:00
  • 099ea76fb4 [SYCL] Fix CI build & release for SYCL backend (#24387) b9604 Neo Zhang 2026-06-12 14:30:24 +08:00
  • ba1df050f3 opencl: add q5_0/q5_1 gemm and gemv kernels for Adreno (#24319) b9603 shaofeiqi 2026-06-11 21:43:09 -07:00
  • 1593d5684d docker : support specifying the GCC version for CUDA (#24447) wencan 2026-06-12 05:12:09 +08:00
  • 4c6595503f vulkan: ifdef eMesaHoneykrisp (build fix) (#24479) b9601 Jeff Bolz 2026-06-11 13:22:17 -05:00
  • 263cc04a54 sync : ggml Georgi Gerganov 2026-06-11 19:33:33 +03:00
  • 17e59d6209 ggml : bump version to 0.15.0 (ggml/1539) Georgi Gerganov 2026-06-11 19:32:38 +03:00
  • fdc3db9b65 vulkan: add fast path for contiguous buffer transfers (#23973) Winston Ma 2026-06-11 21:46:25 +08:00
  • 1af154a76f vulkan: use medium matmul tile on Asahi Linux (#24306) Kevin Liu 2026-06-11 09:43:04 -04:00
  • 18ef86ecec server: skip unused log lines on router mode (#24463) b9596 Xuan-Son Nguyen 2026-06-11 11:36:35 +02:00
  • 1bfbdb134e vocab : adopt leading TemplateProcessing special token as BOS (#24428) o7si 2026-06-11 15:37:23 +08:00
  • 68f30663cf vocab : refactor normalizer flags into options struct, add strip_accents (#24371) b9594 o7si 2026-06-11 15:36:50 +08:00
  • db94854ff5 server : skip checkpoints beyond pos_next (#24411) Aldehir Rojas 2026-06-11 02:18:12 -05:00
  • ac4cddeb0d vendor : update LibreSSL to 4.3.2 (#24397) b9592 Adrien Gallouët 2026-06-10 22:28:03 +02:00
  • e95dae18d6 Remove padding and multiple D2D copies for MTP (#24086) b9591 Gaurav Garg 2026-06-10 23:21:16 +05:30
  • d2462f8f7a chat: fix LFM2/LFM2.5 ignoring json_schema (#24377) b9590 Tarek Dakhran 2026-06-10 14:41:41 +02:00
  • fb83cc9a07 CUDA: Fix ssm_scan_f32 data-races (#24360) b9589 Oliver Simons 2026-06-10 14:27:08 +02:00
  • 039e20a2db ci : bump komac version (#24396) Sigbjørn Skjæret 2026-06-10 09:45:20 +02:00
  • 41f049a840 Revert "speculative : fix "ngram-map-k4v" name in logging (#24253)" revert-24253-ngram-map-k-name-fix Piotr Wilkin (ilintar) 2026-06-10 09:31:42 +02:00
  • d2e22ed975 speculative : fix "ngram-map-k4v" name in logging (#24253) b9587 ddh0 2026-06-10 02:31:35 -05:00
  • 76da2450a4 webui: implement pinned conversations support (#21387) b9586 Rémy Mathieu 2026-06-09 21:33:22 +02:00
  • d73cd07674 graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (#24357) b9585 Aarnav Pai 2026-06-09 23:16:27 +05:30
  • e25a32e98c ci : fix windows release (#24369) b9584 Sigbjørn Skjæret 2026-06-09 18:42:23 +02:00
  • 483609509d ui: add opt-in run_javascript frontend tool (#24244) Pascal 2026-06-09 18:02:31 +02:00
  • 49f3542190 mtmd: build_vit batching (#24352) Saba Fallah 2026-06-09 16:32:08 +02:00
  • 6c2cbc4e33 vulkan: disable FA mask_opt on GCN to improve performance 0cc4m/vulkan-fa-mask-opt-gcn Ruben Ortlam 2026-06-09 15:40:07 +02:00
  • b6cf9cd8fe mtmd, llama: shared backend sched xsn/mtmd_shared_sched Xuan Son Nguyen 2026-06-09 15:34:17 +02:00
  • d6d0ce8215 vulkan: reduce iq1 shared memory usage for mul_mm (#24287) b9581 Jeff Bolz 2026-06-09 06:27:38 -05:00
  • b4e3dc613b vulkan: add v_dot2_f32_f16 support in matrix-matrix multiplication and Flash Attention (#24123) b9580 Ruben Ortlam 2026-06-09 13:27:04 +02:00
  • ae735b1314 ui: Fix excessive style recalculation on hover (#24243) Nick Towle 2026-06-09 03:52:20 -07:00
  • 9682e351b8 mtmd: refactor video subproc handling (#24316) b9578 Xuan-Son Nguyen 2026-06-09 12:15:12 +02:00
  • 1e912561dd server: log prompts to directory (#22031) b9577 jacekpoplawski 2026-06-09 12:09:07 +02:00
  • efbacf8d21 ui: fix mobile chat form overflow and bust stale bundle cache (#24158) Pascal 2026-06-09 11:12:58 +02:00
  • 26021699bc ggml : add GGML_OP_COL2IM_1D (#24206) b9575 Pascal 2026-06-09 11:01:37 +02:00
  • 961e9a3e46 server : do not clear slots without unified KV cache (#24190) b9574 fiesh 2026-06-09 09:45:16 +02:00
  • f0152efe40 models : fix plamo2 attention_key/value_length regression (#24317) b9573 Sigbjørn Skjæret 2026-06-09 09:26:44 +02:00
  • fd3271e0b4 ggml-cpu : fix rms_norm_back wrong output under in-place aliasing (#24305) b9572 Yash Raj Pandey 2026-06-09 03:24:27 -04:00
  • e3471b3e73 Remove case for GGML_TYPE_Q4_K in mvvq.cu (#23528) b9571 ravel7524 2026-06-09 07:46:23 +02:00
  • 3ac3c20c96 ggml-webgpu: Add clang-format job (#24308) b9570 Reese Levine 2026-06-08 20:54:24 -07:00
  • 1e1aca09da ggml-webgpu: Improve prefill speeds for k-quants + refactor matmul for Q4/Q5/Q8 and k-quants (#24225) Masashi Yoshimura 2026-06-09 07:19:56 +09:00
  • 7d2b45b4f7 mtp: support for gemma-4 E2B and E4B assistants (#24282) b9568 Max Krasnyansky 2026-06-08 13:48:52 -07:00
  • 9eb4e9dbb7 nits xsn/video_args Xuan Son Nguyen 2026-06-08 22:13:07 +02:00
  • c21dcd8bda gen docs Xuan Son Nguyen 2026-06-08 22:12:24 +02:00
  • e948fee3fd args: add --video-* CLI arguments Xuan Son Nguyen 2026-06-08 22:10:13 +02:00
  • 42a0afd594 server : do not parse when flushing http headers (#24281) b9567 Aldehir Rojas 2026-06-08 13:32:41 -05:00
  • a66d50588b graph: guard iswa kq_mask on its own buffer (#24294) b9566 Pascal 2026-06-08 19:20:28 +02:00
  • 1705d434f6 [ggml-webgpu] Handle buffer overlap / buffer aliasing for concat operator (#24000) b9565 Nikhil Jain 2026-06-08 08:07:31 -07:00
  • 3b3da01dc2 [ggml-webgpu] Implement 2D workgroups for scale, binary, and unary ops (#24044) b9564 Nikhil Jain 2026-06-08 08:07:15 -07:00
  • 3ebe862b5d docker: install ffmpeg in the released image (#24302) b9563 Xuan-Son Nguyen 2026-06-08 16:59:57 +02:00
  • de396e8790 nits (2) Xuan Son Nguyen 2026-06-08 14:05:24 +02:00
  • 2afe34a58c nits Xuan Son Nguyen 2026-06-08 14:02:29 +02:00
  • 93e126aa08 wire up input_video, accept raw base64 Xuan Son Nguyen 2026-06-08 13:59:14 +02:00
  • 7705270eec Merge branch 'master' into xsn/server_input_file_schema Xuan Son Nguyen 2026-06-08 13:43:32 +02:00
  • 8f83d6c271 mtmd : add video input support (#24269) b9562 Xuan-Son Nguyen 2026-06-08 13:40:12 +02:00
  • c2b1518fd4 sync : ggml b9561 Georgi Gerganov 2026-06-08 12:56:07 +03:00
  • 6a1de6fbf1 ggml : bump version to 0.14.0 (ggml/1533) Georgi Gerganov 2026-06-08 12:51:59 +03:00
  • 1458c8e581 server: refactor/generalize input file schema Xuan Son Nguyen 2026-06-08 13:07:26 +02:00
  • 715b86a366 cli: fix spinner not show during prompt processing (#24283) b9559 Xuan-Son Nguyen 2026-06-08 11:11:45 +02:00
  • c74759a244 vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991) b9558 Jeff Bolz 2026-06-08 03:40:37 -05:00
  • 0f7fada56b cuda: reset cuda context after reading memory size (#23935) b9557 Ruben Ortlam 2026-06-08 10:22:44 +02:00
  • 19bba67c1f HIP: add gfx1152 and gfx1153 to RDNA3.5 (#24129) b9556 Harkirat Gill 2026-06-08 02:33:23 -04:00
  • daf6bc9f2d metal : fix im2col 1D case (audio models) (#24220) b9555 Xuan-Son Nguyen 2026-06-08 08:03:18 +02:00
  • d403f00ec3 [SYCL] Update compute runtime version to 26.x in docker (#24070) b9554 Neo Zhang 2026-06-08 10:35:18 +08:00
  • 9e3b928fd8 common : relax sampler name matching (#23744) b9553 ddh0 2026-06-07 15:48:11 -05:00
  • 8a963fc10e convert : fix conversion for Mistral-Medium-3.5-128B (#24268) David Friehs 2026-06-07 21:41:39 +02:00
  • 379ac6673b kv-cache : avoid kv cells copies (#24277) b9551 Georgi Gerganov 2026-06-07 21:42:54 +03:00
  • f0156d1401 kv-cache: follow the source cache size when sharing cells (#24267) b9550 Pascal 2026-06-07 17:33:00 +02:00
  • 04eb4c446d llama : add Gemma4 MTP (#23398) b9549 Aman Gupta 2026-06-07 20:50:54 +08:00
  • 8a091c47ab spec : fix vocab compatibility check (#24256) b9548 Sigbjørn Skjæret 2026-06-07 13:43:52 +02:00
  • 465b1f0e75 arg: Skip mmproj download when user supplied mmproj (#24239) b9547 konradmb 2026-06-07 11:18:44 +02:00
  • f71af352a5 convert : fix Gemma4 with no audio encoder (#24242) Sigbjørn Skjæret 2026-06-07 08:43:05 +02:00
  • 3f7c79d7b5 docker : bump cuda13 to 13.3.0 (#24228) Sigbjørn Skjæret 2026-06-07 08:31:58 +02:00
  • 98d5e8ba8a common/chat : fix LFM2/LFM2.5 reasoning round-trip and <think> leak (#24234) b9544 Tarek Dakhran 2026-06-06 22:39:21 +02:00
  • 22634e0eee Add tensor name to JSON output cross-profiler Piotr Wilkin 2026-06-06 22:33:01 +02:00
  • 2bfe4ff9ca tentative Metal support Piotr Wilkin 2026-05-19 11:52:22 +02:00
  • 28ef941775 Add missing unrolls Piotr Wilkin 2026-05-16 15:47:06 +02:00
  • 5ef996bd6a Revert accidental change. Piotr Wilkin 2026-05-13 17:22:19 +02:00
  • 1e47576c36 Fix braces Piotr Wilkin 2026-05-13 11:09:53 +02:00
  • 1b9b3e6489 Fix FATTN profiling Piotr Wilkin 2026-05-12 23:58:28 +02:00
  • 56f349fdd7 Converge implementation with export-graph-ops Piotr Wilkin 2026-04-07 22:01:00 +02:00
  • 041605fdc9 Add missing op parameters to the profiler; add support for test-backend-ops to run performance tests with exactly the tensor shapes from the run Piotr Wilkin 2026-04-03 17:41:57 +02:00
  • 3f00bcd871 docs, pass copy details Piotr Wilkin 2026-03-29 23:35:38 +02:00
  • 61bb65d9c9 fix mul_mat_id stats, add throughput stat, add envvar trigger, add concurrent mode fix Piotr Wilkin 2026-03-29 22:52:33 +02:00