Commit Graph

  • cf45437d35 codeowners : use teams (#20526) Sigbjørn Skjæret 2026-03-15 14:26:10 +01:00
  • 9cd4ebcfb1 ci : split build.yml + server.yml (#20546) b8358 Georgi Gerganov 2026-03-15 15:11:17 +02:00
  • 89d0aec042 convert : support contiguous method on lora tensors (#20489) Sigbjørn Skjæret 2026-03-15 12:15:12 +01:00
  • 15324f905b cont : reduce paths gg/ci-build-cro Sigbjørn Skjæret 2026-03-15 13:03:20 +02:00
  • 3fec0e1b86 cont : split server.yml Georgi Gerganov 2026-03-15 12:02:24 +02:00
  • 45a8ab2e0e ci : split build.yml Georgi Gerganov 2026-03-14 16:30:07 +02:00
  • b9da4444df ggml : guard against sumq2 being 0 in IQ4_NL (#20460) b8356 Bartowski 2026-03-15 04:47:28 -04:00
  • 0776a6a039 remove event pending stage 0cc4m/vulkan-async-fixes2 Ruben Ortlam 2026-03-15 08:59:39 +01:00
  • 617db241aa cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (#19478) b8355 PikaPikachu 2026-03-15 15:33:39 +08:00
  • 1a3d8edbba vulkan: use graphics queue on AMD (#20551) b8354 Ruben Ortlam 2026-03-15 08:18:54 +01:00
  • 6b10a82c00 kv-cache : fix reading llama_kv_cell_ext during state read (#20273) b8353 sprayandwipe 2026-03-15 07:11:19 +00:00
  • d23355afc3 model : wire up Qwen3.5/Qwen3.5MoE tensors for NVFP4 support (#20506) b8352 Michael Wand 2026-03-14 14:44:42 -07:00
  • b30a5fdf37 metal : add FA specialization for HSK = 320, HSV = 256 (#20549) b8351 Georgi Gerganov 2026-03-14 23:15:47 +02:00
  • b4768955c4 ci : move self-hosted workflows to separate files (#20540) b8350 Georgi Gerganov 2026-03-14 23:15:35 +02:00
  • fc350fdf96 docker : force Python 3.13 in Vulkan container (#20530) Gerard Guillemas Martos 2026-03-14 21:37:09 +01:00
  • 3a6f059909 ci : try to optimize some jobs (#20521) b8348 Eve 2026-03-14 19:27:52 +00:00
  • 609ea50026 hexagon: Q4_0 and MXFP4 repack fixes (#20527) b8347 Max Krasnyansky 2026-03-14 11:09:08 -07:00
  • 9f774e45ee ci : reduce webgpu tests timeout to 900s (#20538) Georgi Gerganov 2026-03-14 17:08:26 +02:00
  • 94d0262277 mtmd: add llama-mtmd-debug binary (#20508) Xuan-Son Nguyen 2026-03-14 15:52:29 +01:00
  • a93c0ef0fa add op gated_delta_net (#20455) Neo Zhang 2026-03-14 22:01:57 +08:00
  • 710878a7dd webui: restore code preview iframe origin isolation (#20477) Chedrian07 2026-03-14 19:28:28 +09:00
  • 0685848bc6 scripts : remove get-wikitext-103.sh (#20543) Adrien Gallouët 2026-03-14 11:22:04 +01:00
  • 0024a69b70 scripts : update get-hellaswag.sh and get-winogrande.sh (#20542) Adrien Gallouët 2026-03-14 11:21:50 +01:00
  • d0b79aaa2f ggml : add native AVX512-FP16 support for F16 operations (#20529) b8340 Adrien Gallouët 2026-03-14 10:06:14 +01:00
  • 937a425600 fix log Ruben Ortlam 2026-03-14 08:45:28 +00:00
  • 5c177a1036 fix event reuse issue with multiple vectors Ruben Ortlam 2026-03-14 09:03:29 +01:00
  • f2c0dfb739 Use fp32 in cuBLAS V100 to avoid overflows, env variables to override cuBLAS compute type (#19959) b8339 Wallentri 2026-03-14 10:43:13 +03:00
  • ccd8d4a6ce NO MERGE: sync logging Ruben Ortlam 2026-03-13 14:20:47 +01:00
  • 9789c4ecdc ggml : add OpenVINO backend (#15307) b8338 Zijun Yu 2026-03-14 13:56:55 +08:00
  • 77e20cc107 vendor : update cpp-httplib to 0.37.2 (#20484) b8337 Adrien Gallouët 2026-03-14 06:51:02 +01:00
  • 4374b5ab9a use multiple events to avoid reset issues Ruben Ortlam 2026-03-14 06:39:20 +01:00
  • 5a32a9b8a5 Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT operations). (#20507) b8336 Rail Chabdarov 2026-03-14 06:19:44 +01:00
  • 3b439504ba opencl: fix l2_norm (#20480) lhez 2026-03-13 22:18:52 -07:00
  • 463b6a963c tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice (#19954) b8334 Adrien Gallouët 2026-03-13 21:25:57 +01:00
  • e30f1fdf74 graph : remove redundant GDN state transposes (#20443) b8333 Georgi Gerganov 2026-03-13 22:12:54 +02:00
  • 1430c35948 common/parser: gracefully handle undetected tool parser, print error message. (#20286) b8332 Piotr Wilkin (ilintar) 2026-03-13 20:56:10 +01:00
  • f17b3be63f llama : fix pooling assertion crash in chunked GDN detection path (#20468) b8331 ZeroV0LT 2026-03-13 19:53:42 +01:00
  • d7ba99c485 server: reset counter related to kill-switch on client error (#20513) b8330 SoftwareRenderer 2026-03-13 13:58:09 -04:00
  • eebf21c3e9 don't use initializer list for semaphore wait info Ruben Ortlam 2026-03-13 17:14:47 +01:00
  • fbaa95bc29 ggml-cpu: add RVV vec dot kernels for quantization types (#18859) b8329 rehan-10xengineer 2026-03-13 20:36:04 +05:00
  • 08a4ba6f03 use timeline semaphores instead of fences for event_synchronize Ruben Ortlam 2026-03-13 16:02:51 +01:00
  • b5e1212063 ggml : fix typo gmml (#20512) b8328 Adrien Gallouët 2026-03-13 14:36:13 +01:00
  • 2204bcedc8 also reset command buffers before reuse Ruben Ortlam 2026-03-13 13:53:23 +01:00
  • c0d100e0fc fix event command buffer reset validation error Ruben Ortlam 2026-03-13 13:49:31 +01:00
  • 58deae173e vulkan: fix event wait submission, event command buffer reset Ruben Ortlam 2026-03-13 13:40:53 +01:00
  • 8f974d2392 mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105) b8327 Daniel Bevenius 2026-03-13 12:30:02 +01:00
  • 2948e6049a general: CONTRIBUTING.md - guidelines for quantization schemes (#19762) Piotr Wilkin (ilintar) 2026-03-13 12:21:33 +01:00
  • 73c9eb8ced metal : fix l2 norm scale (#20493) b8325 Georgi Gerganov 2026-03-13 11:43:20 +02:00
  • 5ec6569eb5 unify scalar+vector and fix reduce function 0cc4m/vulkan-slang-flash-attention Ruben Ortlam 2026-03-13 09:23:03 +01:00
  • e880cb2e0d Revert "move kv shmem staging to function" Ruben Ortlam 2026-03-13 08:18:33 +01:00
  • 0349025db8 move kv shmem staging to function Ruben Ortlam 2026-03-09 15:02:25 +01:00
  • 2c623bfaea generic reductions Ruben Ortlam 2026-03-06 08:09:56 +01:00
  • e1b40fa53a fix slang issues Ruben Ortlam 2026-03-05 11:35:45 +01:00
  • a4ac1d903a vulkan: port Flash Attention shader to Slang Ruben Ortlam 2026-03-04 12:29:09 +01:00
  • 983df142a9 convert : fix/suppress pyright errors (#20442) Daniel Bevenius 2026-03-13 06:00:52 +01:00
  • 57819b8d4b llama : disable graph reuse with pipeline parallelism (#20463) b8323 Georgi Gerganov 2026-03-12 21:04:13 +02:00
  • 95ae9982d3 Merge branch 'master' into compilade/imatrix-neutral-prior compilade/imatrix-neutral-prior Francis Couture-Harpin 2026-03-12 13:20:00 -04:00
  • f45b59a5c3 Revert "quantize : assume the neutral prior is equal imatrix weights" Francis Couture-Harpin 2026-03-12 13:11:53 -04:00
  • 7ded1269ab unify matmul_id shader selection 0cc4m/vulkan-mm-pipeline-selection Ruben Ortlam 2026-03-12 14:55:12 +01:00
  • 664dfc7730 vulkan: unify matmul shader selection Ruben Ortlam 2026-03-12 14:47:18 +01:00
  • 557fe2d913 vendor : update cpp-httplib to 0.37.1 (#20390) b8322 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-03-12 09:57:06 -03:00
  • 0e810413bb tests : use reasoning instead of reasoning_budget in server tests (#20432) Piotr Wilkin (ilintar) 2026-03-12 13:41:01 +01:00
  • 128142fe7d test-backend-ops: allow loading tests from file and parsing model operators into file (#19896) b8320 Ruben Ortlam 2026-03-12 13:26:00 +01:00
  • 6de1bc631d common : update completion executables list [no ci] (#19934) Daniel Bevenius 2026-03-12 12:12:01 +01:00
  • 0a10c34dc1 grammar: Fix grammar root symbol check (#19761) b8318 Asbjørn Olling 2026-03-12 12:04:56 +01:00
  • deee23863b vulkan: add GATED_DELTA_NET op support (#20334) b8317 ProgenyAlpha 2026-03-12 06:32:04 -04:00
  • c3e3f9e533 convert : better mtp check and fix return [no ci] (#20419) Sigbjørn Skjæret 2026-03-12 10:04:20 +01:00
  • 40c550d4f6 vulkan: fix SSM_CONV PP scaling with large ubatch sizes (#20379) b8315 ProgenyAlpha 2026-03-12 05:03:18 -04:00
  • de190154c8 New conversations now auto-select the first loaded model (#20403) Pascal 2026-03-12 09:07:05 +01:00
  • 05039967da ggml-virtgpu: Fix some build commands (#20341) Masashi Yoshimura 2026-03-12 16:47:45 +09:00
  • e4cff0956b metal : avoid divisions in bin kernel (#20426) b8312 Georgi Gerganov 2026-03-12 09:42:40 +02:00
  • 4cc6eb158c ci: Setup self-hosted CI for Intel Linux Vulkan backend (#20154) Masato Nakasaka 2026-03-11 22:43:22 -07:00
  • 246ffc4b05 vulkan: fix l2_norm epsilon handling (#20350) b8310 Jeff Bolz 2026-03-12 00:39:41 -05:00
  • aa429cf507 vulkan: fix OOB check in flash_attn_mask_opt (#20296) b8309 Jeff Bolz 2026-03-12 00:35:49 -05:00
  • 5866e3bbc8 vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (#20059) b8308 Masato Nakasaka 2026-03-11 22:30:16 -07:00
  • 0516e04bf9 opencl: use larger workgroup size for get_rows (#20316) lhez 2026-03-11 22:03:27 -07:00
  • 3d9ab225e7 opencl: add cumsum op (#18981) shaofeiqi 2026-03-11 22:03:07 -07:00
  • d63aa398de hip: compile debug builds with -O2 on hip to avoid a compiler bug (#20392) b8305 uvos 2026-03-12 03:37:10 +01:00
  • a8304b4d27 common/parser: add GigaChatV3/3.1 models support (#19931) b8304 Mishusha 2026-03-12 03:22:25 +03:00
  • fdb17643d3 model : add support for Phi4ForCausalLMV (#20168) b8303 DAN™ 2026-03-11 19:25:54 -04:00
  • 1eea6a2968 graph : add optional scale parameter to build_lora_mm [no ci] (#20427) Richard Davison 2026-03-12 00:22:49 +01:00
  • 4a748b8f15 common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (#20416) b8301 ddh0 2026-03-11 18:13:28 -05:00
  • f2ab047f27 ggml-webgpu: Add supports for GGML_OP_REPEAT (#20230) b8300 Masashi Yoshimura 2026-03-12 06:40:36 +09:00
  • 20fbf04cd6 metal : fix capture_started flag gg/metal-bin-mod Georgi Gerganov 2026-03-11 23:15:16 +02:00
  • a71b566137 metal : avoid modulus in bin kernel when not broadcasting Georgi Gerganov 2026-03-11 19:33:03 +02:00
  • d28961d81e llama : enable chunked fused GDN path (#20340) b8299 Georgi Gerganov 2026-03-11 22:46:40 +02:00
  • f90bd1dd84 llama : whitespace cleanup (#20422) b8298 Sigbjørn Skjæret 2026-03-11 21:18:29 +01:00
  • 5eae9cb1d9 ggml : add NVFP4 quantization type support (#19769) b8297 Richard Davison 2026-03-11 21:02:54 +01:00
  • 3ca19b0e9f benches : add nemotron super (#20420) Georgi Gerganov 2026-03-11 21:39:40 +02:00
  • eaf1d7930c llama : add support for Nemotron 3 Super (#20411) b8295 Daniel Bevenius 2026-03-11 19:27:53 +01:00
  • 76ea1c1c46 metal : fix capture_compute counter logic (#20410) Georgi Gerganov 2026-03-11 18:38:22 +02:00
  • 8c8544f9fb metal : fix capture_compute counter logic gg/metal-capture-env-cont Georgi Gerganov 2026-03-11 18:35:45 +02:00
  • bd1ec818e9 compare-llama-bench: check remotes as well (#20406) Aman Gupta 2026-03-12 00:14:42 +08:00
  • b1f856af72 kleidiai: revert unrelated requirements change Martin Klacer 2026-03-11 15:24:43 +00:00
  • b541241104 metal : fix q5_k mul_mv register spill (#20399) b8292 Georgi Gerganov 2026-03-11 16:25:27 +02:00
  • c363256839 metal : add env var to trigger graph capture (#20398) b8291 Georgi Gerganov 2026-03-11 16:25:10 +02:00
  • ecac98ee53 [SYCL] Update SYCL.md for binary package for Windows (#20401) Neo Zhang 2026-03-11 22:21:22 +08:00
  • 182acfe5c5 ci: disable coopmat on ubuntu-24-cmake-vulkan job (#20294) Ruben Ortlam 2026-03-11 14:12:29 +01:00
  • db8ea663c7 kleidiai: add cpu feature detection to CI run script Martin Klacer 2026-03-06 16:45:01 +00:00
  • b5fe4559ae common/parser: use nlohmann::ordered_json to preserve parameter order (#20385) Aldehir Rojas 2026-03-11 04:26:51 -05:00