Commit Graph

  • 0c58ba3365 rpc : reuse compute graph buffers (#21299) b8646 Radoslav Gerganov 2026-04-03 10:28:09 +03:00
  • 57ace0d612 chat : avoid including json in chat.h (#21306) b8645 Georgi Gerganov 2026-04-03 09:07:59 +03:00
  • 39b27f0da0 (revert) kv-cache : do not quantize SWA KV cache (#21332) b8644 Georgi Gerganov 2026-04-03 09:07:01 +03:00
  • f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345) b8643 Vishal Singh 2026-04-03 08:05:15 +05:30
  • 7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066) b8642 Slobodan Josic 2026-04-03 00:59:20 +02:00
  • 5208e2d5ba fix: gemma 4 template (#21326) b8641 Piotr Wilkin (ilintar) 2026-04-02 23:31:02 +02:00
  • 7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112) b8640 Bartowski 2026-04-02 16:53:58 -04:00
  • a1cfb64530 ggml-webgpu: add vectorized flash attention (#20709) b8639 Zheyuan Chen 2026-04-02 10:40:42 -07:00
  • 5803c8d115 tests: allow exporting graph ops from HF file without downloading weights (#21182) b8638 Ruben Ortlam 2026-04-02 18:19:20 +02:00
  • 63f8fe0ef4 model, mtmd: fix gguf conversion for audio/vision mmproj (#21309) b8637 Xuan-Son Nguyen 2026-04-02 17:10:32 +02:00
  • 223373742b common : add commentary rules for gpt-oss-20b (#21286) Aldehir Rojas 2026-04-02 08:59:59 -05:00
  • e15efe007d Relax prefill parser to allow space. (#21240) b8635 Piotr Wilkin (ilintar) 2026-04-02 11:29:11 +02:00
  • 6137c325a1 chat : add Granite 4.0 chat template with correct tool_call role mapping (#20804) b8634 Jesus Talavera 2026-04-02 11:28:56 +02:00
  • 17193cce34 kv-cache : do not quantize SWA KV cache (#21277) Georgi Gerganov 2026-04-02 11:54:05 +03:00
  • d6dac92bfd Ignore Transfer-Encoding header. (#20269) Roger Chen 2026-04-02 16:41:19 +08:00
  • dae2bf41c9 sync : ggml b8631 Georgi Gerganov 2026-04-02 10:38:24 +03:00
  • bc07d55922 ggml : bump version to 0.9.11 (ggml/1456) Georgi Gerganov 2026-04-02 10:37:26 +03:00
  • 4888137b17 sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (#21283) b8629 Neo Zhang 2026-04-02 15:08:32 +08:00
  • fbd441c379 hexagon : add cumsum op support (#21246) b8628 Todor Boinovski 2026-04-01 17:44:02 -07:00
  • c30e012253 contrib : rewrite AGENTS.md, make it more clear about project values (#21270) copilot/compare-kv-shifting-implementation Xuan-Son Nguyen 2026-04-01 23:31:51 +02:00
  • 95a6ebabb2 opencl: fix leak in Adreno q8_0 path (#21212) b8626 lhez 2026-04-01 12:54:58 -07:00
  • 12dbf1da95 server: Bypass API Key validation for WebUI static bundle assets (#21269) b8625 Aleksander Grygier 2026-04-01 21:32:15 +02:00
  • 86221cf6da CUDA: fix FA kernel selection logic (#21271) b8624 Johannes Gäßler 2026-04-01 21:28:19 +02:00
  • 6de97b9d3e kleidiai: add CPU feature detection to CI run script (#20394) Martin Klacer 2026-04-01 18:02:41 +01:00
  • 5a0ed5150a Update Dawn version in WebGPU CI (#20784) Nikhil Jain 2026-04-01 09:53:05 -07:00
  • 8710e5f9b9 hexagon: improve RMS_NORM and DIV accuracy (#21251) Aparna M P 2026-04-01 21:13:08 +05:30
  • 1d6d4cf7a5 fix: tool call parsing for LFM2 and LFM2.5 models (#21242) Jonathan 2026-04-01 07:22:44 -07:00
  • 744c0c7310 llama : rotate activations for better quantization (#21038) Georgi Gerganov 2026-04-01 16:58:01 +03:00
  • 0356e33aaf scripts: add function call test script (#21234) Xuan-Son Nguyen 2026-04-01 15:31:58 +02:00
  • 6422036fcb sync : ggml Georgi Gerganov 2026-04-01 16:02:34 +03:00
  • 296bc0538b ggml : bump version to 0.9.10 (ggml/1454) Georgi Gerganov 2026-04-01 16:01:45 +03:00
  • 6b949d1078 sycl : support nvfp4 type in mul_mat (#21227) Neo Zhang 2026-04-01 18:54:15 +08:00
  • 84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074) Michael Wand 2026-04-01 03:04:58 -07:00
  • e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224) Ettore Di Giacinto 2026-04-01 11:50:17 +02:00
  • 88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238) uvos 2026-04-01 10:21:20 +02:00
  • d43375ff7f ggml : fix RWKV ops thread assignment (#21226) b8611 Georgi Gerganov 2026-04-01 11:10:25 +03:00
  • 2b86e5cae6 ggml-cpu: fix fallback for RVV kernels without zvfh (#21157) b8610 Taimur Ahmad 2026-04-01 13:10:03 +05:00
  • 88458164c7 CUDA: Add Flash Attention Support for Head Dimension 512 (#20998) b8609 Anav Prasad 2026-04-01 07:07:24 +00:00
  • 4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346) b8608 Ed Addario 2026-04-01 06:43:00 +01:00
  • 82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046) b8607 Reese Levine 2026-03-31 22:38:24 -07:00
  • 825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728) b8606 Abhijit Ramesh 2026-03-31 15:38:16 -07:00
  • 0fcb3760b2 fix: Use lower-case proxy headers naming (#21235) b8605 Aleksander Grygier 2026-03-31 17:47:46 +02:00
  • 6307ec07d3 common : cleanup logs and modernize the progress bar (#21215) b8604 Adrien Gallouët 2026-03-31 16:18:00 +02:00
  • 632219af73 CANN: fix multi-thread set_tensor race conditions (#20151) b8603 hipudding 2026-03-31 22:00:51 +08:00
  • 4a00bbfed6 server: (webui) no more gzip compression (#21073) b8602 Xuan-Son Nguyen 2026-03-31 15:44:26 +02:00
  • 624733d631 common : gpt-oss handle builtin and unsolicited tool calls (#21213) b8601 Aldehir Rojas 2026-03-31 06:52:42 -05:00
  • 0b6ff47996 fix: correct misspellings in code comments (#21217) b8600 lainon1 2026-03-31 12:50:51 +01:00
  • eec6f85d7b CI: Enable CPU and Vulkan ARM64 Release (#21207) b8599 Seungmin Kim 2026-03-31 20:02:56 +09:00
  • 9281dd135d sync : ggml b8598 Georgi Gerganov 2026-03-31 13:08:13 +03:00
  • 0be6c7c9ce ggml : bump version to 0.9.9 (ggml/1449) Georgi Gerganov 2026-03-30 18:34:29 +03:00
  • 41361c8599 common : move up common_init() and fix Windows UTF-8 logs (#21176) Adrien Gallouët 2026-03-31 12:53:41 +02:00
  • 62278cedde sycl : enhance fattn perf (#21185) b8595 Neo Zhang 2026-03-31 18:31:50 +08:00
  • 90aa83c6bd common: add bounds check in common_init_result::sampler to prevent segfault on failed model load (#21082) mtmcp 2026-03-31 07:04:42 -03:00
  • fcc2d598c8 fix: include API key in CORS proxy requests for MCP connections (#21193) SATISH K C 2026-03-31 03:52:34 -05:00
  • 4453e77561 server/webui: cleanup dual representation approach, simplify to openai-compat (#21090) Piotr Wilkin (ilintar) 2026-03-31 10:42:06 +02:00
  • 26dac845cc vendor : update BoringSSL to 0.20260327.0 (#21211) b8591 Adrien Gallouët 2026-03-31 09:21:54 +02:00
  • 5ce013cd7e common : Disable backend sampling if reasoning budget is enabled (#21209) b8590 Galunid 2026-03-31 09:14:01 +02:00
  • 2985be3324 update hw info enhance_fa arthw 2026-03-31 09:24:40 +08:00
  • 08f21453ae opencl: add q4_K gemm and gemv kernels for Adreno (#20919) b8589 shaofeiqi 2026-03-30 12:19:16 -07:00
  • 84ae8434d0 CI : Enable CUDA and Vulkan ARM64 runners and fix CI/CD (#21122) Seungmin Kim 2026-03-31 03:24:37 +09:00
  • ead417f01c jinja : handle empty expressions correctly (#20913) b8587 Zhihao "Zephyr" Yao 2026-03-30 14:08:46 -04:00
  • 64ac9ab66a CUDA : Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 (#21181) b8586 Oliver Simons 2026-03-30 16:20:00 +02:00
  • cad2d3884c rpc : fix misleading error log (#21184) b8585 Radoslav Gerganov 2026-03-30 17:05:11 +03:00
  • 389c7d4955 webui: Fix branching logic on edit message (#21175) Aleksander Grygier 2026-03-30 14:40:50 +02:00
  • 278521c33a llama-model-loader: print warning when using overrides with mmap (#20978) b8583 Aman Gupta 2026-03-30 17:40:17 +08:00
  • e2eb39e81c ci : bump ty to 0.0.26 (#21156) Sigbjørn Skjæret 2026-03-30 09:29:15 +02:00
  • abf9a62161 server: wrap headers for mcp proxy (#21072) b8581 Xuan-Son Nguyen 2026-03-30 08:59:16 +02:00
  • 7c203670f8 add missing ROPE_FACTORS_LONG/SHORT for MiniCPM (#21150) b8580 Sigbjørn Skjæret 2026-03-29 19:45:40 +02:00
  • ec16a072f0 Optimize MOE GEMV kernel for BS > 1. (#20905) b8579 Gaurav Garg 2026-03-29 22:05:18 +05:30
  • f5d1c4179f hexagon: dma optimizations (mostly fixing regressions) (#21137) b8578 Max Krasnyansky 2026-03-29 06:40:13 -07:00
  • 2405d59cb6 devops: including compute-runtime for intel.Dockerfile (#21076) Davi Henrique Linhares 2026-03-29 02:34:03 -03:00
  • afe65aa282 [SYCL] Enhance build script to use half cores to build, avoid OS hang (#21093) b8576 Neo Zhang 2026-03-29 09:02:45 +08:00
  • 65097181e4 fix **/x glob matching (#21129) b8575 Sigbjørn Skjæret 2026-03-28 22:27:38 +01:00
  • 98ae0a0d36 common/parser: fix handling of tool definition with missing properties key (#21128) b8574 Piotr Wilkin (ilintar) 2026-03-28 20:41:32 +01:00
  • 3a14a542f5 common : add character class support to glob_match (#21111) b8573 Sigbjørn Skjæret 2026-03-28 19:57:37 +01:00
  • 968189729f WebUI: Replace illegal nested button elements (#21026) BlueMöhre 2026-03-28 17:57:59 +01:00
  • e397d3885c common/json-schema: fix: handle non-capturing groups (?:...) in JSON schema pattern converter (#21124) b8571 Adrien 2026-03-28 17:55:38 +01:00
  • e6f2ec01ff common : add reasoning_format = none support to gpt-oss (#21094) b8570 Aldehir Rojas 2026-03-28 09:33:39 -05:00
  • edfb440a2f server : fix processing of multiple back-to-back mtmd chunks (#21107) b8569 Georgi Gerganov 2026-03-28 16:27:36 +02:00
  • 3d66da1809 ci : gracefully shut down the server (#21110) Adrien Gallouët 2026-03-28 14:49:57 +01:00
  • 82b703f8bc Document custom default webui preferences in server README (#19771) Woof Dog 2026-03-28 13:19:16 +00:00
  • 51a84efc53 webui: Conversation forking + branching improvements (#21021) Aleksander Grygier 2026-03-28 13:38:15 +01:00
  • b0f0dd3e51 vendor : update cpp-httplib to 0.40.0 (#21100) b8565 Adrien Gallouët 2026-03-28 08:59:44 +01:00
  • 0eb4764182 vulkan: add noncontiguous GLU support (#21081) Ruben Ortlam 2026-03-28 08:44:56 +01:00
  • 1f5d15e665 common/parser: fix reasoning whitespace bugs + extra parser tests (#21085) b8563 Piotr Wilkin (ilintar) 2026-03-28 07:29:26 +01:00
  • c46758d28f cli : add /glob command (#21084) b8562 Sigbjørn Skjæret 2026-03-28 02:33:04 +01:00
  • bf934f28db docker : fix and enable ARM64 image build (#20929) Ts-sound 2026-03-28 08:45:09 +08:00
  • 5c1a7b8355 server : add custom socket options to disable SO_REUSEPORT (#21056) b8560 Adrien Gallouët 2026-03-28 01:12:43 +01:00
  • f0fea264b0 cont : rand hadamard matrices gg/attn-rot-rand Georgi Gerganov 2026-03-27 20:11:47 +02:00
  • 59d840209a common : inhibit lazy grammar sampler while reasoning is active (#20970) b8559 Aldehir Rojas 2026-03-27 12:30:40 -05:00
  • ff934e29bc server: Introduce LLAMA_BUILD_WEBUI build flag to allow disabling the embedded web ui (#20158) b8558 Kusha Gharahi 2026-03-27 11:25:55 -05:00
  • ee051c1e4e hexagon: support for IQ4_NL and MXFP4 (#21018) b8557 Yiwei Shao 2026-03-27 09:22:41 -07:00
  • e6f6770515 webui: Improve Chat Messages initial scroll + auto-scroll logic + add lazy loading with transitions to content blocks (#20999) Aleksander Grygier 2026-03-27 17:01:36 +01:00
  • ff76c6731d cont : cache shift support gg/attn-rot-wip Georgi Gerganov 2026-03-27 14:39:14 +02:00
  • 7711b3a36a cont : rotate caches separately + support non-power-of-2 head sizes Georgi Gerganov 2026-03-27 13:56:22 +02:00
  • 48cda24c11 server: remove the verbose_prompt parameter (#21059) b8555 AN Long 2026-03-27 19:36:13 +08:00
  • 871f1a2d2f mtmd: add more sanity checks (#21047) b8554 Xuan-Son Nguyen 2026-03-27 11:00:52 +01:00
  • 832e32639f cont : rotate V more + refactor Georgi Gerganov 2026-03-27 11:29:16 +02:00
  • 20197b6fe3 server: add built-in tools backend support (#20898) b8553 Xuan-Son Nguyen 2026-03-27 10:07:11 +01:00
  • ba38f3becc rpc : proper handling of data pointers to CPU buffers (#21030) b8552 Radoslav Gerganov 2026-03-27 10:59:35 +02:00