Commit Graph

  • feefb92836 vulkan: tune MMVQ for Intel Windows (#19988) b8187 Ruben Ortlam 2026-03-02 15:58:25 +01:00
  • ec88c3ceea scripts : improve get-wikitext-2.sh (#19952) Adrien Gallouët 2026-03-02 15:40:49 +01:00
  • 2afcdb9777 ggml-cpu: optimise s390x multiply extend instructions (#20032) b8185 Aaron Teo 2026-03-02 16:23:56 +08:00
  • 319146247e vulkan: improve partial offloading performance on AMD (#19976) b8184 Ruben Ortlam 2026-03-01 17:32:14 +01:00
  • 66d65ec29b cuda: cap grid.y at 65535 in non-contiguous dequantize/convert kernels (#19999) b8183 oobabooga 2026-03-01 02:40:22 -03:00
  • 05728db18e vendors : update miniaudio library to 0.11.24 (#19914) b8182 Dmitry Atamanov 2026-02-28 20:10:01 +05:00
  • 4720819d45 vendor : update cpp-httplib to 0.35.0 (#19969) b8181 Adrien Gallouët 2026-02-28 13:53:56 +01:00
  • d979f2b176 tests : model metadata loading from huggingface (#19796) b8180 Bartowski 2026-02-28 04:44:38 -05:00
  • 07e2c9707c eagle3: support --eagle3 in llama-cli ruixiangw 2026-02-28 00:33:54 +00:00
  • ecbcb7ea9d CUDA: add CDNA3 MFMA support for flash attention MMA kernel (#19806) b8179 Jayant Lohia 2026-02-28 00:07:26 +05:30
  • 3e6ab244ad server: Add pragma once to server-context.h (#19944) b8178 Roj234 2026-02-28 01:28:36 +08:00
  • 5596a35791 server: Mirroring /v1/responses to /responses to match /v1/chat/completions pattern (#19873) b8177 Sami Kama 2026-02-27 08:44:42 -08:00
  • 8d3b962f47 ci : use ubuntu-latest for gguf-publish workflow (#19951) Daniel Bevenius 2026-02-27 14:42:24 +01:00
  • d903f30e25 ggml-cpu: add repack for mxfp4 (#19738) b8175 Aman Gupta 2026-02-27 18:15:09 +08:00
  • 8387ffb28d gguf-py : bump version to 0.18.0 (#19950) gguf-v0.18.0 Daniel Bevenius 2026-02-27 11:02:53 +01:00
  • 2e7e638523 server : support multiple model aliases via comma-separated --alias (#19926) b8173 Pascal 2026-02-27 07:05:23 +01:00
  • a8b192b6ec tests : enable test-chat out of tree build (#19558) b8172 Jan Patrick Lehr 2026-02-27 05:37:54 +01:00
  • c17dce4f5c replace the magic number 768 by max work group size to support iGPU (#19920) b8171 Neo Zhang 2026-02-27 09:26:07 +08:00
  • 88cf781f51 ggml-zendnn: update code for latest ZenDNN API (#19923) b8170 Vishal Singh 2026-02-27 06:13:41 +05:30
  • 4e76d24f28 ggml : fix AMX and add batched support (#19925) b8169 Adrien Gallouët 2026-02-26 21:39:11 +01:00
  • 723c71064d vulkan: fix fp16 Flash Attention on Windows AMD RDNA2 and below (#19921) b8168 Ruben Ortlam 2026-02-26 19:11:04 +01:00
  • 37964f44f9 mtmd : fix padding of n_tokens (#19930) b8167 Georgi Gerganov 2026-02-26 18:39:49 +02:00
  • 01cd448b8c server : fix ctx checkpoint restore logic (#19924) b8166 Georgi Gerganov 2026-02-26 18:20:16 +02:00
  • 99bd67c9b2 kv-cache : fix can_shift() check to take into account M-RoPE (#19928) b8165 Georgi Gerganov 2026-02-26 18:08:54 +02:00
  • b68d75165a llama: Add option to merge gate and exp weights (#19139) b8164 Aman Gupta 2026-02-26 21:01:08 +08:00
  • ffaafde16f ggml-virtgpu: improve the reliability of the code (#19846) b8163 Kevin Pouget 2026-02-26 13:00:57 +01:00
  • efba35a860 server: fix load-on-startup not respected in ini file (#19897) b8162 drrros 2026-02-26 14:32:31 +03:00
  • 9b62913b40 jinja : correct default size for string slices (#19913) b8161 Eric Zhang 2026-02-26 19:28:09 +08:00
  • 66287bdaac model : add Jina Embeddings v5 Nano (partial EuroBERT) support (#19826) Maximilian Werk 2026-02-26 12:14:09 +01:00
  • 1ca3d1de15 gguf : avoid too many file size calls (#19919) b8159 Georgi Gerganov 2026-02-26 12:46:32 +02:00
  • bd72300591 server : fix typo in server README.md (#19900) yggdrasil75 2026-02-26 05:26:16 -05:00
  • 2943210c1e support permuted, remove check s0/s10 (#19889) b8157 Neo Zhang 2026-02-26 10:27:20 +08:00
  • 3769fe6eb7 vulkan: check for memory overlap before doing fusion (#19768) b8156 Jeff Bolz 2026-02-25 11:25:38 -06:00
  • 832aa94762 common : add more aliases for sampler CLI params (#19797) b8155 ddh0 2026-02-25 09:34:25 -06:00
  • 3af34b9ff5 ci : update the ROCm/HIP toolchain versions [no ci] (#19891) Slobodan Josic 2026-02-25 15:54:49 +01:00
  • f20469d919 server : enable multi-modal prompt caching (#19877) b8153 Georgi Gerganov 2026-02-25 15:15:42 +02:00
  • d7d826b3c1 server : support multi-modal context checkpoints (#19849) b8152 Georgi Gerganov 2026-02-25 15:14:27 +02:00
  • c747294b2d scripts: update corpus of compare-logprobs (#19326) Xuan-Son Nguyen 2026-02-25 12:57:34 +01:00
  • 8fdf269dad ci : update Windows ROCm build to 26.Q1 [no ci] (#19810) Mario Limonciello 2026-02-25 05:30:19 -06:00
  • a96a1120b4 gguf : fix ftell/fseek for Windows (#19870) b8149 Aldehir Rojas 2026-02-24 22:58:11 -06:00
  • 244641955f models : fix graph splits (#19866) b8148 Georgi Gerganov 2026-02-25 00:01:13 +02:00
  • 47eb12b953 server: fix query params lost when proxying requests in multi-model router mode (#19854) b8147 Pascal 2026-02-24 21:46:06 +01:00
  • 418dea39ce ggml/gguf : prevent integer overflows (#19856) b8146 Georgi Gerganov 2026-02-24 20:17:11 +02:00
  • da426cb250 model : update label for LFM2-24B-A2B (#19848) b8145 Tarek Dakhran 2026-02-24 14:27:42 +01:00
  • c830f99cfa server : support max_completion_tokens request property (#19831) b8144 Radoslav Gerganov 2026-02-24 10:30:00 +02:00
  • aa6f918c1c Vulkan Scalar Flash Attention Refactor (#19625) b8143 Ruben Ortlam 2026-02-24 08:35:48 +01:00
  • 8c2c0108dd vulkan: fix coopmat1 without bf16 support (#19793) b8142 Jeff Bolz 2026-02-24 00:48:32 -06:00
  • 3ea5360c00 vulkan: fix data race in mul_mat_id shader (#19790) b8141 Jeff Bolz 2026-02-24 00:43:12 -06:00
  • 39fb81f875 hexagon refactor all Ops to use local context struct (#19819) b8140 Max Krasnyansky 2026-02-23 16:32:14 -08:00
  • 5eb0ea32f0 feat: Add code blocks full height setting to parameter sync service (#19835) Aleksander Grygier 2026-02-23 22:30:13 +01:00
  • b68a83e641 vendor : update cpp-httplib to 0.34.0 (#19830) b8138 Adrien Gallouët 2026-02-23 21:05:48 +01:00
  • d8aeb65cee tests : fix typos in comments in test-backend-sampler [no ci] (#19824) Daniel Bevenius 2026-02-23 17:12:02 +01:00
  • 9051663d5d webui: Add setting to have full height Code Blocks in Chat Messages (#19829) Aleksander Grygier 2026-02-23 14:16:50 +01:00
  • 72b44c0d21 model-conversion : merge inspect-org-model.py with tensor-info.py (#19823) Daniel Bevenius 2026-02-23 14:15:16 +01:00
  • b8ab2cc559 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-02-23 14:47:03 +02:00
  • bc160d3582 ggml-cpu: arm64: q5_K repack gemm and gemv (and generic) implementations (dotprod) (#19356) Alberto Cabrera Pérez 2026-02-23 12:42:52 +00:00
  • 4b436e4e5e flake8 fix ci-tmp Sigbjørn Skjæret 2026-02-23 11:48:01 +01:00
  • 2b6dfe824d llama : remove write/read of output ids/logits/embeddings (#18862) b8133 Daniel Bevenius 2026-02-23 07:04:30 +01:00
  • e8e261699a cli : provide model with text filename (#19783) b8132 Sigbjørn Skjæret 2026-02-22 22:33:49 +01:00
  • 5452d736f8 jinja: correct stats for tojson and string filters (#19785) b8131 Xuan-Son Nguyen 2026-02-22 21:08:23 +01:00
  • a6d3e9a239 ggml : relax asserts for ggml_get_type_traits() Georgi Gerganov 2026-02-22 21:37:58 +02:00
  • 9c5d8dec37 gguf : add file size check for arrays Georgi Gerganov 2026-02-22 21:36:56 +02:00
  • c76408dbb9 gguf : add mem_size overflow test Georgi Gerganov 2026-02-22 18:40:06 +02:00
  • ed4837891d common : fix improper trimming in XML parser on complete message (#19805) b8130 Aldehir Rojas 2026-02-22 10:34:54 -06:00
  • cacc371f99 Fix wrong cli-argument in documentation (#19804) Kilian Krampf 2026-02-22 16:26:33 +01:00
  • ae2368e74e model : add Kanana-2 model support (#19803) b8128 HelloKS 2026-02-23 00:15:02 +09:00
  • 9f0684f003 ci : fix rocm archive name [no ci] (#19808) Sigbjørn Skjæret 2026-02-22 16:14:37 +01:00
  • c79698f28a ggml : relax ggml_type asserts to debug-only Georgi Gerganov 2026-02-22 16:32:39 +02:00
  • 45250db0f8 ggml : remove deprecated ggml_type_sizef() Georgi Gerganov 2026-02-22 16:23:57 +02:00
  • dfac6caa40 ggml : print values when overflow Georgi Gerganov 2026-02-22 16:09:53 +02:00
  • 327e2ca6f2 gguf : minor print fix Georgi Gerganov 2026-02-22 16:01:29 +02:00
  • 09788740f3 gguf : fix ctx size for no_alloc == true Georgi Gerganov 2026-02-22 15:54:44 +02:00
  • 4e89ec67fa gguf : better name Georgi Gerganov 2026-02-22 15:47:01 +02:00
  • 34ec1c3f18 server : merge contiguous Responses input items into a single assistant message (#19773) b8126 Aldehir Rojas 2026-02-22 07:11:31 -06:00
  • 46a9a0656a enforce proper alignment in add_custom_alignment Sigbjørn Skjæret 2026-02-22 10:38:58 +01:00
  • f2ac3ef57e py : restore tensor_fields Georgi Gerganov 2026-02-22 10:54:14 +02:00
  • 12c719b3f1 gguf-py : error on duplicate keys when reading Georgi Gerganov 2026-02-22 09:47:43 +02:00
  • 5d67acd422 ggml : check int overflow in ggml_new_tensor_impl and ggml_new_object Georgi Gerganov 2026-02-22 09:46:54 +02:00
  • e877ad8bd9 ci : fix rocm release path [no ci] (#19784) Sigbjørn Skjæret 2026-02-22 08:07:46 +01:00
  • 35715657cb Update ROCm docker container to 7.2 release (#19418) b8124 Mario Limonciello 2026-02-21 14:53:39 -06:00
  • f75c4e8bf5 Add a build target to generate ROCm artifacts using ROCm 7.2 (#19433) b8123 Mario Limonciello 2026-02-21 12:56:26 -06:00
  • 99156f3a5f vendor : update cpp-httplib to 0.33.1 (#19778) b8122 Adrien Gallouët 2026-02-21 19:12:31 +01:00
  • a0c91e8f9f Improve CUDA graph capture (#19754) b8121 Gaurav Garg 2026-02-21 15:09:36 +05:30
  • 07968d53e4 fix: UI single model selection in router mode (#19767) crsawyer 2026-02-21 02:28:39 -06:00
  • ba3b9c8844 hexagon : fix build release (#19444) (#19587) b8119 Mengsheng Wu 2026-02-20 16:40:00 -08:00
  • 94b0200a01 common : merge qwen3-coder and nemotron nano 3 parsers (#19765) b8118 Aldehir Rojas 2026-02-20 16:22:22 -06:00
  • 9fea2434af eagle3: fix model convert code format ruixiangw 2026-02-20 18:05:49 +00:00
  • b3537924ef eagle3: fix model convert issue ruixiangw 2026-02-20 17:54:08 +00:00
  • b908baf182 ggml-cpu: add RVV vec dot kernels for quantization types (#18784) b8117 Taimur Ahmad 2026-02-20 16:30:07 +05:00
  • 492bc31978 quantize : add --dry-run option (#19526) b8116 ddh0 2026-02-20 02:20:16 -06:00
  • 77d6ae4ac8 test: mul_mat tests with huge batch size (#19519) b8115 Jeff Bolz 2026-02-19 18:08:25 -08:00
  • 10b26ee23a WebUI hide models in router mode (#19374) crsawyer 2026-02-19 15:53:42 -06:00
  • 3dadc88b58 common : fix Step-3.5-Flash format detection and thinking support (#19635) b8113 Jesse Posner 2026-02-19 13:40:52 -08:00
  • 39e4b1dc9b common : fix gpt-oss Jinja error when assistant message has both content and thinking with tool calls (#19704) b8112 abhijitb11 2026-02-19 12:59:20 -08:00
  • 11c325c6e0 ggml-webgpu: Add unary op (SQR, SQRT, SIN, COS) support. (#19700) b8111 Masashi Yoshimura 2026-02-20 01:18:30 +09:00
  • 237958db33 model: Add PaddleOCR-VL model support (#18825) b8110 megemini 2026-02-20 00:05:25 +08:00
  • d97dd299a0 py : assert that alignment is non-zero power of 2 Georgi Gerganov 2026-02-19 16:44:43 +02:00
  • 2e23292cfe ggml : fix negative tensor type oob Georgi Gerganov 2026-02-19 16:42:46 +02:00
  • 7babe5fb13 gguf : prevent array elements exhaustion Georgi Gerganov 2026-02-19 16:26:54 +02:00
  • 357b8e50f1 gguf : prevent string exhaustion Georgi Gerganov 2026-02-19 16:08:04 +02:00