Commit Graph

  • acb7c79069 common/parser: handle reasoning budget (#20297) b8287 Piotr Wilkin (ilintar) 2026-03-11 10:26:12 +01:00
  • 5f91b1d5d5 ggml-cuda: gdn use shared mem for HIP (#20366) b8286 uvos 2026-03-11 06:06:19 +01:00
  • 9ef7523ee9 cuda/hip: fix loop unrolling in ssm-conv (#20369) b8285 uvos 2026-03-11 06:04:32 +01:00
  • 00de615345 Fix agentic mcp image single model (#20339) b8284 Pascal 2026-03-11 05:31:33 +01:00
  • e1a399992b vendor : update cpp-httplib to 0.37.0 (#20207) Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-03-11 00:03:53 -03:00
  • 4f2f0a163d vendor : update miniaudio to 0.11.25 (#20209) Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-03-11 00:01:56 -03:00
  • 0cec84f999 fix op rope, add rope_back (#20293) b8281 Neo Zhang 2026-03-11 09:53:34 +08:00
  • b2e1427c9b fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_glu, unary (#20283) b8280 Neo Zhang 2026-03-11 09:53:05 +08:00
  • 4d99d45084 model : qwen3vl reranker text support (#20332) b8279 Vinicios Lugli 2026-03-10 19:40:14 -03:00
  • 10e5b148b0 llama-quant : correct n_attention_wv usage (#20357) b8278 ddh0 2026-03-10 14:43:29 -05:00
  • 90b2731894 ggml : bump RPC version (#20330) b8277 Georgi Gerganov 2026-03-10 21:36:57 +02:00
  • aa2d278a11 ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (#20173) b8276 Reese Levine 2026-03-10 09:14:27 -07:00
  • 6c770d16ca Reduce level of content parser warning message to avoid log spam on non-debug verbosity (#20347) Piotr Wilkin (ilintar) 2026-03-10 15:21:51 +01:00
  • 8d880ac012 examples : fix empty items in json_schema_to_grammar.py [no ci] (#19968) Ray Xu 2026-03-10 21:38:18 +08:00
  • 0f1e9d14cc docs: update CPU backend ops to mark POOL_1D as supported (#20304) a3894281 2026-03-10 15:31:24 +02:00
  • 1274fbee9e models : fix assert in mamba2 (cont) (#20335) b8272 Georgi Gerganov 2026-03-10 15:00:08 +02:00
  • a7b3dee7a5 server : make 2 checkpoints near the end of the prompt (#20288) b8271 Georgi Gerganov 2026-03-10 14:28:23 +02:00
  • ec947d2b16 common : fix incorrect uses of stoul (#20313) b8270 Sigbjørn Skjæret 2026-03-10 11:40:26 +01:00
  • 0cd4f4720b kleidiai : support for concurrent sme and neon kernel execution (#20070) b8269 Charles Xu 2026-03-10 08:25:25 +01:00
  • af237f3026 ggml-cpu: add RVV repack GEMM and GEMV for quantization types (#19121) b8268 Taimur Ahmad 2026-03-10 11:49:52 +05:00
  • 1a5631beaa metal: handle command buffer failures gracefully in synchronize (#20306) b8267 Julian Pscheid 2026-03-09 23:32:24 -07:00
  • 1dab5f5a44 llama-quant : fail early on missing imatrix, refactor type selection, code cleanup (#19770) b8266 ddh0 2026-03-10 01:16:05 -05:00
  • c96f608d98 common: consolidate PEG string parsers (#20263) b8265 Aldehir Rojas 2026-03-09 18:29:21 -05:00
  • 0842b9b465 model: fix step3.5 n_rot (#20318) b8264 Xuan-Son Nguyen 2026-03-09 23:42:24 +01:00
  • 59db9a357d llama: dynamic head_dim and n_rot for SWA (#20301) b8263 Xuan-Son Nguyen 2026-03-09 22:22:39 +01:00
  • 23fbfcb1ad server: Parse port numbers from MCP server URLs in CORS proxy (#20208) b8262 Evan Huus 2026-03-09 12:47:54 -04:00
  • e22cd0aa15 metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (#20250) b8261 Paul Flynn 2026-03-09 10:48:12 -04:00
  • 96cfc4992c server : fix checkpoints n_tokens calculation (#20287) b8260 Georgi Gerganov 2026-03-09 16:47:06 +02:00
  • ed0007aa32 metal : add upscale (#20284) b8259 Georgi Gerganov 2026-03-09 16:45:11 +02:00
  • 344ee2a38a server : warn swa-full is not supported for non-SWA models (#20291) b8258 Georgi Gerganov 2026-03-09 16:44:25 +02:00
  • d6e1556499 server : fix off-by-1 in server_tokens::size_up_to_pos() (#20279) Georgi Gerganov 2026-03-09 16:43:38 +02:00
  • f76565db92 common: map developer role to system (#20215) b8256 Piotr Wilkin (ilintar) 2026-03-09 14:25:11 +01:00
  • 43e1cbd6c1 models : fix assert in mamba2 graph (#20270) b8255 Georgi Gerganov 2026-03-09 13:15:15 +02:00
  • 107d599952 server : add kill switch when server is stuck (#20277) b8254 Georgi Gerganov 2026-03-09 10:33:12 +02:00
  • e8bbc736cb ggml-cuda: disable gdn for musa (#20278) b8253 Aman Gupta 2026-03-09 16:15:36 +08:00
  • b518195101 llama-quant : left-align tensor names in output (#20117) b8252 ddh0 2026-03-09 02:28:41 -05:00
  • e2763a6723 contributing: limit open PRs for new contributors to 1 (#20036) Aman Gupta 2026-03-09 15:05:34 +08:00
  • 0beb8db3a0 ggml-vulkan: add SGN operator, auto-generate Vulkan.csv and ops.md (#20219) b8250 Bertay Eren 2026-03-09 09:24:16 +03:00
  • b2f460bd3c vulkan: skip zero size tensors in backend copies (#20233) b8249 Ruben Ortlam 2026-03-09 07:23:45 +01:00
  • 5f4cdac385 cuda : display total and free VRAM capacity during device initialization (#20185) b8248 Michael Huang 2026-03-08 21:45:43 -07:00
  • ae87863dc1 llama-bench: introduce -hf and -hff flags & use --mmap 1 by default (#20211) b8247 Aaron Teo 2026-03-09 09:05:44 +08:00
  • 97c64fbdbd PEG parser for LFM2 (#20251) b8246 Piotr Wilkin (ilintar) 2026-03-09 01:11:22 +01:00
  • d417bc43dd server : do not create checkpoints right after mtmd chunks (#20232) b8245 Georgi Gerganov 2026-03-08 22:16:46 +02:00
  • 35bee031e1 graph : remove redundant scale_w parameter (#20235) b8244 Sigbjørn Skjæret 2026-03-08 18:58:28 +01:00
  • 451ef08432 common : gracefully handle incomplete output (#20191) b8243 Aldehir Rojas 2026-03-08 11:17:02 -05:00
  • 9b24886f78 Fix compile bug (#20203) b8242 Piotr Wilkin (ilintar) 2026-03-08 17:15:49 +01:00
  • 62b8143ad2 Fix structured outputs (#20223) b8241 Piotr Wilkin (ilintar) 2026-03-08 17:14:43 +01:00
  • d088d5b74f ggml-vulkan: Add ELU op support (#20183) b8240 GiantPrince 2026-03-08 07:38:17 -04:00
  • cd18a50ea5 vulkan: Fix data races in coopmat1 mul_mat(_id) (#20084) b8239 Jeff Bolz 2026-03-08 06:33:48 -05:00
  • a976ff081b llama: end-to-end tests (#19802) b8238 Johannes Gäßler 2026-03-08 12:30:21 +01:00
  • a95047979a readme : update infra list (#20212) Christopher Maher 2026-03-08 03:42:28 -07:00
  • b283f6d5b3 Revert to OAI-compatible args (#20213) b8236 Piotr Wilkin (ilintar) 2026-03-08 11:33:03 +01:00
  • ff52ee964d server : correct index on finish in OAI completion streams (#20226) b8235 decahedron1 2026-03-08 04:08:57 -05:00
  • 213c4a0b81 [SYCL] supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190) b8234 Neo Zhang 2026-03-08 12:00:07 +08:00
  • 715ed28683 use scalar sums 0cc4m/vulkan-coopmat-int8 Ruben Ortlam 2026-03-07 22:11:40 +01:00
  • a9435151db apply scales inline Ruben Ortlam 2026-03-07 14:56:25 +01:00
  • d1f8bbd085 vulkan: add int8 coopmat quantized matmul shader Ruben Ortlam 2026-03-07 14:42:41 +01:00
  • c5a778891b ggml: add GATED_DELTA_NET op (#19504) b8233 Aman Gupta 2026-03-07 15:41:10 +08:00
  • 6fce5c6a7d opencl: add l2_norm (#20160) b8232 lhez 2026-03-06 18:03:05 -08:00
  • c024d85908 Autoparser: True streaming (#20177) b8231 Piotr Wilkin (ilintar) 2026-03-07 01:55:33 +01:00
  • 2f2923f895 Autoparser: add optional argument reshuffle capability (#20171) b8230 Piotr Wilkin (ilintar) 2026-03-06 22:34:15 +01:00
  • 649f06481e quants : Add memsets and other fixes for IQ quants (#19861) b8229 Bartowski 2026-03-06 16:06:56 -05:00
  • 7463687161 Add @pwilkin to CODEOWNERS for autoparser code (#20174) Piotr Wilkin (ilintar) 2026-03-06 21:25:41 +01:00
  • 566059a26b Autoparser - complete refactoring of parser architecture (#18675) b8227 Piotr Wilkin (ilintar) 2026-03-06 21:01:00 +01:00
  • 34df42f7be hexagon: add f32 ssm_conv op (#20122) b8226 Todor Boinovski 2026-03-06 09:59:26 -08:00
  • e68f2fb894 server : preserve anthropic thinking blocks in conversion (#20120) b8225 Tom Vaucourt 2026-03-06 17:41:12 +01:00
  • ba2fd11cdf cpu: skip redudant ROPE cache updates (#20149) b8224 Max Krasnyansky 2026-03-06 08:32:40 -08:00
  • d48e876467 ggml-cuda: add mem check for fusion (#19916) b8223 Aman Gupta 2026-03-07 00:05:43 +08:00
  • ba2ff79e43 ggml: update comments for backends which have no memory to report (#20157) b8222 Aaron Teo 2026-03-06 23:24:38 +08:00
  • c6980ff29d ggml-cpu: Fix gcc 15 ICE on ppc64le (#20083) (#20130) b8221 shalinib-ibm 2026-03-06 20:52:39 +05:30
  • 1e38a7a6fa CUDA: use shared mem for ssm_conv (#20128) b8220 Aman Gupta 2026-03-06 23:09:59 +08:00
  • 121fe62182 test pr/19802-test Georgi Gerganov 2026-03-06 16:30:32 +02:00
  • 388baabc06 context: ignore zero scale LoRAs when checking sameness (#20166) b8219 Tim Neumann 2026-03-06 14:05:52 +01:00
  • f5ddcd1696 Checkpoint every n tokens: squash (#20087) b8218 Piotr Wilkin (ilintar) 2026-03-06 11:39:26 +01:00
  • 803d3a1964 fix CI Johannes Gäßler 2026-03-06 10:16:05 +01:00
  • f6235a41ef webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts (#18655) Aleksander Grygier 2026-03-06 10:00:39 +01:00
  • b90486e51d fix WebGPU Johannes Gäßler 2026-03-05 22:50:22 +01:00
  • 7e466072f3 fix CI Johannes Gäßler 2026-03-05 15:26:38 +01:00
  • 4b7f407ae8 fix use-after-free in llama-model-loader.cpp Johannes Gäßler 2026-03-04 12:36:09 +01:00
  • e6a6af1cef fixup for rebase Johannes Gäßler 2026-03-03 21:27:02 +01:00
  • fc6960347b tests: add end-to-end tests per model architecture Johannes Gäßler 2026-02-21 11:15:32 +01:00
  • 2850bc6a13 ggml-cpu: fix data race for debug asserts (#20148) b8216 Johannes Gäßler 2026-03-06 09:12:49 +01:00
  • 17a4258946 kv-cache : fix M-RoPE checkpoints (#20132) b8215 Georgi Gerganov 2026-03-06 08:46:51 +02:00
  • f7db3f3789 cli : Don't clear system prompt when using '/clear' (#20067) b8214 Roj234 2026-03-06 13:41:11 +08:00
  • 6c97bffd65 opencl: add neg, exp and diag (#20127) b8213 lhez 2026-03-05 21:16:39 -08:00
  • 2b10b62677 hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) b8212 YardenTal44 2026-03-06 04:29:13 +02:00
  • a0ed91a442 models : kda chunk size = 16 (#19827) ymcki 2026-03-05 23:01:23 +08:00
  • 2cd20b72ed CUDA: Improve performance via less synchronizations between token (#17795) b8210 Andreas Kieslinger 2026-03-05 12:53:21 +01:00
  • 872646b30c model : update Qwen3.5 model type detection (#20126) b8209 Eric Zhang 2026-03-05 19:47:14 +08:00
  • b5ed0e058c cli : add command and file auto-completion (#19985) b8208 Sigbjørn Skjæret 2026-03-05 10:47:28 +01:00
  • cf232515c9 convert : register Qwen 3.5 ForCausalLM for text only (#20119) Sigbjørn Skjæret 2026-03-05 10:30:02 +01:00
  • 5e335ba113 webui: Improvements for Models Selector UI (#20066) Aleksander Grygier 2026-03-05 08:52:22 +01:00
  • 92f7da00b4 chore : correct typos [no ci] (#20041) Marcel Petrick 2026-03-05 08:50:21 +01:00
  • 7a99dc85e2 hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and MatMul updates (#20118) b8204 Max Krasnyansky 2026-03-04 21:55:29 -08:00
  • 69fd345335 opencl: add SET, support i32 for CPY, minor refactor for cpy (#20101) b8203 lhez 2026-03-04 21:32:26 -08:00
  • 1a29907d2e hexagon: add llama-completion runner script (#20095) b8202 Todor Boinovski 2026-03-04 15:04:59 -08:00
  • 24d2ee0527 [WebGPU] Fix wait logic for inflight jobs (#20096) b8201 Nikhil Jain 2026-03-04 11:54:55 -08:00
  • 541bf37622 Add concat op to webgpu. (#20068) b8200 Masashi Yoshimura 2026-03-05 04:19:00 +09:00
  • d969e933e1 tools : add missing clocale include in mtmd-cli [no ci] (#20107) Sigbjørn Skjæret 2026-03-04 14:18:04 +01:00
  • 7f5ee54968 ggml: fix ggml_is_contiguous_n for ne == 1 (#20092) b8198 Johannes Gäßler 2026-03-04 12:04:31 +01:00