Commit Graph

  • 06c1e4abc1 readme : add list of dependencies and their license (#13591) Xuan-Son Nguyen 2025-05-16 20:04:18 +02:00
  • 415e40a357 releases : use arm version of curl for arm releases (#13592) b5406 Diego Devesa 2025-05-16 10:36:51 -07:00
  • 654a67794f metal : add FA-vec kernel for head size 64 (#13583) b5405 Georgi Gerganov 2025-05-16 20:32:58 +03:00
  • 5364ae4ba5 llama : print hint when loading a model when no backends are loaded (#13589) b5404 Diego Devesa 2025-05-16 07:38:07 -07:00
  • 7c07ac244d ci : add ppc64el to build-linux-cross (#13575) Sigbjørn Skjæret 2025-05-16 14:54:23 +02:00
  • 0a338ed013 sycl : fixed compilation warnings (#13582) b5402 Łukasz Ślusarczyk 2025-05-16 12:15:29 +02:00
  • bc098c3cf0 minja: sync (qwen3) (#13573) b5401 Olivier Chafik 2025-05-15 23:29:10 +01:00
  • c6a2c9e741 gguf : use ggml log system (#13571) b5400 Diego Devesa 2025-05-15 10:13:11 -07:00
  • 07ad2b6db3 gguf-py : fix disconnect-before-connect in editor-gui (#13569) Daniel Tang 2025-05-15 12:47:10 -04:00
  • c531edfa34 convert : fix conversion for llama 4 (#13567) Xuan-Son Nguyen 2025-05-15 17:40:07 +02:00
  • 02cdd2d8b0 sycl: simplify bin_bcast_kernel (#13383) Atharva Dubey 2025-05-15 16:39:52 +01:00
  • 64bb51cf90 sycl: reordered Q4_K MMVQ (#13109) Svetlozar Georgiev 2025-05-15 16:35:44 +01:00
  • 9c404ed54c sycl: use oneDNN for matrices multiplication (#12972) b5395 Łukasz Ślusarczyk 2025-05-15 16:53:41 +02:00
  • 6c8b91500e llama-bench : fix -ot with dl backends (#13563) b5394 Diego Devesa 2025-05-15 06:46:55 -07:00
  • 3cc1f1f1d2 webui : handle PDF input (as text or image) + convert pasted long content to file (#13562) Xuan-Son Nguyen 2025-05-15 14:24:50 +02:00
  • c753d7bed0 server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540) b5392 Piotr Wilkin (ilintar) 2025-05-15 08:40:58 +02:00
  • b2838049cc bench : handle decode errors (#13548) b5391 Georgi Gerganov 2025-05-15 05:57:02 +03:00
  • aa48e373f2 server: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802) b5390 Olivier Chafik 2025-05-15 02:39:51 +01:00
  • e3a9421b78 kv-cache : fix out-of-bounds view during reserve graph (#13547) Georgi Gerganov 2025-05-14 23:15:15 +03:00
  • 5ab5d5fb25 arm64: optimize q6_k_q8_k kernel with i8mm (#13519) b5388 Yibo Cai 2025-05-15 03:53:52 +08:00
  • 8282d74692 bench : handle decode errors gg/bench-handle-decode-errors Georgi Gerganov 2025-05-14 22:36:29 +03:00
  • 3198405e98 common: add partial regex support (#12808) b5387 Olivier Chafik 2025-05-14 19:50:57 +01:00
  • f5170c1d7a editorconfig : fix trailing whitespace from #13542 (#13546) Sigbjørn Skjæret 2025-05-14 20:22:49 +02:00
  • 017f10b5fa fix: crash when calling llama_state_get_size on a context without a KV cache (#13542) b5385 Gilad S. 2025-05-14 19:18:18 +03:00
  • 4696d56749 CUDA: fix crash on large batch size for quant. MoE (#13537) b5384 Johannes Gäßler 2025-05-14 16:41:02 +02:00
  • b7d2672082 llama : fix quantize with dl backends (#13539) Diego Devesa 2025-05-14 07:12:36 -07:00
  • 6da34fa276 CUDA: faster Deepseek FA, add Turing support (#13435) b5382 Johannes Gäßler 2025-05-14 16:08:20 +02:00
  • 5e7d95e22e fix: Move build_inp_pos to the top of the graph section for build_granite (#13538) b5381 Gabe Goodhart 2025-05-14 06:53:59 -06:00
  • 053174436f server : passthrough the /models endpoint during loading (#13535) b5380 Georgi Gerganov 2025-05-14 15:42:10 +03:00
  • 237acc7cd5 server : update readme + return json for "meta" field gg/server-models-loading Georgi Gerganov 2025-05-14 15:30:12 +03:00
  • 360a9c98e1 server : fix cache_tokens bug with no cache_prompt (#13533) b5379 Xuan-Son Nguyen 2025-05-14 13:35:07 +02:00
  • 6190e1c1c9 server : passthrough the /models endpoint during loading Georgi Gerganov 2025-05-14 14:15:42 +03:00
  • 09d13d94fb cmake: simplify vulkan shader test logic (#13263) b5378 bandoti 2025-05-14 07:53:57 -03:00
  • 24e86cae72 vulkan: KHR_coopmat flash attention (#13506) b5377 Jeff Bolz 2025-05-14 18:55:26 +09:00
  • bb1681fbd5 webui : use fflate for more deterministic gzip compress (#13525) Xuan-Son Nguyen 2025-05-14 10:26:12 +02:00
  • d486dd3e8e webui: Allow pasting file from clipboard (#13526) Luca Stefani 2025-05-14 10:07:31 +02:00
  • 21ca987fba docs: Update link to ggml-org in multimodal.md (#13513) ddpasa 2025-05-14 09:59:12 +02:00
  • be1d4a13db scripts : fix compare-llama-bench.py show parameter (#13514) Sigbjørn Skjæret 2025-05-14 08:41:01 +02:00
  • ab3971f2a0 vulkan: workaround FA compile failures on macos (#13517) b5372 Jeff Bolz 2025-05-14 13:15:50 +09:00
  • e5c834f718 quantize : improve tensor-type pattern matching (#13033) b5371 Ed Addario 2025-05-13 18:12:31 +01:00
  • 71bdbdb587 clip : clip.h become private API (⚠️ breaking change) (#13510) b5370 Xuan-Son Nguyen 2025-05-13 17:07:21 +02:00
  • f0995d28ce metal : use FA-vec kernel up to batch size 20 (#13496) b5369 Georgi Gerganov 2025-05-13 18:04:39 +03:00
  • c252e0c409 metal : optimize multi-sequence FA vec kernel (#13493) b5368 Georgi Gerganov 2025-05-13 18:04:00 +03:00
  • 4f711afed5 ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509) b5367 Dan Johansson 2025-05-13 17:02:28 +02:00
  • b89d605a91 batched-bench : fix pp batch contents (#13492) b5366 Georgi Gerganov 2025-05-13 18:01:53 +03:00
  • b4726345ac mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change) (#13460) b5365 Xuan-Son Nguyen 2025-05-13 15:33:58 +02:00
  • bf79371120 scripts : support arbitrary input file formats in compare-llama-bench.py (#13455) Sigbjørn Skjæret 2025-05-13 15:31:12 +02:00
  • d590cd4c24 model : Granite MoE shared (#13269) b5363 Gabe Goodhart 2025-05-13 07:12:01 -06:00
  • 1e2809bc4b sync : ggml Georgi Gerganov 2025-05-13 14:01:45 +03:00
  • 78d70223c3 metal : use FA-vec kernel up to batch size 20 gg/metal-fa-vec-bs20 Georgi Gerganov 2025-05-13 08:52:34 +03:00
  • fdfc7de7fc metal : optimize multi-sequence FA vec kernel Georgi Gerganov 2025-05-13 08:03:27 +03:00
  • f078c79865 batched-bench : fix pp batch contents Georgi Gerganov 2025-05-13 07:55:30 +03:00
  • cf0a43bb64 llama-bench : add defrag-thold, check for invalid ranges (#13487) b5361 Diego Devesa 2025-05-12 15:31:37 -07:00
  • f0d46ef157 opencl: remove unnecessary assert for add (#13257) b5360 lhez 2025-05-12 13:13:49 -07:00
  • de4c07f937 clip : cap max image size 1024 for qwen vl model (#13478) b5359 Xuan-Son Nguyen 2025-05-12 15:06:51 +02:00
  • 10d2af0eaa llama/ggml: add LLM training support (#10544) b5358 Johannes Gäßler 2025-05-12 14:44:49 +02:00
  • 064cc596ac context : fix state io for memory-less contexts (#13470) b5357 Georgi Gerganov 2025-05-12 15:12:27 +03:00
  • 91159ee9df server : allow content to be null in oaicompat_completion_params_parse (#13477) b5356 Anudit Nagar 2025-05-12 18:56:42 +07:00
  • 22cdab343b llama-bench : accept ranges for integer parameters (#13410) b5355 Diego Devesa 2025-05-12 13:08:22 +02:00
  • a71a4075cd ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053) b5354 Dan Johansson 2025-05-12 13:06:19 +02:00
  • 95e18884fc CUDA: fix misaligned synchronization in FA (#13469) b5353 Johannes Gäßler 2025-05-12 10:51:21 +02:00
  • df8491922f ggml : add mrope kernel for metal (#13457) b5352 Xuan-Son Nguyen 2025-05-12 10:29:13 +02:00
  • 14492144c2 enable dpcpp nightly builds with libraries (#13406) b5351 Atharva Dubey 2025-05-12 06:15:32 +01:00
  • c104023994 mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459) b5350 City 2025-05-12 00:39:06 +02:00
  • 9a390c4829 tools : fix uninitialized llama_batch in server (#13436) b5349 Anthony Umfer 2025-05-11 11:08:26 -04:00
  • 09232370fc scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451) Sigbjørn Skjæret 2025-05-11 16:20:39 +02:00
  • 7474e00b34 CUDA: fix crash with partial offloading of MoE (#13439) b5347 Johannes Gäßler 2025-05-11 16:09:33 +02:00
  • 7f323a589f Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386) b5346 David Huang 2025-05-11 20:18:39 +08:00
  • 3eac209319 mtmd : support InternVL 3 38B and 78B mmproj (#13443) b5345 City 2025-05-11 11:35:52 +02:00
  • a634d75d1b mtmd : move helpers to dedicated file (#13442) b5344 Xuan-Son Nguyen 2025-05-11 11:34:23 +02:00
  • 62d4250e52 docs : Fix typo in InternVL3 model name (#13440) Thomas Germer 2025-05-10 22:26:46 +02:00
  • 0208355f42 CUDA: fix race conditions FlashAttention kernels (#13438) b5342 Johannes Gäßler 2025-05-10 22:22:48 +02:00
  • d2a4ef05c6 vocab : add ByteDance-Seed/Seed-Coder (#13423) b5341 Sigbjørn Skjæret 2025-05-10 22:08:07 +02:00
  • 15e6125a39 mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434) b5340 Xuan-Son Nguyen 2025-05-10 19:57:54 +02:00
  • 3b24d26c22 server : update docs (#13432) Xuan-Son Nguyen 2025-05-10 18:44:49 +02:00
  • 43dfd741a5 llguidance : set tokenizer slices to default (#13424) b5338 Sigbjørn Skjæret 2025-05-10 17:19:52 +02:00
  • b064a51a4e ci: free_disk_space flag enabled for intel variant (#13426) Thammachart Chinvarapon 2025-05-10 21:34:48 +07:00
  • 053367d149 mtmd : support InternVL 2.5 and 3 (#13422) b5336 Xuan-Son Nguyen 2025-05-10 16:26:42 +02:00
  • d8919424f1 CUDA: fix FlashAttention on Turing (#13415) b5335 Johannes Gäßler 2025-05-10 09:16:52 +02:00
  • 7fef11766c arg : add env var to control mmproj (#13416) b5334 Xuan-Son Nguyen 2025-05-10 08:16:29 +02:00
  • dc1d2adfc0 vulkan: scalar flash attention implementation (#13324) b5333 Jeff Bolz 2025-05-09 23:07:07 -07:00
  • 7c28a74e07 chore(llguidance): use tagged version that does not break the build (#13413) b5332 Helton Reis 2025-05-09 17:15:39 -03:00
  • 33eff40240 server : vision support via libmtmd (#12898) b5331 Xuan-Son Nguyen 2025-05-09 19:29:37 +02:00
  • 17512a94d6 sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) b5330 Alberto Cabrera Pérez 2025-05-09 16:34:08 +01:00
  • 611aa914ef metal : optimize MoE for large batches (#13388) b5329 Georgi Gerganov 2025-05-09 15:14:56 +03:00
  • 0cf6725e9f CUDA: FA support for Deepseek (Ampere or newer) (#13306) b5328 Johannes Gäßler 2025-05-09 13:34:58 +02:00
  • 27ebfcacba llama : do not crash if there is no CPU backend (#13395) b5327 Diego Devesa 2025-05-09 13:02:07 +02:00
  • 5c86c9ed3e CUDA: fix crash on large batch size for MoE models (#13384) b5326 Johannes Gäßler 2025-05-09 12:14:04 +02:00
  • efb8b47eda imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389) b5325 Bartowski 2025-05-09 05:53:58 -04:00
  • 0527771dd8 llama-run: add support for downloading models from ModelScope (#13370) b5324 R0CKSTAR 2025-05-09 17:25:50 +08:00
  • 2189fd3b63 mtmd : fix batch_view for m-rope (#13397) b5323 Xuan-Son Nguyen 2025-05-09 11:18:02 +02:00
  • 3f96aeff39 llama : one-off chat template fix for Mistral-Small-2503 (#13398) b5322 Xuan-Son Nguyen 2025-05-09 11:17:51 +02:00
  • b486ba05bf rpc : add rpc_msg_set_tensor_hash_req (#13353) b5321 Radoslav Gerganov 2025-05-09 10:31:07 +03:00
  • 02115dcd9a vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326) b5320 Jeff Bolz 2025-05-09 02:23:41 -05:00
  • d9c4accaff server : (webui) rename has_multimodal --> modalities (#13393) Xuan-Son Nguyen 2025-05-09 09:06:37 +02:00
  • 15e03282bb ci : limit write permission to only the release step + fixes (#13392) b5318 Diego Devesa 2025-05-08 23:45:22 +02:00
  • f05a6d71a0 mtmd : Expose helper_decode_image_chunk (#13366) b5317 Matt Clayton 2025-05-08 14:25:39 -04:00
  • ee01d71e58 server : (webui) fix a very small misalignment (#13387) Xuan-Son Nguyen 2025-05-08 18:51:45 +02:00
  • 8c83449cb7 server : (webui) revamp the input area, plus many small UI improvements (#13365) b5315 Xuan-Son Nguyen 2025-05-08 15:37:29 +02:00
  • 1a844be132 convert : support rope_scaling type and rope_type (#13349) Sigbjørn Skjæret 2025-05-08 15:34:29 +02:00