Commit Graph

  • abb9f3c42b vulkan: fix MMQ shader push constants and multi-dispatch (#19732) b8109 Ruben Ortlam 2026-02-19 14:59:16 +01:00
  • 69788e0d23 ggml : fix int overflows in ggml_new_object() Georgi Gerganov 2026-02-19 15:59:09 +02:00
  • 198f79d6c3 gguf : prevent integer overflow for ggml_context mem size Georgi Gerganov 2026-02-19 15:51:00 +02:00
  • da348c9dfb models : fix qwen3.5 beta/gate shapes (#19730) b8108 Georgi Gerganov 2026-02-19 15:19:53 +02:00
  • e6267a9359 mtmd: build_attn modified, flash_attn on/off via ctx_params (#19729) b8107 Saba Fallah 2026-02-19 13:50:29 +01:00
  • 2bf318fd2f model : add JAIS-2 architecture support (#19488) b8106 3 a l i 2026-02-19 16:30:17 +04:00
  • c78e682245 CUDA: fix kernel selection logic for tile FA (#19686) b8105 Johannes Gäßler 2026-02-19 12:42:58 +01:00
  • c5897995a7 mtmd : chat : Fix extra \n between text and media marker (#19595) b8104 Tarek Dakhran 2026-02-19 12:18:57 +01:00
  • 03fd9d3bb4 webui: Fix Attachments not being included in completion request (#19731) Aleksander Grygier 2026-02-19 10:27:38 +01:00
  • 8004f3a8d1 model : add tokenizer from LFM2.5-Audio-1.5B (#19687) b8102 Tarek Dakhran 2026-02-19 09:54:48 +01:00
  • eacb4b67a2 llama : use output_resolve_row() in get_logits_ith/get_embeddings_ith (#19663) b8101 Daniel Bevenius 2026-02-19 09:48:08 +01:00
  • c0d0430340 model : full modern bert support (#18330) b8100 Ryan Mangeno 2026-02-19 02:52:21 -05:00
  • 3bb2fcc856 llamafile: powerpc: add FP16 MMA path for Q4/Q8 matmul (#19709) b8099 shalinib-ibm 2026-02-19 11:58:53 +05:30
  • 27326bfce1 models : dedup qwen35 graphs (#19660) b8098 Georgi Gerganov 2026-02-19 08:17:49 +02:00
  • ad9f692f8f models : dedup Kimi Linear delta net implementation (#19668) ymcki 2026-02-19 14:15:17 +08:00
  • 8a70973557 Add Jinja support for "indent" string filter (#19529) b8096 Piotr Wilkin (ilintar) 2026-02-19 00:25:52 +01:00
  • e7f2f95c9a ggml webgpu: Fix bug in dispatching large matrix-vector multiplication (#19535) b8095 Reese Levine 2026-02-18 16:06:29 -07:00
  • b55dcdef5d server: save generated text for the /slots endpoint (for LLAMA_SERVER_SLOTS_DEBUG=1) (#19622) b8094 matteo 2026-02-18 18:53:37 +01:00
  • eeef3cfced model: support GLM-OCR (#19677) b8093 Xuan-Son Nguyen 2026-02-18 17:51:40 +01:00
  • e99f1083a0 docs: Fix broken links for preparing models in Backends (#19684) Maciej Lisowski 2026-02-18 16:50:23 +01:00
  • 238856ec8f ggml webgpu: shader library organization (#19530) b8091 Reese Levine 2026-02-18 07:51:02 -07:00
  • ea003229d3 Pre-MCP UI and architecture cleanup (#19689) Aleksander Grygier 2026-02-18 12:02:02 +01:00
  • d0061be838 vulkan: split mul_mat into multiple dispatches to avoid overflow (#19509) b8089 Jeff Bolz 2026-02-18 01:47:10 -08:00
  • 5d45884106 metal : fix build JohannesGaessler/ggml-meta-backend-8-tmp Georgi Gerganov 2026-02-18 09:14:31 +02:00
  • a569bda445 common : make small string helpers as inline functions (#19693) b8088 Adrien Gallouët 2026-02-18 08:03:01 +01:00
  • e2f19b320f opencl: refactor expm1 and softplus (#19404) b8087 shaofeiqi 2026-02-17 14:47:18 -08:00
  • 983559d24b opencl: optimize mean and sum_row kernels (#19614) b8086 shaofeiqi 2026-02-17 13:56:09 -08:00
  • 2b089c7758 model-conversion : add option to print tensor values (#19692) Daniel Bevenius 2026-02-17 20:43:22 +01:00
  • afa6bfe4f7 Pre-MCP UI and architecture cleanup (#19685) Aleksander Grygier 2026-02-17 13:47:45 +01:00
  • ae2d3f28a8 ggml: ggml-cpu: force-no-lto-for-cpu-feats (#19609) b8083 Talha Can Havadar 2026-02-17 12:22:46 +01:00
  • ad8207af77 cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645) b8082 Georgi Gerganov 2026-02-17 12:31:49 +02:00
  • 667b694278 model-conversion : make printing of config values optional (#19681) Daniel Bevenius 2026-02-17 10:46:53 +01:00
  • e48349a49d ci : bump komac version (#19682) Sigbjørn Skjæret 2026-02-17 09:30:31 +01:00
  • ae46a61e41 build : link ws2_32 as PUBLIC on Windows (#19666) b8079 Adrien Gallouët 2026-02-17 08:37:07 +01:00
  • 65cede7c70 build : cleanup library linking logic (#19665) b8078 Adrien Gallouët 2026-02-17 08:36:45 +01:00
  • 05fa625eac convert : add JoyAI-LLM-Flash (#19651) b8077 DAN™ 2026-02-16 16:49:57 -05:00
  • d612901116 perplexity: add proper batching (#19661) b8076 AesSedai 2026-02-16 08:44:44 -08:00
  • cceb1b4e33 common : inline functions (#18639) b8075 Ivan Chikish 2026-02-16 18:52:24 +03:00
  • d23a55997d ggml : make ggml_is_view as API (#19539) b8074 Judd 2026-02-16 23:43:34 +08:00
  • 5f28c53d11 model: Add support for Tiny Aya Models (#19611) b8073 Saurabh Dash 2026-02-16 10:28:46 -05:00
  • 4408494144 build : rework llama_option_depr to handle LLAMA_CURL (#19658) b8072 Adrien Gallouët 2026-02-16 16:06:48 +01:00
  • f0198ef6fc Merge pull request #6 from gaugarg-nv/get_host_buffer_type Johannes Gäßler 2026-02-16 15:11:08 +01:00
  • 2ba9adc093 Adjust workaround for ROCWMMA_FATTN/GFX9 to only newer ROCm versions (#19591) b8071 Mario Limonciello 2026-02-16 07:46:08 -06:00
  • cc45f2ada6 models : deduplicate delta-net graphs for Qwen family (#19597) b8070 Georgi Gerganov 2026-02-16 14:35:04 +02:00
  • aa8b62105c Support device-specific host buffer types if all underlying backends expose the same type. This allows using pinned memory instead of pageable memory for CUDA. Gaurav Garg 2026-02-16 15:39:26 +05:30
  • d5dfc33027 graph : fix KQ mask, lora, cvec reuse checks (#19644) b8069 Georgi Gerganov 2026-02-16 09:21:11 +02:00
  • 267ba5a1d9 ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k Kernel (#19132) b8068 abhijain1204fujitsu 2026-02-16 12:08:43 +05:30
  • ff4affb4c1 sync : ggml b8067 Georgi Gerganov 2026-02-15 22:23:13 +02:00
  • 55d58599c8 ggml : bump version to 0.9.7 (ggml/1425) Georgi Gerganov 2026-02-15 22:21:04 +02:00
  • 1a8c700bfd ggml : bump version to 0.9.6 (ggml/1423) Georgi Gerganov 2026-02-07 09:58:02 +02:00
  • 27b93cbd15 cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (#19624) b8064 David Friehs 2026-02-15 18:08:42 +01:00
  • 6e67fd2144 docs: update s390x build docs (#19643) Aaron Teo 2026-02-16 00:33:34 +08:00
  • 9e118b97c4 build : remove LLAMA_HTTPLIB option (#19623) b8062 Adrien Gallouët 2026-02-15 15:38:50 +01:00
  • 57088276d4 cmake : check if KleidiAI API has been fetched (#19640) b8061 Daniel Bevenius 2026-02-15 13:59:38 +01:00
  • 341bc7d23c context : fix output reorder with backend sampling (#19638) b8060 Georgi Gerganov 2026-02-15 14:57:40 +02:00
  • 08e6d914b8 ggml : avoid UB in gemm ukernel (#19642) b8059 Georgi Gerganov 2026-02-15 14:56:35 +02:00
  • 184c694f45 ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (#19399) b8058 Aaron Teo 2026-02-15 18:20:35 +08:00
  • 684b36101c ggml-cpu: FA add GEMM microkernel (#19422) b8057 Aman Gupta 2026-02-15 11:09:24 +05:30
  • 3a00c98584 cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL (#19581) b8056 SamareshSingh 2026-02-14 23:22:53 -06:00
  • 079feab9e3 convert : ensure all models handle new experts count (#19621) b8055 Sigbjørn Skjæret 2026-02-14 22:22:32 +01:00
  • 01d8eaa28d mtmd : Add Nemotron Nano 12B v2 VL support (#19547) b8054 Anav Prasad 2026-02-14 05:07:00 -08:00
  • 1725e316c1 models : optimize qwen3next graph (#19375) b8053 Georgi Gerganov 2026-02-14 12:57:36 +02:00
  • b7742cf321 ggml : fix GGML_DEBUG with OpenMP (#19599) b8052 Adrien Gallouët 2026-02-14 11:22:57 +01:00
  • badba89320 NetBSD build support (#19589) b8051 iMil 2026-02-14 09:47:01 +01:00
  • baa12f3831 webui: Architecture and UI improvements (#19596) Aleksander Grygier 2026-02-14 09:06:41 +01:00
  • 2d8015e8a4 llama : update LoRA API. + fix excessive graph reserves (#19280) b8049 agent-enemy-2 2026-02-14 03:06:27 -05:00
  • eb145c0753 mmap: Fix Windows handle lifetime (#19598) b8048 George 2026-02-14 10:05:12 +02:00
  • 6e473fb384 metal : fix ACC op (#19427) b8047 Georgi Gerganov 2026-02-14 09:54:03 +02:00
  • c7db95f106 scripts : use official split.py for cpp-httplib (#19588) b8046 Adrien Gallouët 2026-02-14 08:41:16 +01:00
  • 0d00ef65ed convert : store ffn_gate_inp_shexp as F32 (#19606) Sigbjørn Skjæret 2026-02-14 08:17:43 +01:00
  • 91ea5d67f2 build : fix libtool call in build-xcframework.sh (#19605) Adrien Gallouët 2026-02-14 06:48:37 +01:00
  • dbb023336b vulkan: support L2_NORM with contiguous rows (#19604) b8043 Jeff Bolz 2026-02-13 21:42:04 -08:00
  • 53aef25a88 vulkan: support GGML_OP_SET (#19584) b8042 Jeff Bolz 2026-02-13 21:36:38 -08:00
  • 2dec548094 vulkan: Add vendor id for Qualcomm drivers (#19569) b8041 Sophon 2026-02-14 13:29:17 +08:00
  • 0ccbfdef3e hexagon: further optimizations and refactoring for flash attention (#19583) b8040 Max Krasnyansky 2026-02-13 16:27:30 -08:00
  • 94a602db66 github : add missing backends to issue templates (#19603) Mengsheng Wu 2026-02-13 15:56:53 -08:00
  • 05a6f0e894 vulkan: restore -inf check in FA shaders (#19582) b8038 Jeff Bolz 2026-02-13 11:35:29 -08:00
  • fd24533e89 better granularity estimate Johannes Gäßler 2026-02-13 18:20:44 +01:00
  • d8f97b99ed fix compilation Johannes Gäßler 2026-02-13 15:13:40 +01:00
  • b48e80f677 common : update download code (#19573) b8037 Adrien Gallouët 2026-02-13 15:10:46 +01:00
  • 752584d5f5 model: support GLM MoE DSA arch (NOTE: indexer is not yet supported) (#19460) b8036 Xuan-Son Nguyen 2026-02-13 14:56:53 +01:00
  • 98ab6727e4 arbitrary num. of GPUs/tensor split Johannes Gäßler 2026-02-13 11:45:05 +01:00
  • cc2aa81513 Fix wrong memcpy length for block_interleave == 4 (#19575) b8035 Alberto Cabrera Pérez 2026-02-13 12:32:14 +00:00
  • 0e21991472 fix vulkan ggml_acc only works in 3d but not 4d (#19426) b8034 ymcki 2026-02-13 20:31:37 +08:00
  • b2ecc0cdb4 support --verbose-prompt (#19576) b8033 Sigbjørn Skjæret 2026-02-13 12:49:10 +01:00
  • 5065da554e CUDA: loop over ne2*ne3 in case it overflows (#19538) b8032 Aman Gupta 2026-02-13 17:01:40 +05:30
  • 5174d7206f webui: UI and routing fixes (#19586) Aleksander Grygier 2026-02-13 12:31:00 +01:00
  • 9c7d45c0fc fix view_offs scaling Johannes Gäßler 2026-02-13 11:05:57 +01:00
  • 43919b7f4f CUDA: Do not mutate cgraph for fused ADDs (#19566) b8030 Oliver Simons 2026-02-13 10:37:55 +01:00
  • 423cf0b26f docs : fix broken link and typo (#19560) Pavan Shinde 2026-02-13 14:08:09 +05:30
  • 33a56f90a6 model : Kimi Linear fix conv state update (#19531) b8028 ymcki 2026-02-13 16:10:18 +08:00
  • 25224c8021 llama : remove deprecated codecvt (#19565) b8027 Adrien Gallouët 2026-02-13 06:43:53 +01:00
  • 2f5d8f8edc vendor : update BoringSSL to 0.20260211.0 (#19562) b8026 Adrien Gallouët 2026-02-13 06:43:26 +01:00
  • bb96bfd361 memory : fix kv cache size for hybrid models (#19559) b8025 Georgi Gerganov 2026-02-13 07:36:24 +02:00
  • 0644baefde metal : improve concurrency (#19555) b8024 Georgi Gerganov 2026-02-13 07:35:57 +02:00
  • 490eb96b88 metal : support GGML_OP_SET (#19548) b8023 Georgi Gerganov 2026-02-13 07:34:52 +02:00
  • 31e4f189bb support for tensor dims % n_devs != 0 Johannes Gäßler 2026-02-11 23:34:43 +01:00
  • 3bb78133ab hexagon: fix typo in vtcm_needs_release (#19545) b8022 Shupei Fan 2026-02-13 07:07:49 +08:00
  • 79cc0f2daf opencl: add basic support for q4_1 (#19534) b8021 lhez 2026-02-12 14:52:37 -08:00
  • 338085c69e args : add -kvu to llama-parallel (#19577) b8020 Georgi Gerganov 2026-02-12 21:52:41 +02:00