Commit Graph

  • 5ade3000bd ggml: fix ggml_conv_1d_dw bug (ggml/1323) Jason Ni 2025-08-14 19:17:51 +08:00
  • 8b2483730f tests : remove unused includes (ggml/0) Georgi Gerganov 2025-08-14 13:41:03 +03:00
  • 810b9fc8b9 perplexity : provide a helpful hint for has_cpl case in split_equal error. (#15304) kallewoof 2025-08-14 20:03:30 +09:00
  • 4ebd0c125b cuda : fix GGML_CUDA_GRAPHS=OFF (#15300) Sigbjørn Skjæret 2025-08-14 12:22:07 +02:00
  • 5cdb27e091 finetune: SGD optimizer, more CLI args (#13873) Jonathan Graehl 2025-08-14 03:03:57 -07:00
  • 3ea913f1ce perplexity: give more information about constraints on failure (#15303) b6153 kallewoof 2025-08-14 15:16:32 +09:00
  • 29c8fbe4e0 HIP: bump requirement to rocm 6.1 (#15296) b6152 uvos 2025-08-13 20:44:30 +02:00
  • 1adc9812bd fix(nix): remove non-functional llama-cpp cachix cache from flake.nix (#15295) Bas Nijholt 2025-08-13 11:21:31 -07:00
  • b3e16665e1 server : enable -td and -tbd parameters (#15172) b6150 Sigbjørn Skjæret 2025-08-13 15:43:00 +02:00
  • c24f4e2688 ggml : update ggml_rope_multi (#12665) b6149 Judd 2025-08-13 18:45:15 +08:00
  • d8914fc47e common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191) b6148 Copilot 2025-08-13 12:44:40 +02:00
  • e885445bc1 server : filter out harmony thought messages (#15278) Aldehir Rojas 2025-08-13 05:28:21 -05:00
  • 648ebcdb73 ci : Added CI with RISC-V RVV1.0 Hardware (#14439) Ali Tariq 2025-08-13 15:14:44 +05:00
  • 07aa869a91 ci : add more python requirements to copilot-setup-steps (#15289) Sigbjørn Skjæret 2025-08-13 11:30:45 +02:00
  • 00f35d509e ggml : repack block_iq4_nlx8 (#14904) b6144 Georgi Gerganov 2025-08-13 11:09:39 +03:00
  • 6028bf7435 CUDA: Optimize reduce_rows_f32 kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (#15132) b6143 Oliver Simons 2025-08-13 10:04:46 +02:00
  • bc5182272c ci : add copilot-setup-steps.yml (#15214) Sigbjørn Skjæret 2025-08-13 09:07:13 +02:00
  • e71d48e326 ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others) (#15188) b6141 Tak-RS 2025-08-13 14:54:30 +09:00
  • b0493156fa HIP: disable sync warp shuffel operators from clr amd_warp_sync_functions.h (#15273) b6140 uvos 2025-08-12 22:15:12 +02:00
  • f4586ee598 sycl: Fix and disable more configurations of mul_mat (#15151) b6139 Romain Biessy 2025-08-12 13:58:22 +02:00
  • 60a7658810 opencl: allow mixed f16/f32 add (#15140) b6138 rmatif 2025-08-12 11:42:41 +02:00
  • efe3a90996 CUDA cmake: add -lineinfo for easier debug (#15260) b6137 Aman Gupta 2025-08-12 17:21:45 +08:00
  • bbd57b7eaf CANN: GGML_OP_CPY optimization (#15070) b6136 Chenguang Li 2025-08-12 16:12:13 +08:00
  • d9b625edb6 ggml-quants : handle imatrix for MXFP4 compilade/imatrix-mxfp4 Francis Couture-Harpin 2025-08-11 22:02:53 -04:00
  • 25ff6f7659 musa: fix failures in test-backend-ops for mul_mat_id op (#15236) b6135 R0CKSTAR 2025-08-12 10:02:51 +08:00
  • be48528b06 CANN: Add broadcast for softmax and FA (#15208) b6134 hipudding 2025-08-11 22:50:31 +08:00
  • cf9e5648a7 mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. (#14750) b6133 rainred 2025-08-11 22:12:12 +08:00
  • fba5c0d680 chat : hotfix gpt-oss jinja raising an exception (#15243) b6132 Xuan-Son Nguyen 2025-08-11 15:31:35 +02:00
  • 53d0a12658 server : allow specifying reasoning_format in HTTP request (#15238) b6131 Xuan-Son Nguyen 2025-08-11 14:48:41 +02:00
  • 27093afe78 readme : update infra list (#15234) Zagaj 2025-08-11 14:27:54 +02:00
  • 228f724d9c kv-cache : fix seq_rm with seq_id == -1 (#15226) b6129 Georgi Gerganov 2025-08-11 13:58:24 +03:00
  • cd3069dfcb kv-cache : log (debug) all streams in find_slot (#15176) b6128 Daniel Bevenius 2025-08-11 11:21:19 +02:00
  • 50e81bdf5d convert : fix merge conflicts (#15229) Sigbjørn Skjæret 2025-08-11 11:15:44 +02:00
  • 1ebbaddff2 perplexity : update comments/error msg to use decode [no ci] (#15227) Daniel Bevenius 2025-08-11 10:21:24 +02:00
  • a3a7874272 convert : improve Mistral models integration (#14737) Julien Denize 2025-08-11 10:07:49 +02:00
  • 002cb1bb33 kleidiai: fix unsigned overflow bug (#15150) b6124 Charles Xu 2025-08-11 09:59:26 +02:00
  • 79c1160b07 cuda: refactored ssm_scan and use CUB (#13291) b6123 David Zhao 2025-08-09 13:29:43 -05:00
  • 34c9d765bf CUDA: add attention sinks for tile and wmma (#15178) b6122 Aman Gupta 2025-08-09 20:00:24 +08:00
  • e54d41befc gguf-py : add Numpy MXFP4 de/quantization support (#15111) b6121 compilade 2025-08-08 17:48:26 -04:00
  • 4850b52aed server-bench: external OAI servers, sqlite (#15179) Johannes Gäßler 2025-08-08 23:04:36 +02:00
  • cd6983d56d ggml : fix field name when new ggml_backend (#14944) b6119 AN Long 2025-08-08 21:37:22 +09:00
  • 6c7e9a5440 vendor: sync minja (#15161) b6118 Olivier Chafik 2025-08-08 10:45:18 +01:00
  • 1425f587a8 CUDA: attention sinks for mma FlashAttention (#15157) b6117 Johannes Gäßler 2025-08-08 08:19:58 +02:00
  • aaa3d07ae7 opencl: support sink in soft_max (attn sinks) (#15152) b6116 lhez 2025-08-08 13:47:03 +09:00
  • 50aa938901 convert : support non-mxfp4 HF model (#15153) b6115 Xuan-Son Nguyen 2025-08-07 23:26:03 +02:00
  • c4f53563df vulkan: support fattn sinks (#15126) b6114 Jeff Bolz 2025-08-07 15:44:20 -05:00
  • a0552c8bee vulkan: Add env var to disable host visible vidmem (#15109) b6113 Jeff Bolz 2025-08-07 15:07:11 -05:00
  • 99acbc9921 llama : Support intern-s1 (#14875) RunningLeon 2025-08-08 00:20:40 +08:00
  • 7ad67ba9fe HIP: add cmake option to enable compiler output of kernel resource usage metrics (#15103) b6111 uvos 2025-08-07 16:44:14 +02:00
  • 9a96389544 ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) Christian Kastner 2025-08-07 13:45:41 +02:00
  • 1d72c84188 CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (#15131) b6109 Johannes Gäßler 2025-08-07 10:53:21 +02:00
  • 20638e4f16 scripts: fix crash when --tool is not set (#15133) Johannes Gäßler 2025-08-07 08:50:30 +02:00
  • 36d3f00e14 requirements : fix PyTorch uint64 compatibility (#15134) Daniel Bevenius 2025-08-07 05:31:48 +02:00
  • 5fd160bbd9 ggml: Add basic SET_ROWS support in WebGPU (#15137) b6106 Reese Levine 2025-08-06 15:14:40 -07:00
  • 756cfea826 fix profiling crash (#15072) b6105 rmatif 2025-08-06 23:17:51 +02:00
  • 2763dc8b53 ggml-quants : handle zero amax for MXFP4 compilade/gguf-py-mxfp4 Francis Couture-Harpin 2025-08-06 16:26:25 -04:00
  • e725a1a982 opencl: add swiglu_oai and add_id (#15121) b6104 lhez 2025-08-07 04:12:17 +09:00
  • 3db4da56a5 chat : support Granite model reasoning and tool call (#14864) b6103 Sachin Desai 2025-08-06 11:27:30 -07:00
  • 476aa3fd57 Fixed name -override-tensors to -override-tensor (#15129) b6102 Juk Armstrong 2025-08-06 17:28:48 +01:00
  • 0d8831543c ggml : fix fallback to CPU for ununsupported ops (#15118) b6101 Diego Devesa 2025-08-06 05:37:35 -07:00
  • 65c797c4fa chat : fix yandex chat template (#15116) b6100 Sigbjørn Skjæret 2025-08-06 13:26:49 +02:00
  • 25726898e8 chat : fix hunyuan auto-detection (#15114) b6099 stevenkuang 2025-08-06 17:48:30 +08:00
  • 2241453252 CANN: add support for ACL Graph (#15065) b6098 Chenguang Li 2025-08-06 14:12:42 +08:00
  • 141cab137d gguf-py : add MXFP4 de/quantization support Francis Couture-Harpin 2025-08-05 23:07:21 -04:00
  • 9515c6131a ggml: WebGPU disable SET_ROWS for now (#15078) b6097 Reese Levine 2025-08-05 16:26:38 -07:00
  • fd1234cb46 llama : add gpt-oss (#15091) b6096 Georgi Gerganov 2025-08-05 22:10:36 +03:00
  • f324a3b715 chat : only remove double bos/eos if added (#15086) b6095 Sigbjørn Skjæret 2025-08-05 20:43:36 +02:00
  • ea5e55d03e Merge branch 'master' into compilade/imatrix-neutral-prior Francis Couture-Harpin 2025-08-05 13:34:40 -04:00
  • 46a8601140 quantize : assume the neutral prior is equal imatrix weights Francis Couture-Harpin 2025-08-05 13:34:01 -04:00
  • be42642581 readme : update hot topics (#15097) Georgi Gerganov 2025-08-05 20:19:33 +03:00
  • 3306ceabf0 sycl: fix mul_mat selection (#15092) b6093 Romain Biessy 2025-08-05 18:39:55 +02:00
  • c81de6e107 Fix glm4moe bug (#15088) b6092 Juk Armstrong 2025-08-05 13:56:44 +01:00
  • 22f060c9c4 webui: fix markdown table (#15081) Alex Wu 2025-08-05 19:56:44 +08:00
  • ee3a9fcf88 context : fix index overflow on huge outputs (#15080) b6090 compilade 2025-08-05 05:27:45 -04:00
  • 2ec70c964b tests: Fix OPT_STEP_SGD test-backend-ops 0cc4m/vulkan-op-opt-step-sgd 0cc4m 2025-07-20 07:22:28 +00:00
  • 9d0312425e Vulkan: Implement GGML_OP_OPT_STEP_SGD 0cc4m 2025-07-20 07:18:30 +00:00
  • 145401c9e3 context : fix logits size overflow for huge batches compilade/fix-output-overflow Francis Couture-Harpin 2025-08-04 22:26:34 -04:00
  • f16a843a38 context : fix overflow when re-ordering huge outputs Francis Couture-Harpin 2025-08-04 22:01:28 -04:00
  • ec428b02c3 llama : add --n-cpu-moe option (#15077) b6089 Diego Devesa 2025-08-04 16:05:36 -07:00
  • 50e83eaed8 Merge branch 'master' into finelayer Jonathan Graehl 2025-08-04 15:44:10 -07:00
  • 19f68fa5a4 imatrix : warn when GGUF imatrix is saved without .gguf suffix (#15076) b6088 compilade 2025-08-04 17:26:52 -04:00
  • 41613437ff cmake: Add GGML_BACKEND_DIR option (#15074) b6087 Christian Kastner 2025-08-04 21:29:14 +02:00
  • 342e7014db imatrix : only warn about suffix when output format is unspecified compilade/imatrix-gguf-warning Francis Couture-Harpin 2025-08-03 18:12:06 -04:00
  • afa43e13c8 imatrix : add warning when suffix is not .gguf for GGUF imatrix Francis Couture-Harpin 2025-08-03 18:03:53 -04:00
  • e5bebe5251 gguf-py : add --chat-template-file to gguf_new_metadata (#15075) Sigbjørn Skjæret 2025-08-04 21:01:48 +02:00
  • ef0144c087 model: support GLM 4.5 family of models (#14939) b6085 Sam 2025-08-05 04:29:25 +10:00
  • 2721257e3e quantize : fix confusing error message if ftype is invalid (#15071) b6084 Sigbjørn Skjæret 2025-08-04 18:11:02 +02:00
  • 587d0118f5 ggml: WebGPU backend host improvements and style fixing (#14978) b6083 Reese Levine 2025-08-04 08:52:43 -07:00
  • 92383bfab3 quantize : store metadata for prior weight used for imatrix Francis Couture-Harpin 2025-08-04 01:47:00 -04:00
  • 5aa1105da2 vulkan: fix build when using glslang that does not support coopmat2 (#15062) b6082 Jeff Bolz 2025-08-04 00:09:19 -05:00
  • 0416ed2bb8 quantize : configurable neutral imatrix prior Francis Couture-Harpin 2025-08-03 16:28:37 -04:00
  • d31192b4ee imatrix : use GGUF by default (#14842) b6081 compilade 2025-08-03 16:00:05 -04:00
  • 0a2f5496be imatrix : fix 3d activation handling for hybrid and recurrent models (#14994) b6080 compilade 2025-08-03 15:49:13 -04:00
  • 11a3811164 memory : handle kv_unified for hybrid models (#15050) b6079 compilade 2025-08-03 15:43:07 -04:00
  • 97366dc6ab vocab : JetBrains Mellum pre-tokenizer (#15045) b6078 Csaba Kecskemeti 2025-08-03 12:38:18 -07:00
  • 83bc2f288c model : add text-only support for Kimi-VL (and find special tokens in text_config) (#15051) Gabriel Larson 2025-08-03 09:56:25 -05:00
  • 6c7a441161 vulkan: Use coopmat2 for conv2d (#14982) b6076 Jeff Bolz 2025-08-03 07:23:57 -05:00
  • e549515cb3 memory : handle kv_unified for hybrid models compilade/hybrid-kv_unified Francis Couture-Harpin 2025-08-03 00:45:47 -04:00
  • 5c0eb5ef54 opencl: fix adreno compiler detection logic (#15029) b6075 lhez 2025-08-02 10:51:18 -07:00
  • 03d4698218 CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035) b6074 Johannes Gäßler 2025-08-02 16:37:08 +02:00