Commit Graph

  • b55f06e1aa vulkan.Dockerfile: install vulkan SDK using tarball (#15282) R0CKSTAR 2025-08-23 14:58:57 +08:00
  • 0a9b43e507 vulkan : support ggml_mean (#15393) Acly 2025-08-23 08:35:21 +02:00
  • 330c3d2d21 vulkan: optimize mul_mat_id loading row ids into shared memory (#15427) b6251 Jeff Bolz 2025-08-23 01:31:54 -05:00
  • e92734d51b test-opt: allow slight inprecision (#15503) b6250 Johannes Gäßler 2025-08-22 23:47:01 +02:00
  • 45363632cb ggml WebGPU: add support for quantization types (#15440) b6249 Reese Levine 2025-08-22 11:28:03 -07:00
  • 32732f2459 model : gpt-oss add response_format support (#15494) b6248 Aldehir Rojas 2025-08-22 11:04:08 -05:00
  • 92f7f0a53c ggml: add conv3d op (#15182) b6247 rmatif 2025-08-22 15:33:15 +02:00
  • b1ab91821f cuda : add Pad Reflect 1D support (#14659) b6246 Yavor Ivanov 2025-08-22 14:06:29 +03:00
  • 9ebebef62f llama : remove KV cache defragmentation logic (#15473) b6245 Georgi Gerganov 2025-08-22 12:22:13 +03:00
  • ad5c975c2d ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486) b6244 Aaron Teo 2025-08-22 16:11:04 +08:00
  • 4afb0a746f server : Support multimodal completion and embeddings prompts in JSON format (#15108) b6243 65a 2025-08-22 08:10:14 +00:00
  • e288693669 readme : model : mtdm : lfm2 improvements (#15476) b6242 Tarek Dakhran 2025-08-22 09:29:08 +02:00
  • a0f98dd604 CANN: Optimize RMS_NORM using cache (#15419) b6241 Chenguang Li 2025-08-22 14:12:07 +08:00
  • 54a241f505 sched : fix possible use of wrong ids tensor when offloading moe prompt processing (#15488) b6240 Diego Devesa 2025-08-21 14:09:32 -07:00
  • cd36b5e5c7 llama : remove deprecated llama_kv_self API (#15472) b6239 Georgi Gerganov 2025-08-21 19:13:45 +03:00
  • 3f196be84b graph : remove build_attn_with_sinks overload (#15469) b6238 Georgi Gerganov 2025-08-21 18:44:45 +03:00
  • 97ae5961a4 vulkan : support conv_2d_dw with f16 weights (#15392) b6237 Acly 2025-08-21 17:01:51 +02:00
  • 20c2dac8c6 vulkan: add exp operation (#15456) b6236 Dong Won Kim 2025-08-22 00:00:16 +09:00
  • 96452a3fa4 vulkan: Reuse conversion results in prealloc_y (#15410) b6235 Jeff Bolz 2025-08-21 09:55:00 -05:00
  • 9ad5e60dba examples : fix some typos in examples/model-conversion/README.md (#15477) Jie Fu (傅杰) 2025-08-21 22:53:13 +08:00
  • 715a6db02c kv-cache : drop the "unified" prefix (#15467) Georgi Gerganov 2025-08-21 17:00:33 +03:00
  • ad294df03f examples : install torch-cpu for model conversion tool/example (#15475) Jie Fu (傅杰) 2025-08-21 21:42:34 +08:00
  • 029bb39eb1 ci : enable RVV1.0 native build (#15386) Ali Tariq 2025-08-21 17:52:16 +05:00
  • 30649cab65 ci : continue file download with wget (#15471) Georgi Gerganov 2025-08-21 13:42:55 +03:00
  • 2758fa10da examples : add model conversion tool/example (#15455) b6229 Daniel Bevenius 2025-08-21 12:16:54 +02:00
  • b108e42904 ci : fix -Werror=return-type in clip.cpp so ci/run.sh can run without issue (#15221) b6228 Michael Giba 2025-08-21 05:06:46 -05:00
  • 245be739df ci : add copilot-instructions.md (#15286) Copilot 2025-08-21 11:47:52 +02:00
  • b2caf67db1 convert : make Mistral community chat templates optional via parameter (#15420) Julien Denize 2025-08-21 11:19:50 +02:00
  • 2f3dbffb17 common : fix incorrect print of non-ascii characters in the logging (#15466) b6225 Jie Fu (傅杰) 2025-08-21 16:54:34 +08:00
  • 945e1f12a6 ggml : fix condition of im2col on Metal backend (#15460) Xuan-Son Nguyen 2025-08-21 07:32:26 +02:00
  • 1b0db8f6e0 server : fix webui (#15462) stduhpf 2025-08-21 07:19:22 +02:00
  • 29f538ac63 examples : remove references to make in examples [no ci] (#15457) Daniel Bevenius 2025-08-21 06:12:28 +02:00
  • 8ad038c0fd musa: add GGML_UNUSED_VARS (#15446) R0CKSTAR 2025-08-21 11:06:05 +08:00
  • 5682a3745f sched : copy only the used experts when offloading prompt processing (#15346) Diego Devesa 2025-08-20 16:35:28 -07:00
  • 1bc664a26a server: fix OpenAI API compatibility for usage statistics in chat streams (#15444) teo 2025-08-21 07:10:08 +09:00
  • 13aeb7aef2 CUDA: refactor FA support/selection code (#15454) b6218 Johannes Gäßler 2025-08-20 23:14:14 +02:00
  • 7a6e91ad26 CUDA: replace GGML_CUDA_F16 with CUDA arch checks (#15433) Johannes Gäßler 2025-08-20 16:58:49 +02:00
  • fec9519802 vulkan: shorten pipeline name strings (#15431) Jeff Bolz 2025-08-20 09:33:14 -05:00
  • 657b8a77bd chat: handle gpt-oss return/end token inconsistency (#15421) b6215 Daniel Bevenius 2025-08-20 14:26:01 +02:00
  • ec5ab1a36c common : fix context shift help message (#15448) b6214 Jie Fu (傅杰) 2025-08-20 18:33:30 +08:00
  • 1a99c2d948 cmake : fix target include directories (#15450) b6213 xiaobing318 2025-08-20 18:32:05 +08:00
  • 37f10f955f make : remove make in favor of CMake (#15449) Daniel Bevenius 2025-08-20 12:31:16 +02:00
  • 2f37014073 lookahead : add sample command to readme (#15447) Georgi Gerganov 2025-08-20 13:30:46 +03:00
  • a094f38143 musa: fix build warnings (#15258) b6210 R0CKSTAR 2025-08-20 10:17:37 +08:00
  • 899398277d convert : fix conversion from FP8 for Deepseek-V3.1-Base Francis Couture-Harpin 2025-08-19 17:27:59 -04:00
  • fb22dd07a6 opencl: mark argsort unsupported if cols exceed workgroup limit (#15375) b6209 lhez 2025-08-20 02:25:51 +08:00
  • 9ef6b0b835 model : add gpt-oss type strings (#15424) b6208 Georgi Gerganov 2025-08-19 19:58:28 +03:00
  • 1e19f5d462 common : Add top-nsigma sampler to help globally (#15428) b6207 Gian-Carlo Pascutto 2025-08-19 18:58:14 +02:00
  • d2fcd91cf9 server : disable context shift by default (#15416) Georgi Gerganov 2025-08-19 16:46:37 +03:00
  • a6d3cfe7fa CANN: optimize rope operator (#15335) b6205 SHUAI YANG 2025-08-19 21:28:22 +08:00
  • 67f09a3a27 musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (#15413) b6204 R0CKSTAR 2025-08-19 18:33:47 +08:00
  • 6424594c56 ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (#15385) Marvin Gießing 2025-08-19 10:54:31 +02:00
  • e9288e8869 chat : clarify the meaning of reasoning_format (#15408) b6202 Xuan-Son Nguyen 2025-08-19 10:29:36 +02:00
  • 9d262f4bad server : remove swa_full warning (#15399) b6201 Georgi Gerganov 2025-08-19 08:45:26 +03:00
  • f0d3c7405c batched-bench : use rand tokens (#15398) Georgi Gerganov 2025-08-19 08:45:12 +03:00
  • f08c4c0d8d mtmd : clean up clip_n_output_tokens (#15391) b6199 Xuan-Son Nguyen 2025-08-18 22:53:52 +02:00
  • 6d7f1117e3 codeowners : remove mmv.* Georgi Gerganov 2025-08-18 22:02:50 +03:00
  • 60212f1ead sync : ggml Georgi Gerganov 2025-08-18 22:02:11 +03:00
  • f0c541d315 scripts : update sync scripts Georgi Gerganov 2025-08-18 20:35:47 +03:00
  • baa9255a45 llama : merge conts and reshapes and remove unnecessary cont (#15380) b6195 Sigbjørn Skjæret 2025-08-18 19:30:17 +02:00
  • 3007baf201 readme : update hot topics (#15397) Georgi Gerganov 2025-08-18 18:11:44 +03:00
  • d1d8241600 server : fix incoming tasks not process in order (#15395) b6193 davidef 2025-08-18 16:51:42 +02:00
  • 618575c582 Fix broken build: require updated pip to support --break-system-packages (#15357) Dobri Danchev 2025-08-18 05:50:48 -05:00
  • f44f793172 ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (#15379) b6191 compilade 2025-08-18 03:23:56 -04:00
  • ae532eac2c vulkan: disable spirv-opt for bfloat16 shaders (#15352) b6190 Jeff Bolz 2025-08-18 00:56:29 -05:00
  • e5155e6986 server : export max observed n_past value (#15361) b6189 Oleksandr Kuvshynov 2025-08-17 18:28:58 -04:00
  • fb573f4440 ggml-quants : avoid division by zero in make_q3_quants compilade/fix-qp-iq1-problems Francis Couture-Harpin 2025-08-17 18:26:02 -04:00
  • 184cdc6b27 ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors Francis Couture-Harpin 2025-08-17 11:36:30 -04:00
  • 21c17b5bef vulkan: Use larger workgroups for mul_mat_vec when M is small (#15355) b6188 Jeff Bolz 2025-08-17 11:08:57 -05:00
  • 19f4decae0 vulkan: support sqrt (#15370) b6187 Dong Won Kim 2025-08-17 23:03:09 +09:00
  • 4d196981d4 convert : force patch_embd weights to F16 or F32 to avoid broken GGUFs (#15367) Sigbjørn Skjæret 2025-08-17 14:47:42 +02:00
  • b143fbc87a ci : fix hang in windows-hip build/release (#15365) b6185 Sigbjørn Skjæret 2025-08-17 13:30:23 +02:00
  • de5627910d vulkan: Optimize argsort (#15354) b6184 Jeff Bolz 2025-08-17 03:41:45 -05:00
  • 65349f26f2 model : support vision LiquidAI LFM2-VL family (#15347) b6183 Tarek Dakhran 2025-08-16 23:33:54 +02:00
  • 1fe00296f5 vulkan: fuse adds (#15252) b6182 Jeff Bolz 2025-08-16 11:48:22 -05:00
  • de2192794f vulkan: Support mul_mat_id with f32 accumulators (#15337) b6181 Jeff Bolz 2025-08-16 04:18:31 -05:00
  • 2e2b22ba66 vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (#15334) b6180 Jeff Bolz 2025-08-16 03:58:38 -05:00
  • 912ff8c119 OpenCL: add initial FA support (#14987) b6179 rmatif 2025-08-16 10:05:55 +02:00
  • 5e6229a840 common : fix double bos, use common_chat_templates for add_bos and add_eos (#15326) b6178 Daniel Bevenius 2025-08-15 19:50:52 +02:00
  • e2c1bfff53 opencl: add initial mxfp4 support via mv (#15270) b6177 lhez 2025-08-16 00:52:14 +08:00
  • 5edf1592fd vulkan : fix out-of-bounds access in argmax kernel (#15342) b6176 Georgi Gerganov 2025-08-15 17:16:36 +03:00
  • db3010bd23 vulkan : fix compile warnings on macos (#15340) b6175 Georgi Gerganov 2025-08-15 16:28:28 +03:00
  • ff27f80a74 ggml: initial IBM zDNN backend (#14975) b6174 Aaron Teo 2025-08-15 21:11:22 +08:00
  • d3248d9b65 ci : fix ios-xcode-build (#15324) b6173 Sigbjørn Skjæret 2025-08-15 14:02:39 +02:00
  • 7aeee88cfe ci : move ccache action to ggml-org fork (#15328) Diego Devesa 2025-08-15 03:27:02 -07:00
  • b07791aa1d test-opt: fix backend support check (#15317) Johannes Gäßler 2025-08-15 11:23:17 +02:00
  • 4227c9be42 CUDA: fix negative KV_max values in FA (#15321) Johannes Gäßler 2025-08-14 23:21:24 +02:00
  • 1ae6ab7601 Merge branch 'master' into compilade/convert-prequant Francis Couture-Harpin 2025-08-14 17:05:21 -04:00
  • df36bce667 eval-callback : stop on first NaN (#15320) Georgi Gerganov 2025-08-14 22:10:51 +03:00
  • f75b830647 chat : include kwargs in template example (#15309) Diego Devesa 2025-08-14 10:28:29 -07:00
  • 7a0de96045 llama : add 18-layer model type for Gemma 3-270m (#15319) Daniel Bevenius 2025-08-14 17:56:26 +02:00
  • e4e915912c devops : fix compile bug when the BASE_CUDA_DEV_CONTAINER is based on Ubuntu 24.04 (#15005) simevo 2025-08-14 17:45:27 +02:00
  • 5ba36f6103 HIP: Cleanup hipification header (#15285) uvos 2025-08-14 16:23:56 +02:00
  • b204a5a234 gpt-oss: implement harmony parsing (#15181) Aldehir Rojas 2025-08-14 09:23:11 -05:00
  • 646944cfa8 docker : Enable GGML_CPU_ALL_VARIANTS for ARM (#15267) Christian Kastner 2025-08-14 16:22:58 +02:00
  • 1a01899b61 readme : update hot topics (#15315) Georgi Gerganov 2025-08-14 17:16:03 +03:00
  • 863d341eeb vulkan: perf_logger improvements (#15246) Jeff Bolz 2025-08-14 08:38:10 -05:00
  • 220860aa0c graph : use F32 accumulators for gpt-oss gg/graph-prec Georgi Gerganov 2025-08-14 16:08:31 +03:00
  • d32e03f449 server : add SWA checkpoints (#15293) Georgi Gerganov 2025-08-14 14:59:50 +03:00
  • 3973163bff sync : ggml Georgi Gerganov 2025-08-14 14:19:23 +03:00