Commit Graph

  • 6118c043b1 ci : bump ty to 0.0.33 (#22535) Sigbjørn Skjæret 2026-04-30 15:15:54 +02:00
  • 5f0ab726f7 vendor : update cpp-httplib to 0.43.2 (#22548) b8987 Adrien Gallouët 2026-04-30 15:04:39 +02:00
  • e82aaf2587 CUDA: fix tile FA kernel on Pascal (#22541) b8986 Johannes Gäßler 2026-04-30 13:04:50 +02:00
  • cb8a3a93ec Merge branch 'master' into pr/18039 Georgi Gerganov 2026-04-30 10:08:10 +03:00
  • 6eddb1c6e3 pi : add rule to use gh CLI for GitHub resources gg/pi-gh-tool-rule Georgi Gerganov 2026-04-30 09:49:54 +03:00
  • c6dbd31146 docs : update speculative decoding parameters after refactor (#22397) Georgi Gerganov 2026-04-30 09:44:48 +03:00
  • 27aef3dd91 scripts : add wc2wt.sh - create worktree from current HEAD (#22513) Georgi Gerganov 2026-04-30 09:20:26 +03:00
  • 45155597aa add fast matmul iquants (#22504) b8984 Rithik Sharma 2026-04-29 22:58:32 -07:00
  • 80afa33aad spec : fix draft model checkpoints (#22521) b8983 Georgi Gerganov 2026-04-30 08:32:18 +03:00
  • b42c7fa5b8 spec : fix vocab compat checks in spec example (#22426) b8982 Peter Sideris 2026-04-30 08:18:25 +03:00
  • d77599234e common : do not pass prompt tokens to reasoning budget sampler (#22488) b8981 Aldehir Rojas 2026-04-29 14:10:58 -05:00
  • 41a63be28e hexagon: make vmem and buffer-size configurable (#22487) b8980 Max Krasnyansky 2026-04-29 11:51:21 -07:00
  • 098705a29e CUDA: fuse SSM_CONV + ADD(bias) + SILU (#22478) b8979 Anav Prasad 2026-04-29 11:39:56 -07:00
  • 683c5acb90 spec : disacard last drafted token with low prob (#22506) b8978 Georgi Gerganov 2026-04-29 17:00:00 +03:00
  • b1d5f5b449 sync : ggml b8977 Georgi Gerganov 2026-04-29 16:43:08 +03:00
  • 4b221b7f1e ggml : bump version to 0.10.1 (ggml/1469) Georgi Gerganov 2026-04-29 16:41:45 +03:00
  • c6a04cb5c3 ggml-metal: fix 2D async copy to use row-by-row transfers gg/metal-implement-async-2d Georgi Gerganov 2026-04-29 14:57:48 +03:00
  • f9e19a1f6e pi: add rule to not force push branches unless asked Georgi Gerganov 2026-04-29 14:37:13 +03:00
  • c3a54d6253 ggml-metal: implement async 2D tensor copy functions Georgi Gerganov 2026-04-29 14:22:06 +03:00
  • 59237bfbbc webui: fix slow mic stop and WAV encode (#22480) Pascal 2026-04-29 12:58:35 +02:00
  • 1cbc846eba ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault (#22293) b8974 shalinib-ibm 2026-04-29 16:02:40 +05:30
  • 3142f1dbb9 ggml-cuda: refactor fusion code (#22468) b8973 Aman Gupta 2026-04-29 16:19:33 +08:00
  • b5c4227dc6 ggml-cpu: cmake: append xsmtvdotii march for SpacemiT IME (#22317) b8972 qiurui144 2026-04-29 15:59:21 +08:00
  • d6a5094004 ggml-webgpu: Fix bug in FlashAttention support check (#22492) b8971 Reese Levine 2026-04-29 00:59:00 -07:00
  • 7b95ea5d11 common: Intentionally leak logger instance to fix hanging on Windows (#22273) b8970 Masato Nakasaka 2026-04-29 16:58:43 +09:00
  • bdc9c743a5 ggml : add sve tuned code for gemm_q8_0_4x8_q8_0() kernel (#21916) b8969 hrushitfujitsu 2026-04-29 13:27:37 +05:30
  • 739393beeb TP: fix delayed AllReduce + zero-sized slices (#22489) b8968 Johannes Gäßler 2026-04-29 08:55:07 +02:00
  • fc2b0053ff ggml-cuda: Repost of 21896: Blackwell native NVFP4 support (#22196) b8967 Michael Wand 2026-04-28 15:47:42 -07:00
  • 7b8443ac78 ggml-cuda: add flash-attn support for DKQ=320/DV=256 with ncols2=32 (… (#22286) b8966 lnigam 2026-04-29 01:07:35 +05:30
  • 5d56effdee convert : add support for Nemotron Nano 3 Omni (#22481) Daniel Bevenius 2026-04-28 19:17:57 +02:00
  • 52e5f0a5c1 common : re-arm reasoning budget after DONE on new <think> (#22323) b8964 Jillis ter Hove 2026-04-28 19:15:36 +02:00
  • f9f33654a6 vulkan: Coalesce Q4_K/Q5_K scale loads (#21751) b8963 Matt Corallo 2026-04-28 15:31:04 +00:00
  • 98bb57916a ggml-webgpu: fix buffer aliasing for ssm_scan and refactor aliasing logic (#22456) b8962 Reese Levine 2026-04-28 07:27:17 -07:00
  • f42e29fdf1 webui: Server tools (#21237) Aleksander Grygier 2026-04-28 14:35:49 +03:00
  • 19821178be vulkan: add barrier after writetimestamp (#21865) b8960 Jeff Bolz 2026-04-28 12:28:12 +02:00
  • 698d19b93c ggml: improve SPIR-V headers detection with __has_include (#21918) Emil Askerov 2026-04-28 13:19:06 +03:00
  • 50494a2800 ggml : skip already registered backends and devices (#22296) b8958 Adrien Gallouët 2026-04-28 09:02:32 +02:00
  • d530d6e7a2 ggml : revert to -lm linking instead of find_library (#22355) b8957 Adrien Gallouët 2026-04-28 08:56:02 +02:00
  • c3e08f4700 CANN: add new ops, optimize existing ops (#21204) b8956 hipudding 2026-04-28 14:27:22 +08:00
  • 14e733e36f spec : refactor params (#22397) b8955 Georgi Gerganov 2026-04-28 09:07:33 +03:00
  • 516e8d7a8a server: use pos_next instead of n_tokens for m-rope (#22439) b8954 Aman Gupta 2026-04-28 13:41:00 +08:00
  • 434b2a1ff6 ggml-webgpu: add Q1_0 support (#22374) b8953 Rithik Sharma 2026-04-27 15:50:59 -07:00
  • 983ca8992e server: (router) Forward form-data to model server (Fixes #22044) (#22118) b8952 tha80 2026-04-27 23:55:00 +02:00
  • 665abc6097 add fast mat-vec kernels for i-quants (#22344) b8951 Rithik Sharma 2026-04-27 08:25:45 -07:00
  • 4414c04b9a Additional test for common/gemma4 : handle parsing edge cases (#22420) b8950 Igor Rudenko 2026-04-27 17:36:59 +03:00
  • ceaf47c4b1 fix: rpc-server cache may not work in Windows environments (#22394) b8949 unraido 2026-04-27 23:25:09 +09:00
  • 42401c72b8 Fix type casting for unaccounted memory calculation (#22424) b8948 rankaiyx 2026-04-27 20:31:13 +08:00
  • e940b3d468 download : prefer q8_0 when q4_k not available (#22428) b8947 Georgi Gerganov 2026-04-27 15:30:29 +03:00
  • fd6f79c7a4 download : prefer q8_0 when q4_k not available gg/download-prefer-q8 Georgi Gerganov 2026-04-27 12:08:25 +03:00
  • 0f1bb602dd model : remove duplicate wo_s scale after build_attn (Qwen3, LLaMA) (#22421) b8946 ynankani 2026-04-27 07:58:48 +00:00
  • d13540becd convert : remove input_scale for dequantized fp8 modelopt (#22356) Sigbjørn Skjæret 2026-04-27 08:45:01 +02:00
  • f84270ea10 ggml : use 64 bytes aligned tile buffers (#21058) b8944 Adrien Gallouët 2026-04-27 08:30:55 +02:00
  • 5594d13224 common: fix missing exports in llama-common (#22340) b8943 Max Krasnyansky 2026-04-26 22:06:39 -07:00
  • f535774325 pr2wt : symlink .pi (#22386) Georgi Gerganov 2026-04-26 19:49:26 +03:00
  • 06a811d085 add performance-portable tuning for register-tile and subgroup matmul (#22241) b8941 Rithik Sharma 2026-04-26 09:26:28 -07:00
  • 78433f606f Fix recurrent state serialization for partial reads and writes (#22362) b8940 Gaurav Garg 2026-04-26 17:04:40 +05:30
  • 7ec36aa861 Github: set meta backend code owner (#22388) Johannes Gäßler 2026-04-26 13:34:13 +02:00
  • b1a5bd4e0c CUDA: better coalesce data-access for contiguous concat (#22330) Oliver Simons 2026-04-26 09:21:45 +02:00
  • cb9fc575e4 common : use pimpl in debug.h to reduce header dependencies pr/22340-gg Georgi Gerganov 2026-04-26 09:45:41 +03:00
  • 68adf99ff7 cont : cleanup Georgi Gerganov 2026-04-26 09:39:29 +03:00
  • 0c6ee1cade ggml-cpu : re-enable fast gelu_quick_f16 (#22339) b8937 Sigbjørn Skjæret 2026-04-26 08:28:14 +02:00
  • 2dd84169d1 ggml-cpu: optimize avx2 q6_k (#22345) b8936 Eve 2026-04-26 06:27:50 +00:00
  • f454bd7eb8 opencl: add iq4_nl support (#22272) b8935 lhez 2026-04-25 21:21:58 -07:00
  • b760272f1a hexagon: guard HMX clock request for v75+ platforms (#22377) b8934 Trivikram Reddy 2026-04-25 19:58:26 -05:00
  • 38d762d8fc common: refactor common/debug to move abort_on_nan into base_callback_data Max Krasnyansky 2026-04-25 16:48:16 -07:00
  • dcad77cc3b chat: fix handling of space in reasoning markers (#22353) b8933 Piotr Wilkin (ilintar) 2026-04-25 21:24:13 +02:00
  • 98dc1418ea spec : fix vocab compat checks (#22358) Georgi Gerganov 2026-04-25 20:11:35 +03:00
  • 9725a313be CUDA: reduce MMQ stream-k overhead (#22298) b8931 Johannes Gäßler 2026-04-25 14:15:03 +02:00
  • d1649047a3 metal : optimize Metal Tensor API usage for GGML_OP_MUL_MAT (#20962) Developer-Ecosystem-Engineering 2026-04-25 05:14:28 -07:00
  • 9d34231bb8 llama-quant : default ftype param Q5_1 --> Q8_0 (#20828) b8929 ddh0 2026-04-25 01:25:35 -05:00
  • 8ea8fee966 gitignore : add .pi + personal SYSTEM.md (#22316) Georgi Gerganov 2026-04-25 09:20:45 +03:00
  • eddd7a13a5 [SYCL] Optimize Q4_0 mul_mat for Arc770, add scripts (#22291) b8927 Neo Zhang 2026-04-25 14:20:14 +08:00
  • dd2914dc81 ggml-webgpu: support for SSM_SCAN and disable set_rows error checking (#22327) b8926 Reese Levine 2026-04-24 23:18:15 -07:00
  • 0adede866d parser: fix structured output bug (#22302) b8925 Piotr Wilkin (ilintar) 2026-04-24 23:19:55 +02:00
  • 361fe72acb Hexagon: Bump HMX Frequency to Max Corner (#22334) b8924 Trivikram Reddy 2026-04-24 15:55:17 -05:00
  • a702f39597 CI Snapdragon: Switch ubuntu-latest to ubuntu-slim runner (#22303) Shreya Jain 2026-04-24 12:21:36 -07:00
  • 13d36cf891 ggml-webgpu: enable FLASH_ATTN_EXT on browser without subgroup matrix (#22199) b8922 Zheyuan Chen 2026-04-24 10:39:09 -07:00
  • f65bc34c68 hexagon: use DIRID 13 in libggml-htp.inf for modern InfVerif (#22306) Mengsheng Wu 2026-04-25 00:21:33 +08:00
  • 91b03e4c93 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-04-24 14:20:12 +03:00
  • 15fa3c493b metal : print GPU description (#22318) b8920 Georgi Gerganov 2026-04-24 13:56:03 +03:00
  • dc80c5252a common : fix jinja warnings with clang 21 (#22313) b8919 Adrien Gallouët 2026-04-24 12:36:02 +02:00
  • e583f3b4f5 ggml : minor coding style (#22308) b8918 Georgi Gerganov 2026-04-24 11:02:00 +03:00
  • 017f090442 jinja : remove unused header (#22310) b8917 Georgi Gerganov 2026-04-24 11:01:46 +03:00
  • ffdd983fb8 server : fix swa-full logic (#22288) b8916 Georgi Gerganov 2026-04-24 10:17:37 +03:00
  • 793d0a7931 server: rename debug tags to match --cache-idle-slots naming (#22292) Yes You Can Have Your Own 2026-04-24 09:28:44 +03:00
  • 8bc492ebb4 hexagon: add SOLVE_TRI op (#21974) b8914 Mengsheng Wu 2026-04-24 09:39:13 +08:00
  • e5f070a1dc fix(shader): handle the buffer aliasing for rms fuse (#22266) b8913 Chen Yuan 2026-04-23 19:32:59 -04:00
  • fa0b8a70a8 cli: Remove redundant local sampling variables (#20429) (#22264) b8912 Ethan Turner 2026-04-23 15:53:23 -07:00
  • 5d2b52d80d hexagon: add support for basic and extended Op profiling (#22269) b8911 Max Krasnyansky 2026-04-23 14:17:21 -07:00
  • 187a456370 Enable testing on Snapdragon devices (#21051) Shreya Jain 2026-04-23 13:08:10 -07:00
  • 185cbff6f1 server : convert_anthropic_to_oai: also copy chat_template_kwargs (#22154) b8909 srkizer 2026-04-24 03:32:46 +09:00
  • c78fb909b2 server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21869) (#22267) b8908 Song Li 2026-04-23 12:39:07 -04:00
  • 12568ca8c8 vendor : update LibreSSL to 4.3.1 (#22285) b8907 Adrien Gallouët 2026-04-23 17:45:56 +02:00
  • c807c6e3b0 server: (anthropic API) fix prefix caching (#21793) b8906 kvc0 2026-04-23 08:45:02 -07:00
  • 0949beb5a3 fix build number for sycl release (#22283) b8905 Sigbjørn Skjæret 2026-04-23 15:38:58 +02:00
  • 9012c50fc8 model-conversion : fix mmproj output file name [no ci] (#22274) Daniel Bevenius 2026-04-23 15:07:38 +02:00
  • 0dd7f915fd cli : cleanup auto-completion code (#21745) Matthias Straka 2026-04-23 15:03:28 +02:00
  • 550d684bd1 server: Enable transcriptions API for LFM2-Audio (#22000) b8902 Tarek Dakhran 2026-04-23 10:47:26 +02:00
  • b9421898b6 add for Q4_0 opt_arc770_q4_0 arthw 2026-04-23 15:33:19 +08:00
  • 8635e221c8 metal : fix event synchronization (#22260) b8901 Georgi Gerganov 2026-04-23 08:22:49 +03:00