Commit Graph

  • 08c8012bde cont : sync main and drft contexts Georgi Gerganov 2026-05-07 18:47:34 +03:00
  • de35b1255c server, spec : transition to unified spec context Georgi Gerganov 2026-05-07 17:57:59 +03:00
  • 1afee5b262 server : improve ctx names Georgi Gerganov 2026-05-07 13:07:44 +03:00
  • 11fd5e7272 server : draft prompt cache and checkpoints Georgi Gerganov 2026-05-07 12:47:56 +03:00
  • c97dc3605e server : sketch the ctx_dft decode loop Georgi Gerganov 2026-05-07 10:50:42 +03:00
  • 8a50f6f0b9 cont : dedup ctx_seq_rm_type Georgi Gerganov 2026-05-07 10:22:20 +03:00
  • 77269ad8a7 cont : pass seq_id Georgi Gerganov 2026-05-07 10:14:18 +03:00
  • 4550f0f08b spec : update common_speculative_init() Georgi Gerganov 2026-05-07 10:06:32 +03:00
  • befc7ef635 spec : drop support for incompatible vocabs Georgi Gerganov 2026-05-07 09:54:09 +03:00
  • 2c9a40849f spec : refactor Georgi Gerganov 2026-05-07 08:39:46 +03:00
  • e43431b381 llama : fix device state save/load (#22805) b9064 Georgi Gerganov 2026-05-07 21:43:40 +03:00
  • ceb7e14b96 opencl: add opfilter regex for debugging (#22782) b9063 shaofeiqi 2026-05-07 11:00:20 -07:00
  • 093be624cc common/chat : preserve media markers for typed-content templates (#22634) b9062 Aldehir Rojas 2026-05-07 12:50:56 -05:00
  • deab41ec68 tests: add long-sequence cases and fix inputs for gated_delta_net (#22794) b9061 HaoJun ZHANG 2026-05-08 00:23:36 +08:00
  • ad09224658 sycl: add FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET (#22149) b9060 Intel AI Get-to Market Customer Success and Solutions 2026-05-07 08:51:33 -07:00
  • b9afc19cb4 Write a readme on Multi-GPU usage in llama.cpp (#22729) Gaurav Garg 2026-05-07 21:18:40 +05:30
  • 803627f121 llama : remove unnecessary seq_id check during state restore (#22797) b9058 Georgi Gerganov 2026-05-07 16:37:26 +03:00
  • 68380ae11b ggml-cpu: Optimized risc-v cpu q1_0 dot b9057 pl752 2026-05-07 18:09:25 +05:00
  • cc97e45a14 mtmd: fix whisper audio tail truncation by exposing padded buffer to FFT (#22770) b9056 Pascal 2026-05-07 14:01:01 +02:00
  • 8e52631d55 model: Add Mimo v2.5 model support (#22493) b9055 AesSedai 2026-05-07 04:21:58 -07:00
  • f4b5a2ee91 webui: fix ?model= URL param race in router mode (#22771) Pascal 2026-05-07 13:09:32 +02:00
  • 97f06e9eed codeowners : add ZenDNN backend codeowner (#22772) Vishal Singh 2026-05-07 12:16:51 +05:30
  • e358d75adb webui: fix flicker issue on dismiss animation on overlay primitives (#22773) viggy 2026-05-06 23:11:31 -07:00
  • cfff1fc300 sycl : fix test script (#22737) Shane Tran Whitmire 2026-05-07 00:25:57 -05:00
  • 3980e04d5a llama : add missing call to ggml_backend_load_all() (#22752) b9050 Adrien Gallouët 2026-05-07 07:24:47 +02:00
  • 2496f9c149 mtmd : support MiniCPM-V 4.6 (#22529) b9049 tc-mb 2026-05-07 03:54:09 +08:00
  • 5207d120ea model : don't crash on unsupported architecture (#22742) b9048 Gilad S. 2026-05-06 11:51:21 -05:00
  • a0101225bc common: do not fit to unknown device memory (#22614) b9047 fl0rianr 2026-05-06 17:03:45 +02:00
  • a290ce6266 gguf-py : bump version to 0.19.0 (#22664) gguf-v0.19.0 Georgi Gerganov 2026-05-06 15:46:14 +03:00
  • a00e47e422 mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) (#22101) b9045 Yakine Tahtah 2026-05-06 14:40:59 +02:00
  • 750141969c feat: migrate to PEP 621 and add uv support (#21907) David Huggins-Daines 2026-05-06 08:04:10 -04:00
  • a736e6c0ac convert : ignore non-language tensors for Gemma4Model (#22753) Daniel Bevenius 2026-05-06 13:50:44 +02:00
  • e3e3f8e46a webui: Remove Google Favicons & Improve MCP Information logic & UI (#22719) Aleksander Grygier 2026-05-06 11:12:27 +02:00
  • f08f20a0e3 ggml-cpu: fuse RMS_NORM + MUL on CPU backend (#22423) b9041 zzzzwc 2026-05-06 15:41:14 +08:00
  • 07eaf919ed add tabindex and aria-hidden (#22699) viggy 2026-05-06 00:21:58 -07:00
  • 74d6248f71 convert : add filter_tensors method to pre-filter tensors (#22597) Sigbjørn Skjæret 2026-05-06 08:06:05 +02:00
  • 2ca1161bd7 ggml : use CL_DEVICE_GLOBAL_MEM_SIZE as memory estimate for OpenCL --fit (#22688) b9038 fl0rianr 2026-05-06 07:12:48 +02:00
  • 0445829c1d llama : enable layer input extraction gg/llama-extract-embeddings Georgi Gerganov 2026-05-05 20:50:20 +03:00
  • bbeb89d76c Hexagon: Process M-tail rows on HMX instead of HVX (#22724) b9037 Trivikram Reddy 2026-05-05 11:43:03 -05:00
  • ff806a110d opencl: refactor Adreno q4_0 (#22335) lhez 2026-05-05 09:38:57 -07:00
  • d5003b6e4d rpc : use graph uid instead of graph cache (#22701) Radoslav Gerganov 2026-05-05 13:47:13 +03:00
  • 2635ac76e8 common : fix missing-noreturn warnings when compiling with clang 21 (#22702) Adrien Gallouët 2026-05-05 12:16:25 +02:00
  • 70a8309114 sync : ggml b9033 Georgi Gerganov 2026-05-05 13:15:19 +03:00
  • c91faf997f ggml : bump version to 0.11.0 (ggml/1478) Georgi Gerganov 2026-05-05 13:14:32 +03:00
  • bf76ac77be common : only load backends when required (#22290) b9031 Adrien Gallouët 2026-05-05 09:23:50 +02:00
  • a09a00e502 vendor : update cpp-httplib to 0.43.3 (#22686) b9030 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-05 04:04:57 -03:00
  • f84632951a wip pr/18039-gg Georgi Gerganov 2026-04-25 18:27:15 +03:00
  • 4567954ab0 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-05-05 09:28:39 +03:00
  • 2bacb1eb77 server : validate --tools CLI argument against known tool names (#22538) b9029 Georgi Gerganov 2026-05-05 06:35:27 +03:00
  • d6e7b033a4 llama : add option to save memory in device buffers (#22679) b9028 Georgi Gerganov 2026-05-05 06:35:07 +03:00
  • fa595462ca graph : handle non-contiguous Q/K/V in mul_mat_aux (#22630) Sigbjørn Skjæret 2026-05-05 05:34:44 +02:00
  • a817a22bc6 ggml : implement fast walsh-hadamard transform for kv rotation (#21352) (#22631) b9026 Ismail 2026-05-05 04:05:05 +02:00
  • eff06702b2 kleidiai : update to v1.24.0 and use release archive (#22549) b9025 Charles Xu 2026-05-04 21:13:31 +02:00
  • 069be0ae22 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-05-04 21:42:27 +03:00
  • e77056f9b2 CUDA: use fastdiv for batch index split in get_rows (#22650) leonardHONG 2026-05-04 22:24:05 +08:00
  • 935a340292 server: implement /models?reload=1 (#21848) b9023 Xuan-Son Nguyen 2026-05-04 16:23:26 +02:00
  • d8794eecd5 examples: refactor diffusion generation (#22590) b9022 Shakhnazar Sailaukan 2026-05-04 16:19:30 +04:00
  • 36a694c965 webui : fix circular dependency between chat.service.ts and models.svelte.ts (#22625) JusteLeo 2026-05-04 13:38:10 +02:00
  • a4701c98f7 common/autoparser: fixes for newline handling / forced tool calls (#22654) b9020 Piotr Wilkin (ilintar) 2026-05-04 13:18:11 +02:00
  • 994118a183 model: move load_hparams and load_tensors to per-model definition (#22004) b9019 Xuan-Son Nguyen 2026-05-04 12:36:59 +02:00
  • c84e6d6db5 server: Add a simple get_datetime server tool (#22649) b9018 Evan Huus 2026-05-04 06:19:41 -04:00
  • 82af405161 arg : silence warnings about removed params gg/arg-silence-warnings Georgi Gerganov 2026-05-04 10:07:57 +03:00
  • fa8feaed34 webui: restore missing settings (#22666) Nick Towle 2026-05-04 00:04:07 -07:00
  • 846262d787 docs : update speculative decoding parameters after refactor (#22397) (#22539) b9016 Georgi Gerganov 2026-05-04 08:52:07 +03:00
  • 6dcd824fce vulkan: delete dead GGML_VK_MAX_NODES def (#22621) b9015 Atomic-Germ 2026-05-03 22:49:29 -07:00
  • d4b0c22f9e ggml-webgpu: add layer norm ops (#22406) b9014 Chen Yuan 2026-05-03 23:52:53 -04:00
  • e48034dfc9 common : determine generation prompt using longest common prefix (#22657) Aldehir Rojas 2026-05-03 17:18:23 -05:00
  • 048a490f76 convert : Mistral format yarn apply_scale support (#22612) b9012 Julien Denize 2026-05-03 21:51:21 +02:00
  • db44417b02 convert : apply Q/K RoPE permutation in NVFP4 repack path (#22611) JM Robles 2026-05-03 17:22:00 +02:00
  • d05fe1d7da fix: CUDA device PCI bus ID de-dupe OOMing (ignoring other 3 gpus entirely) (#22533) b9010 lucy 2026-05-02 16:19:25 -04:00
  • 459b02f6c0 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-05-02 18:08:25 +03:00
  • 0754b7b6fe server : avoid checkpoint data host copies (#22558) b9009 Georgi Gerganov 2026-05-02 18:03:25 +03:00
  • 09294365a9 ggml-virtgpu: fix circular dependency in headers (#22557) b9008 JusteLeo 2026-05-02 15:28:50 +02:00
  • 63d93d1733 convert : disable uint types (#18908) Csaba Kecskemeti 2026-05-01 23:05:59 -07:00
  • c5a3bc39b1 opencl: Adreno optimization for MoE - MxFP4 (#22301) b9006 Shawn Gu 2026-05-01 23:02:24 -07:00
  • 9dbb372610 Github: update issue templates (#22594) Johannes Gäßler 2026-05-02 07:56:13 +02:00
  • 228e836344 sync : ggml b9004 Georgi Gerganov 2026-05-02 08:46:31 +03:00
  • ed23489f42 ggml : bump version to 0.10.2 (ggml/1474) Georgi Gerganov 2026-05-02 08:45:46 +03:00
  • 81eabb4781 sync : ggml sync-ggml-26-05-02 Georgi Gerganov 2026-05-02 08:46:31 +03:00
  • 7b38b8660f ggml : bump version to 0.10.2 (ggml/1474) Georgi Gerganov 2026-05-02 08:45:46 +03:00
  • 457e2288c9 sync : ggml b9002 Georgi Gerganov 2026-05-01 21:29:15 +03:00
  • e8ec7ab058 ggml : try fix win32 build (whisper/0) Georgi Gerganov 2026-05-01 18:53:30 +03:00
  • 1a03cf47f6 hexagon: hmx flash attention (#22347) b9000 Yiwei Shao 2026-05-01 20:29:13 -07:00
  • b97ebdc98f llama-quant : fix --tensor-type when default qtype is overriden (#22572) b8999 ddh0 2026-05-01 12:55:55 -05:00
  • 2098fd6169 hexagon: enable non-contiguous row tensor support for unary ops (#22574) b8998 Aparna M P 2026-05-01 22:39:23 +05:30
  • ab6120cde5 webui: Spring Cleaning Refactor v1 (#22505) Aleksander Grygier 2026-05-01 18:36:29 +02:00
  • c3c1505392 ggml-webgpu: Fix vectorized handling in mul-mat and mul-mat-id (#22578) b8996 Masashi Yoshimura 2026-05-01 23:55:01 +09:00
  • 05e141a6b3 vulkan: Support asymmetric FA in coopmat2 path (#21753) b8995 Jeff Bolz 2026-05-01 15:28:32 +02:00
  • aab68217b7 ggml-webgpu: add the upscale shader (#22419) b8994 Chen Yuan 2026-05-01 01:22:18 -04:00
  • a95a11e5b8 ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_ID (#22464) Masashi Yoshimura 2026-05-01 06:19:10 +09:00
  • 5cbfb18075 Update llama-mmap to use ftello/fseeko (#22497) b8992 Reese Levine 2026-04-30 14:17:52 -07:00
  • beb42fffa4 common : check for null getpwuid in hf-cache (#22550) b8991 Adrien Gallouët 2026-04-30 21:32:41 +02:00
  • 9d5887035f testing gg/spec-ckpt-test Georgi Gerganov 2026-04-30 19:18:57 +03:00
  • a7c1110e87 server : avoid checkpoint data host copies Georgi Gerganov 2026-04-30 16:24:49 +03:00
  • 660b1b4bdc vulkan: add get/set tensor 2d functions (#22514) b8990 Ruben Ortlam 2026-04-30 17:37:13 +02:00
  • c20c44514a spec: fix argument typo (#22552) b8989 Ben Guidarelli 2026-04-30 10:32:32 -04:00
  • 6118c043b1 ci : bump ty to 0.0.33 (#22535) Sigbjørn Skjæret 2026-04-30 15:15:54 +02:00
  • 5f0ab726f7 vendor : update cpp-httplib to 0.43.2 (#22548) b8987 Adrien Gallouët 2026-04-30 15:04:39 +02:00
  • e82aaf2587 CUDA: fix tile FA kernel on Pascal (#22541) b8986 Johannes Gäßler 2026-04-30 13:04:50 +02:00
  • cb8a3a93ec Merge branch 'master' into pr/18039 Georgi Gerganov 2026-04-30 10:08:10 +03:00