Commit Graph

  • 8d0e158076 state Ruben Ortlam 2026-04-09 11:51:39 +02:00
  • aade0f81dd state Ruben Ortlam 2026-04-09 11:42:50 +02:00
  • 0ec191e1d7 vocab: add gemma4 tokenizer tests, fix edge case (#21534) b8730 Piotr Wilkin (ilintar) 2026-04-09 11:41:14 +02:00
  • 243532e556 jinja : support ensure_ascii=true, string repetition and int/float self-filtering (#21623) b8729 Kwa Jie Hao 2026-04-09 17:28:33 +08:00
  • 700270239d state Ruben Ortlam 2026-04-09 11:24:21 +02:00
  • ddaafa3dc1 state Ruben Ortlam 2026-04-09 11:11:17 +02:00
  • e5e0be0add state Ruben Ortlam 2026-04-09 11:00:36 +02:00
  • 5e9c635463 metal : add missing mm-id specializations for q1_0 (#21662) b8728 Georgi Gerganov 2026-04-09 10:54:00 +03:00
  • 9949ad08f6 fix: Model Selector choice sync (#21628) Aleksander Grygier 2026-04-09 09:46:27 +02:00
  • 3ee9da0e4f server : fix grammar commandline args (#21543) b8726 AUTOMATIC1111 2026-04-09 10:16:54 +03:00
  • 75511a8d7e webui: Add option to pre-encode conversation for faster next turns (#21034) Aleksander Grygier 2026-04-09 09:10:18 +02:00
  • b54cb2e3d0 sycl : add flash-attn support for head size 512 (#21654) b8724 Akarshan Biswas 2026-04-09 12:06:48 +05:30
  • 8a65a7a8ee ci: drop v5 all: composition from labeler.yml (#21627) Marxist-Leninist 2026-04-09 07:20:19 +01:00
  • 3c4eae7dc9 state Ruben Ortlam 2026-04-09 07:50:05 +02:00
  • 7e2799c8c9 state Ruben Ortlam 2026-04-09 07:40:02 +02:00
  • 8a132faaa0 vulkan: unify type macros to use Vx instead of _VECx (#21605) b8722 Ruben Ortlam 2026-04-09 07:31:51 +02:00
  • 4293919068 common : skip non-primary GGUF split files when selecting model (#21633) b8721 Adrien Gallouët 2026-04-09 07:28:06 +02:00
  • cd0722594a state Ruben Ortlam 2026-04-09 07:25:33 +02:00
  • d12cc3d1ca CUDA: also store node->src->data ptrs for equality check (#21635) b8720 Aman Gupta 2026-04-09 01:01:56 +08:00
  • 2dcb7f74ed fix: free ctx_copy in ggml_opt_free to plug per-training-session leak (#21592) b8719 RealOrko 2026-04-08 16:40:15 +01:00
  • 660600081f server: respect the ignore eos flag (#21203) b8718 Yuri Khrustalev 2026-04-08 11:12:15 -04:00
  • d9a12c82f0 vocab : remove </s> eog token if gemma4 (#21492) b8717 Aldehir Rojas 2026-04-08 09:53:06 -05:00
  • 4a05e0c566 webui : send both backend_sampling == false/true (#18781) Georgi Gerganov 2026-04-08 17:35:52 +03:00
  • e9fd96283d Propose fix a couple of typos (#21581) b8715 John Eismeier 2026-04-08 10:29:03 -04:00
  • 3ba12fed0a kv-cache : extend cache quantization checks (#21586) b8714 Erik Scholz 2026-04-08 15:08:57 +02:00
  • 5473949070 webgpu : Query for adapter support when registering WebGPU backend (#21579) b8713 Reese Levine 2026-04-08 06:08:29 -07:00
  • dcdcbad42a metal: Q1_0 backend (#21528) b8712 Pasha Khosravi 2026-04-08 06:07:47 -07:00
  • 5764d7c6a6 gemma : perform per-layer projections in the first layer (#21612) b8711 Georgi Gerganov 2026-04-08 16:06:30 +03:00
  • 87f4744a80 examples : disable cb_eval callback for --save-logits (#21553) b8710 Daniel Bevenius 2026-04-08 14:10:33 +02:00
  • 85d482e6b6 parser: fix MiniMax handling (#21573) b8709 Piotr Wilkin (ilintar) 2026-04-08 12:47:25 +02:00
  • ae65fbdf33 tests : remove obsolete .mjs script (#21615) b8708 Georgi Gerganov 2026-04-08 13:20:46 +03:00
  • 3bd9aa1f92 chore: Update labeler to have separate labels for server/webui and server changes (#21567) Aleksander Grygier 2026-04-08 10:35:31 +02:00
  • ece522f98c chore: Remove legacy files (#21606) Aleksander Grygier 2026-04-08 09:55:08 +02:00
  • 09343c0198 model : support step3-vl-10b (#21287) b8705 forforever73 2026-04-08 15:51:31 +08:00
  • 97508acb17 webui: fix syntax highlighting lost after streaming for non-common languages (#21206) Hamish M. Blair 2026-04-07 23:58:08 -07:00
  • 5c4aae66e1 devops: kleidiai: provide KleidiAI-Enabled ARM Release Artifact (#21259) b8703 Martin Klacer 2026-04-08 06:06:12 +01:00
  • c5ce4bc227 CUDA: make cuda graphs props check faster (#21472) b8702 Aman Gupta 2026-04-08 09:05:51 +08:00
  • 66c4f9ded0 ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (#21168) b8701 iacopPBK 2026-04-07 21:47:42 +02:00
  • 93bdc61563 gguf-py : fix missing comma after bad merge in tensor-mapping (#21558) Daniel Bevenius 2026-04-07 21:24:25 +02:00
  • 4eb19514dd kv-cache : support attention rotation for heterogeneous iSWA (#21513) b8699 Georgi Gerganov 2026-04-07 20:31:28 +03:00
  • 957d717ce5 ggml-webgpu: parameterize submission size and add iOS specific limits (#21533) b8698 Reese Levine 2026-04-07 10:30:01 -07:00
  • de1aa6fa73 CUDA: check for buffer overlap before fusing (#21566) b8697 Aman Gupta 2026-04-08 00:57:04 +08:00
  • 69c28f1547 llama-server: fix model params not propagated (#21509) b8696 Aaron Teo 2026-04-07 21:39:41 +08:00
  • 0d049d6a92 unicode : add custom Qwen2 regex handler to fix segfault on long input (#21257) Son H. Nguyen 2026-04-07 22:13:38 +09:00
  • a8ec0df461 llama: remove per-arch tensor name lists (#21531) b8694 Johannes Gäßler 2026-04-07 15:02:03 +02:00
  • e8f5082697 server : fix restore for checkpoints with pos_min == 0 (#21510) b8693 Georgi Gerganov 2026-04-07 15:29:17 +03:00
  • 22fc79134e ggml : deprecate GGML_OP_ADD1 (#21363) b8692 Georgi Gerganov 2026-04-07 15:28:27 +03:00
  • 2a619f6fbc ggml: Vulkan build, Linux -- output error string for errno on fork failure (#20868) (#20904) b8691 Tom Overlund 2026-04-07 07:54:55 -04:00
  • edd4d9bca5 vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl (#21029) b8690 mkoker 2026-04-07 07:41:29 -04:00
  • 482192f12d webui : store reasoning_content so it is sent back in subsequent requests (#21249) Aldehir Rojas 2026-04-07 06:32:44 -05:00
  • 71a81f6fcc ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210) (#21519) b8688 Antoine Viallon 2026-04-07 12:18:55 +02:00
  • ecce0087da fix: Detect streaming state in reasoning content blocks (#21549) Aleksander Grygier 2026-04-07 12:04:41 +02:00
  • d1f82e382d Fix rtl text rendering (#21382) Kabir08 2026-04-07 15:07:20 +05:30
  • 0988accf82 [SYCL] Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (#21527) b8685 PMZFX 2026-04-07 04:12:49 -04:00
  • 0033f53a07 docs: fix typo in build.md (emdawbwebgpu -> emdawnwebgpu) (#21518) b8684 Dmytro Romanov 2026-04-07 06:37:26 +02:00
  • d0a6dfeb28 ggml-webgpu: Add the support of MUL_MAT_ID (#21147) b8683 Masashi Yoshimura 2026-04-07 05:08:46 +09:00
  • 2e1f0a889e ggml: add Q1_0 1-bit quantization support (CPU) (#21273) b8682 Pasha Khosravi 2026-04-06 11:55:21 -07:00
  • 506200cf8b cli: fix stripping of \n in multiline input (#21485) b8681 Bipin Yadav 2026-04-07 00:24:06 +05:30
  • 15f786e658 [CUDA ] Write an optimized flash_attn_stream_k_fixup kernel (#21159) b8680 Gaurav Garg 2026-04-07 00:04:29 +05:30
  • 94ca829b60 llama-bench: add -fitc and -fitt to arguments (#21304) b8679 Aman Gupta 2026-04-06 22:26:02 +08:00
  • 4aa962e2b0 vocab : add byte token handling to BPE detokenizer for Gemma4 (#21488) b8678 Aldehir Rojas 2026-04-06 09:08:37 -05:00
  • 941146b3f1 convert : fix block_ff_dim retrieval for lfm2 (#21508) Sigbjørn Skjæret 2026-04-06 14:05:18 +02:00
  • 482d862bcb server : handle unsuccessful sink.write in chunked stream provider (#21478) b8676 lainon1 2026-04-06 13:03:02 +01:00
  • 3979f2bb08 docs: add hunyuan-ocr gguf, also add test [no ci] (#21490) Xuan-Son Nguyen 2026-04-06 14:02:37 +02:00
  • 400ac8e194 convert : set "add bos" == True for Gemma 4 (#21500) Georgi Gerganov 2026-04-06 13:52:07 +03:00
  • f51fd36d79 sycl : handle other FA case (#21377) Neo Zhang 2026-04-06 18:28:00 +08:00
  • a30369d515 cpu: fix ARM NEON nvfp4 vec dot 0cc4m/cpu-arm-nvfp4-fix Ruben Ortlam 2026-04-06 10:26:50 +02:00
  • 25eec6f327 hexagon: slight optimization for argosrt output init (#21463) b8672 Yarden Tal 2026-04-06 04:30:25 +03:00
  • 58190cc84d llama : correct platform-independent loading of BOOL metadata (#21428) b8671 anchortense 2026-04-06 09:40:38 +10:00
  • af76639f72 model : add HunyuanOCR support (#21395) b8670 Richard Davison 2026-04-05 23:32:14 +02:00
  • 761797ffdf ci : use default RISE RISC-V Runners (#21263) Ludovic Henry 2026-04-05 20:29:48 +02:00
  • 5d3a4a7da5 server : fix logging of build + system info (#21460) b8668 ddh0 2026-04-05 09:14:02 -05:00
  • c08d28d088 ci: lower cuda12 floor to 12.8.1 for broader host compatibility (#21438) b8667 M1DNYT3 2026-04-05 04:04:00 +03:00
  • 661e9acb36 ci: fix vulkan workflow referencing non-existent action (#21442) Nicholas Sparks 2026-04-04 20:59:51 -04:00
  • b8635075ff common : add gemma 4 specialized parser (#21418) b8665 Aldehir Rojas 2026-04-04 13:39:00 -05:00
  • 9c699074c9 server: Fix undefined timing measurement errors in server context (#21201) b8664 Dan Hoffman 2026-04-04 07:11:19 -07:00
  • d01f6274c0 common : respect specified tag, only fallback when tag is empty (#21413) b8663 Adrien Gallouët 2026-04-04 15:08:03 +02:00
  • 650bf14eb9 llama-model: read final_logit_softcapping for Gemma 4 (#21390) b8662 SamareshSingh 2026-04-04 06:05:10 -05:00
  • b7ad48ebda llama: add custom newline split for Gemma 4 (#21406) b8661 Aman Gupta 2026-04-04 15:06:34 +08:00
  • d006858316 ggml-webgpu: move from parameter buffer pool to single buffer with offsets (#21278) b8660 Reese Levine 2026-04-03 11:40:14 -07:00
  • e439700992 ci: Add Windows Vulkan backend testing on Intel (#21292) Masato Nakasaka 2026-04-04 02:16:44 +09:00
  • 50e0ad08fb server: save and clear idle slots on new task (--clear-idle) (#20993) b8658 Yes You Can Have Your Own 2026-04-03 20:02:27 +03:00
  • f1f793ad06 common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) b8657 Piotr Wilkin (ilintar) 2026-04-03 17:51:52 +02:00
  • af5c13841f common : fix tool call type detection for nullable and enum schemas (#21327) b8656 Samanvya Tripathi 2026-04-03 11:51:23 -04:00
  • 277ff5fff7 docker : bump cuda12 to 12.9.1 (#20920) M1DNYT3 2026-04-03 16:06:45 +03:00
  • 384c0076bc docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331) jeromew 2026-04-03 15:05:14 +02:00
  • 1f34806c44 jinja: coerce input for string-specific filters (#21370) b8653 Sigbjørn Skjæret 2026-04-03 15:03:33 +02:00
  • 887535c33f ci: add more binary checks (#21349) Aaron Teo 2026-04-03 20:50:00 +08:00
  • d3416a4aa9 fix: remove stale assert (#21369) b8651 Piotr Wilkin (ilintar) 2026-04-03 13:40:41 +02:00
  • 43a4ee4a2c HIP: build eatch ci build test for a different architecture (#21337) uvos 2026-04-03 11:38:22 +02:00
  • f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355) Tillerino 2026-04-03 11:21:07 +02:00
  • f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315) b8648 Vishal Singh 2026-04-03 14:49:08 +05:30
  • b069b10ab4 vocab: fix Gemma4 tokenizer (#21343) Piotr Wilkin (ilintar) 2026-04-03 10:33:03 +02:00
  • 0c58ba3365 rpc : reuse compute graph buffers (#21299) b8646 Radoslav Gerganov 2026-04-03 10:28:09 +03:00
  • 57ace0d612 chat : avoid including json in chat.h (#21306) b8645 Georgi Gerganov 2026-04-03 09:07:59 +03:00
  • 39b27f0da0 (revert) kv-cache : do not quantize SWA KV cache (#21332) b8644 Georgi Gerganov 2026-04-03 09:07:01 +03:00
  • f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345) b8643 Vishal Singh 2026-04-03 08:05:15 +05:30
  • 7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066) b8642 Slobodan Josic 2026-04-03 00:59:20 +02:00
  • 5208e2d5ba fix: gemma 4 template (#21326) b8641 Piotr Wilkin (ilintar) 2026-04-02 23:31:02 +02:00
  • 7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112) b8640 Bartowski 2026-04-02 16:53:58 -04:00