Commit Graph

  • 37f230dd7c completion : session_tokens insert range in completion tool (no-op → correct) (#20917) b8551 mtmcp 2026-03-27 05:25:58 -03:00
  • a308e584ca completion : Fix segfault on model load failure (#21049) b8550 mtmcp 2026-03-27 05:01:13 -03:00
  • d0fa2c9fbb Send reasoning content back to the model across turns via the reasoning_content API field (#21036) Pascal 2026-03-27 08:17:35 +01:00
  • 9bcb4eff4d metal : Fix dimension constraint violation in matmul2d descriptor (#21048) b8548 ren 2026-03-27 00:05:21 -07:00
  • 6861f6509a CANN: update docker images to 8.5.0 and improve CANN.md (#20801) b8547 KokerZhou 2026-03-27 08:53:00 +08:00
  • 1743d98057 mtmd: fix "v.patch_embd" quant and unsupported im2col ops on Metal for deepseek-ocr (#21027) b8546 Saba Fallah 2026-03-27 00:07:55 +01:00
  • 7ca0c9cca7 hip: use fnuz fp8 for conversion on CDNA3 (#21040) b8545 uvos 2026-03-26 23:06:33 +01:00
  • 8c60b8a2be ci: pin external actions to exact commit SHA (#21033) Xuan-Son Nguyen 2026-03-26 20:44:00 +01:00
  • 287b5b1eab common : add getpwuid fallback for HF cache when HOME is not set (#21035) Adrien Gallouët 2026-03-26 20:34:23 +01:00
  • a73bbd5d92 mtmd: refactor image preprocessing (#21031) Xuan-Son Nguyen 2026-03-26 19:49:20 +01:00
  • e5aa067d68 llama : rotate activations for better quantization Georgi Gerganov 2026-03-26 18:38:55 +02:00
  • ded446b34c opencl: allow large buffer for adreno (#20997) lhez 2026-03-26 08:52:21 -07:00
  • f8d4abae86 convert : support Qwen3.5/Qwen3.5 Moe NVFP4 and add input scales (#20505) Michael Wand 2026-03-26 08:52:06 -07:00
  • 3d5acab3e7 convert : add RuGPT3XL (RuGPT3XLForCausalLM) support (#21011) Pavel Zloi 2026-03-26 18:49:09 +03:00
  • 9900b29c3a common : filter out imatrix when finding models (#21023) Adrien Gallouët 2026-03-26 15:37:18 +01:00
  • dc8d14c582 fix(ggml): correct RISC-V ISA string canonical ordering for RVV in CMake (#20888) ihb2032 2026-03-26 19:08:41 +08:00
  • 93dfbc1291 common : make LLAMA_CACHE the one cache for everything (#21009) Adrien Gallouët 2026-03-26 12:04:57 +01:00
  • 3cba8bba18 common : fix split model migration (#21019) Adrien Gallouët 2026-03-26 12:04:37 +01:00
  • 112c78159f ggml-cuda: Add NVFP4 dp4a kernel (#20644) Michael Wand 2026-03-26 01:54:03 -07:00
  • 0fac87b157 imatrix : fix crash when using --show-statistics with zero counts (#19532) b8533 SamareshSingh 2026-03-26 02:14:36 -05:00
  • 0a524f2404 CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D (#17094) b8532 Yihao Wang 2026-03-25 19:19:14 -07:00
  • c0159f9c1f common : do not delete old files from the old cache when updating (#21000) b8531 Adrien Gallouët 2026-03-25 22:28:04 +01:00
  • a970515bdb mtmd: Add DeepSeekOCR Support (#17400) b8530 Saba Fallah 2026-03-25 19:57:40 +01:00
  • 4cd732f445 better wording xsn/ai_policy_private_repo Xuan Son Nguyen 2026-03-25 19:46:17 +01:00
  • 056b50c319 common : fix verbosity setup (#20989) b8529 Adrien Gallouët 2026-03-25 19:41:01 +01:00
  • 9f9a0bde37 contrib: update AI policy to allow private repo Xuan Son Nguyen 2026-03-25 19:39:41 +01:00
  • f2c72b8f1f common : fix gguf selection in common_list_cached_models (#20996) b8528 Adrien Gallouët 2026-03-25 19:18:06 +01:00
  • ec54ac13a8 ci : fix parsing of vgpr counts in hip-quality-check (#20987) uvos 2026-03-25 19:00:37 +01:00
  • 80322ebdaf model: codefuse-ai/F2LLM-v2 support b8526 Saba Fallah 2026-03-25 18:33:42 +01:00
  • 44c51e526b model : allow causal_attn and pooling_type on all architectures (#20973) b8525 Dowon 2026-03-26 02:12:38 +09:00
  • 1922f87c2f snapdragon: add missing features to WoS scripts to achieve parity with ADB scripts (#20884) Aparna M P 2026-03-25 22:13:12 +05:30
  • 345de3cd87 Use docker in build-android.yml (#20928) Shreya Jain 2026-03-25 09:36:27 -07:00
  • 9c600bcd4b llama-bench: print -n-cpu-moe when offloaded layers > 1 (#20984) b8522 Aman Gupta 2026-03-25 21:17:27 +08:00
  • b2704f9028 ci: Allow ninja to be used during unit test (#20742) Masato Nakasaka 2026-03-25 06:00:49 -07:00
  • 3fab96cd04 ci : disable self-hosted mac jobs (#20985) Georgi Gerganov 2026-03-25 14:46:40 +02:00
  • 914eb5ff0c jinja: fix macro with kwargs (#20960) b8519 Xuan-Son Nguyen 2026-03-25 12:22:48 +01:00
  • 8fc17493c3 gguf-split : clarify operation of gguf-split (#19749) Francisco Herrera 2026-03-25 06:12:50 -05:00
  • 36dafba5c4 llama: fix llama-model-saver (#20503) b8517 Johannes Gäßler 2026-03-25 11:53:16 +01:00
  • 69e0ecef06 webui: Fix editing assistant message without branching (#20944) Aleksander Grygier 2026-03-25 11:47:33 +01:00
  • 062cca58fc Add SLEEPING status to the WebUI model selector (#20949) Pascal 2026-03-25 11:02:32 +01:00
  • 406f4e3f61 android : fix-pointer-dangling (#20974) b8514 yikechayedan 2026-03-25 17:51:26 +08:00
  • 53dc8b59bf sycl : fix wrong variable check by assert (#20903) b8513 Neo Zhang 2026-03-25 17:48:37 +08:00
  • 403c9c9cef ci : bump gguf publish python version (#20982) Sigbjørn Skjæret 2026-03-25 10:04:59 +01:00
  • 8fc85db9d2 ci : limit requirements versions (#20980) Sigbjørn Skjæret 2026-03-25 09:55:37 +01:00
  • 3a60d06ad9 convert : register Qwen3Model architecture (#20967) Dowon 2026-03-25 17:37:59 +09:00
  • abd86ef175 docs : Update OpenVINO backend docs (#20968) Ravi Panchumarthy 2026-03-25 01:33:51 -07:00
  • 07a6fd8775 kleidiai: removed cpu feature detection from CI run script pr/20394 Martin Klacer 2026-03-24 17:24:41 +00:00
  • 9f102a1407 models : move the token embedding norms to the first layer (#20943) b8508 Georgi Gerganov 2026-03-24 17:00:30 +02:00
  • 3fc6f1aed1 ggml-backend: re-enable graph reuse with pipeline parallelism (#20927) b8507 Aman Gupta 2026-03-24 20:47:00 +08:00
  • 29771a0a4c vendor : update cpp-httplib to 0.39.0 (#20933) b8506 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-03-24 09:33:33 -03:00
  • 42ebce3beb common : fix get_gguf_split_info (#20946) b8505 Adrien Gallouët 2026-03-24 13:33:14 +01:00
  • a94fdb090a WebUI: fix edit msg form textarea height (#20830) BlueMöhre 2026-03-24 13:17:45 +01:00
  • c9dc43333f readme : clarify MODEL_ENDPOINT usage (#20941) Adrien Gallouët 2026-03-24 10:35:07 +01:00
  • 2d2d9c2062 common : add a WARNING for HF cache migration (#20935) b8502 Adrien Gallouët 2026-03-24 09:24:39 +01:00
  • 92080b4396 metal : add FLOOR, CEIL, ROUND, TRUNC unary ops (#20930) b8501 nuri 2026-03-24 17:13:07 +09:00
  • 342d6125bc metal : add FA instantiations for HSK=512, HSV=512 (#20902) b8500 Georgi Gerganov 2026-03-24 10:03:09 +02:00
  • c2e224d829 issues: add openvino backends (#20932) Aaron Teo 2026-03-24 14:41:10 +08:00
  • 8c7957ca33 common : add standard Hugging Face cache support (#20775) b8498 Adrien Gallouët 2026-03-24 07:30:33 +01:00
  • e852eb4901 llama-fit: fix regex pattern for gate_up tensors (#20910) b8497 Aman Gupta 2026-03-24 12:57:57 +08:00
  • 312d870a89 common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss (#20912) b8496 Aldehir Rojas 2026-03-23 22:21:47 -05:00
  • 7cadbfce10 hexagon: general DMA and Binary Op fixes for large strides (#20918) b8495 Max Krasnyansky 2026-03-23 15:33:49 -07:00
  • 1fb2290a51 Add codeowners for scripts/snapdragon and docs/snapdragon (#20915) Max Krasnyansky 2026-03-23 14:57:18 -07:00
  • 1772701f99 opencl: add q6_K gemm and gemv kernels for Adreno (#20089) b8493 lhez 2026-03-23 12:44:18 -07:00
  • 39bf0d3c6a rpc : RCE patch (#20908) b8492 las7 2026-03-23 10:54:57 -07:00
  • bd6992180b contrib: add "Requirements" section to PR template (#20841) Xuan-Son Nguyen 2026-03-23 16:59:02 +01:00
  • fd18364755 devops: upgraded default oneAPI version (#20731) Davi Henrique Linhares 2026-03-23 10:47:34 -03:00
  • 11fb11b901 webui: Improve chat form positioning (#20901) Aleksander Grygier 2026-03-23 14:30:55 +01:00
  • 35b662bb5d docs: Fix typo in reasoning flag documentation (#20780) Geo Maciolek 2026-03-23 09:24:55 -04:00
  • f93c09e267 memory : fix seq_id bounds in llama_memory_recurrent::state_read_meta() (#20887) b8487 Georgi Gerganov 2026-03-23 14:08:46 +02:00
  • 841bc203e2 docs : rerun llama-gen-docs to include new CLI args (#20892) Eric Zhang 2026-03-23 19:33:38 +08:00
  • 31a5cf4c3f server: use httplib dynamic threads (#20817) b8485 Xuan-Son Nguyen 2026-03-23 12:22:46 +01:00
  • e32d243849 ai : update gh permissions (#20895) Georgi Gerganov 2026-03-23 13:21:41 +02:00
  • c44a932cf4 webui: fix --webui-config-file settings not applied on load (#20823) Pascal 2026-03-23 11:25:35 +01:00
  • 177c75852a metal: add CONV_3D (#19927) Rashid Ul Islam 2026-03-23 13:15:34 +05:30
  • 7a0b6a635e common/autoparser : detect reasoning markers when enable_thinking changes system prompt (#20859) Jhen-Jie Hong 2026-03-23 15:35:27 +08:00
  • 07ff000551 CANN: add RoPE cache preload before ACL graph capture (#20747) b8480 Chenguang Li 2026-03-23 15:24:06 +08:00
  • cc18f965b6 fix(openvino): explicit memset in buffer_context allocation (#20857) b8479 Dan Hoffman 2026-03-22 23:05:37 -07:00
  • 84ffd0c192 opencl: add flattened Q4_K mv and general Q4_K mm (#20773) b8478 shaofeiqi 2026-03-22 22:45:11 -07:00
  • ec2b787ebe mtmd: Add dynamic high-resolution image preprocessing for InternVL model (#20847) b8477 bssrdf 2026-03-22 20:06:30 -04:00
  • d3ac030a5d mtmd : fix LightOnOCR image preprocessing (#20877) b8476 DorianRudolph 2026-03-23 01:04:14 +01:00
  • 49bfddeca1 server: allow router to report child instances sleep status (#20849) b8475 Xuan-Son Nguyen 2026-03-22 18:33:52 +01:00
  • bd3f1d9d65 CUDA: fix BF16 FA compilation (#20865) b8474 Johannes Gäßler 2026-03-22 17:53:33 +01:00
  • 23c9182ce8 jinja : refactor token advancement (#20864) b8473 Sigbjørn Skjæret 2026-03-22 17:45:10 +01:00
  • 81bc4d3ddc server: fix Host header (#20843) b8472 Evgeny Kurnevsky 2026-03-22 15:29:22 +01:00
  • f40a80b4f3 support bf16 and quantized type (#20803) b8471 Neo Zhang 2026-03-22 22:06:27 +08:00
  • db9d8aa428 ggml-cuda: native bf16 flash attention for vec kernel (#20525) b8470 Patrick Buckley 2026-03-22 03:05:51 -07:00
  • ccb87fa3ee [CUDA] Increase number of output elements per-thread block if the K-dimension is small (#20635) b8469 Gaurav Garg 2026-03-22 14:19:35 +05:30
  • 3306dbaef7 misc : prefer ggml-org models in docs and examples (#20827) b8468 ddh0 2026-03-21 16:00:26 -05:00
  • 990e4d9698 common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604) b8467 Andrea Arcangeli 2026-03-21 13:43:35 -04:00
  • 212f4521b0 context : use n_embd_out for pooled embedding extraction (#20840) b8466 Tom Hillbrunner 2026-03-21 18:35:00 +01:00
  • 568aec82d2 docs : explicit about banning accounts that violates policy (#19593) Xuan-Son Nguyen 2026-03-21 15:50:16 +01:00
  • 2bcdddd5e3 fix(rpc): prevent division by zero in deserialize_tensor (#20712) b8464 y198 2026-03-21 20:59:43 +07:00
  • eac9c6ea83 Convert: Make NVFP4 and MXFP4 HF conversions say NVFP4/MXFP4 instead of BF16 (#20730) Michael Wand 2026-03-21 04:35:21 -07:00
  • 29b28a9824 ci : switch from pyright to ty (#20826) Sigbjørn Skjæret 2026-03-21 08:54:34 +01:00
  • cea560f483 Add shader count for Intel Arc Pro B60 (#20818) b8461 Matt Corallo 2026-03-21 04:22:51 +00:00
  • b1c70e2e54 common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825) b8460 Piotr Wilkin (ilintar) 2026-03-21 00:19:04 +01:00
  • e6ec21e62f ggml-cpu: add always_inline to tinyBLAS_PPC accumulator saves (#20791) b8459 shalinib-ibm 2026-03-21 04:41:45 +05:30
  • 4cb7e0bd61 ai : limit runtime of the agent (#20816) Georgi Gerganov 2026-03-20 20:31:25 +02:00
  • 203eec25c0 releases : disable s390x builds gg/release-disable-s390x Georgi Gerganov 2026-03-20 19:31:25 +02:00
  • 149b2493c0 common : fix typo in debug log ('extracft' -> 'extract') (#20807) b8457 James O'Leary 2026-03-20 10:23:18 -07:00