Commit Graph

  • 03b3d07798 Convert: Fix NemotronH Config Parsing (#21664) Anav Prasad 2026-04-16 10:11:45 +00:00
  • 3f7c29d318 ggml: add graph_reused (#21764) b8816 Aman Gupta 2026-04-16 17:21:28 +08:00
  • ae2d34899e metal: Implement ROLL op (#21946) b8815 Kusha Gharahi 2026-04-16 03:54:37 -05:00
  • 1e796eb41f ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot (#20633) b8814 rehan-10xengineer 2026-04-16 13:15:15 +05:00
  • 5637536517 ggml : implemented simd_gemm kernel for riscv vector extension (#20627) b8813 rehan-10xengineer 2026-04-16 13:14:26 +05:00
  • 90fb96a7b3 devops : added spirv-headers to nix (#21965) Yuannan 2026-04-16 08:12:52 +00:00
  • 82677a6ede ggml-webgpu: compute pass batching and removing profiling overhead (#21873) b8811 Reese Levine 2026-04-16 01:12:19 -07:00
  • 8612ed18b7 ci : Use ggml-org/ccache-action on RISC-V as well (#21632) Ludovic Henry 2026-04-16 10:11:25 +02:00
  • b1be68e8ca [SYCL] Fix Q8_0 reorder: garbage on 2nd prompt + crash on full VRAM (#21638) b8809 Katostrofik 2026-04-16 01:34:05 -04:00
  • 408225bb1a server: use random media marker (#21962) b8808 Xuan-Son Nguyen 2026-04-15 23:52:22 +02:00
  • b3d758750a vulkan: optimize im2col (#21713) b8807 Ruben Ortlam 2026-04-15 19:04:51 +02:00
  • 7e72b38bc1 cuda: Q1_0 initial backend (#21629) b8806 Pasha Khosravi 2026-04-15 09:38:38 -07:00
  • 20d3bc2cc8 ggml-webgpu: Fix dequantization helpers to not pass in pointers (#21872) Reese Levine 2026-04-15 09:14:40 -07:00
  • a6206958d2 CUDA: require explicit opt-in for P2P access (#21910) b8804 Johannes Gäßler 2026-04-15 16:01:46 +02:00
  • 014dca49d6 CUDA: manage NCCL communicators in context (#21891) Johannes Gäßler 2026-04-15 15:58:40 +02:00
  • adb541a6ad rpc : add native RDMA transport for RPC backend (RoCEv2) (#20590) b8802 Valeriy Dubov 2026-04-15 16:44:02 +03:00
  • 80d8770804 docs: more extensive RoPE documentation [no ci] (#21953) Xuan-Son Nguyen 2026-04-15 14:45:16 +02:00
  • 4943e3a396 gen-libllama-abi: compile sort-key regex once outside the lambda copilot/plan-semantic-versioning-libllama copilot-swe-agent[bot] 2026-04-15 12:04:44 +00:00
  • 51b679a5d6 semver: revert llama_export.h, fix ABI baseline to track full signatures copilot-swe-agent[bot] 2026-04-15 12:02:36 +00:00
  • c00ac13fee libllama-abi-check: add explicit read-only permissions to workflow job copilot-swe-agent[bot] 2026-04-15 11:45:14 +00:00
  • 3f3d62ffec semver: add proper semantic versioning and ABI check workflow for libllama copilot-swe-agent[bot] 2026-04-15 11:44:00 +00:00
  • 8dc530b86d ci: disable test-backend-ops on Vulkan llvmpipe run and resture default timeout (#21901) Ruben Ortlam 2026-04-15 10:55:21 +02:00
  • e1a9a6dcbe autoparser: support case of JSON_NATIVE with per-call markers (test case: Reka-Edge) (#21892) b8799 Piotr Wilkin (ilintar) 2026-04-15 10:51:50 +02:00
  • e39eba26f3 read n_ctx back after making llama_context (#21939) b8798 Matt 2026-04-15 00:24:57 -07:00
  • 5d14e5d19b hexagon: optimization for HMX mat_mul (#21554) b8797 Yiwei Shao 2026-04-14 14:09:03 -07:00
  • fae3a28070 ggml : remove ggml-ext.h (#21869) b8796 Xuan-Son Nguyen 2026-04-14 16:32:58 +02:00
  • c0de6eda72 metal : fix FA support logic (#21898) b8795 Georgi Gerganov 2026-04-14 17:32:29 +03:00
  • 707c0b7a6e mtmd: add mtmd_image_tokens_get_decoder_pos() API (#21851) b8794 Xuan-Son Nguyen 2026-04-14 16:07:41 +02:00
  • 1f30ac0cea vulkan: Programmatically add RoundingModeRTE to all shaders when the device supports it (#21572) b8793 Jeff Bolz 2026-04-14 15:17:45 +02:00
  • f4b5bf2f32 ci : re-enable mac workflows (#21894) b8792 Georgi Gerganov 2026-04-14 15:58:09 +03:00
  • aa0f1897b7 metal : add XIELU unary op (#20802) b8791 Seyoung Jeong 2026-04-14 21:43:59 +09:00
  • be76dd0bb2 vendor : update BoringSSL to 0.20260413.0 (#21881) b8790 Adrien Gallouët 2026-04-14 13:25:09 +02:00
  • 2e05f06ffb ggml : fix ARM NEON nvfp4 dot product on non-dotprod targets (#21559) b8789 Richard Davison 2026-04-14 13:23:45 +02:00
  • acc37a42ea cmake: fix CMP0194 warning on Windows with MSVC (#21630) b8788 texasich 2026-04-14 05:47:56 -05:00
  • 5a23695d5a ggml-webgpu: Update register tiling matmul to use f32 accumulation (#21644) b8787 Reese Levine 2026-04-14 03:46:41 -07:00
  • 56666fa607 common: skip reasoning budget sampler when no budget is requested (#21870) b8786 Berk Idem 2026-04-14 06:43:06 -04:00
  • 6a6780a232 vulkan: Support GGML_TYPE_NVFP4 (#21455) b8785 Jeff Bolz 2026-04-14 11:34:23 +02:00
  • e489a5ca0e server: support OAI /v1/audio/transcriptions API (#21863) b8784 Xuan-Son Nguyen 2026-04-14 11:09:52 +02:00
  • 53fb592060 opt arc770 for Q4_0 arthw 2026-04-14 12:22:19 +08:00
  • e21cdc11a0 common/gemma4 : handle parsing edge cases (#21760) b8783 Aldehir Rojas 2026-04-13 18:18:18 -05:00
  • e974923698 docs: listing qwen3-asr and qwen3-omni as supported (#21857) Xuan-Son Nguyen 2026-04-13 22:28:17 +02:00
  • 1c0d9081fd chat: dedicated DeepSeek v3.2 parser + "official" template (#21785) b8781 Piotr Wilkin (ilintar) 2026-04-13 22:23:53 +02:00
  • a8bad3842e ci: Also exempt 'security' tag from auto-close (#21844) Christian Kastner 2026-04-13 19:18:44 +02:00
  • 75f3bc94e6 vulkan: Flash Attention DP4A shader for quantized KV cache (#20797) b8779 Ruben Ortlam 2026-04-13 14:21:31 +02:00
  • aa00911d12 common : add download cancellation and temp file cleanup (#21813) b8778 Adrien Gallouët 2026-04-13 11:18:23 +02:00
  • ce8fd4b1a6 server: Expose build_info in router mode (#21835) b8777 Gaspard Petit 2026-04-13 05:14:42 -04:00
  • 9f5e1edb10 CUDA: Limit DeviceSegmentedSort to immediate mode (#21718) b8776 Oliver Simons 2026-04-13 11:14:06 +02:00
  • 920b3e78cb mtmd: use causal attn for gemma 4 audio (#21824) b8775 Xuan-Son Nguyen 2026-04-13 09:47:55 +02:00
  • 974c8c94cc webui: add setting for first-line chat titles (#21797) Rohan Jain 2026-04-13 13:00:46 +05:30
  • 227ed28e12 webui: MCP Diagnostics improvements (#21803) Aleksander Grygier 2026-04-13 07:58:38 +02:00
  • bafae27654 Remove extra conditional check on debug mode. (#21798) b8772 Masashi Yoshimura 2026-04-13 12:13:04 +09:00
  • 873c825611 sycl: disable Q1_0 in backend and cleanup unused variables (#21807) b8771 Akarshan Biswas 2026-04-13 07:14:58 +05:30
  • 82764d8f40 mtmd: fix crash when sending image under 2x2 pixels (#21711) b8770 Sergiu 2026-04-13 00:59:21 +03:00
  • 21a4933042 mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) (#19441) b8769 Xuan-Son Nguyen 2026-04-12 23:57:25 +02:00
  • 1e9d771e2c convert : force f16 or f32 on step3-vl conv weights (#21646) Sigbjørn Skjæret 2026-04-12 19:22:29 +02:00
  • aa4695c5e5 mtmd: add gemma 4 test (vision + audio) [no ci] (#21806) Xuan-Son Nguyen 2026-04-12 16:29:03 +02:00
  • 547765a93e mtmd: add Gemma 4 audio conformer encoder support (#21421) b8766 Stephen Cox 2026-04-13 00:15:26 +12:00
  • 9e209c5aee fix: Proper messages rendering for "Show raw output" (#21672) Aleksander Grygier 2026-04-12 13:08:11 +02:00
  • 6313acbef0 docs: add guide on how to add multimodal support (#21778) Xuan-Son Nguyen 2026-04-12 13:02:38 +02:00
  • ff5ef82786 CUDA: skip compilation of superfluous FA kernels (#21768) b8763 Johannes Gäßler 2026-04-11 18:52:11 +02:00
  • 073bb2c20b mtmd : add MERaLiON-2 multimodal audio support (#21756) b8762 Sirui He 2026-04-11 20:15:48 +08:00
  • af1127d3c4 opencl: add basic support for q5_k (#21593) b8761 shaofeiqi 2026-04-11 01:46:19 -07:00
  • 865ff06b2f TP: fix Qwen 3 Next data split (#21732) b8760 Johannes Gäßler 2026-04-11 09:23:42 +02:00
  • 2b2cd57de6 ggml : fix a few instances of missing GGML_TYPE_Q1_0 cases (#21716) b8759 Sigbjørn Skjæret 2026-04-11 08:45:00 +02:00
  • 660386f6f8 py : Bump typer to latest to fix huggingface_hub issue (#21701) Bartowski 2026-04-11 02:44:15 -04:00
  • a29e4c0b7b CUDA: also store node->src ne/nb for graph equality (#21736) b8757 Aman Gupta 2026-04-11 10:30:30 +08:00
  • b136b62cf9 fix: Fix broken structured output when using $refs in json_schema (#21699) b8756 Galunid 2026-04-11 01:26:36 +02:00
  • 81069a808a hexagon: add support for linux on snapdragon (#21707) b8755 Todor Boinovski 2026-04-10 15:57:23 -07:00
  • 9aa2807769 hexagon: improved Op queuing, buffer and cache management (#21705) b8754 Max Krasnyansky 2026-04-10 15:47:43 -07:00
  • 3fc65063d9 common : better align to the updated official gemma4 template (#21704) b8753 Aldehir Rojas 2026-04-10 16:12:53 -05:00
  • 05b3caaa48 common : add callback interface for download progress (#21735) b8752 Adrien Gallouët 2026-04-10 22:17:00 +02:00
  • e62fa13c24 model : make Gemma 4 shared-KV tail attn_k tensors optional on load (#21739) b8751 MoonRide303 2026-04-10 21:45:50 +02:00
  • bfd1f453cb ggml-webgpu: support non-square subgroup matrix configs for Intel GPUs (#21669) b8750 Rithik Sharma 2026-04-10 10:52:38 -07:00
  • e4fed9d08d ggml-webgpu: address quantization precision and backend lifecycle managment (#21521) b8749 Chen Yuan 2026-04-10 13:52:01 -04:00
  • 5dd102539b server : ignore --alias when using --models-preset (#21380) b8748 Adrien Gallouët 2026-04-10 17:42:56 +02:00
  • fb38d6f278 common : fix when loading a cached HF models with unavailable API (#21670) b8747 Adrien Gallouët 2026-04-10 16:37:46 +02:00
  • 0893f50f2d common: mark --split-mode tensor as experimental (#21684) b8746 Johannes Gäßler 2026-04-10 12:27:27 +02:00
  • f989a6e39e webui: Static build output improvements (#21667) Aleksander Grygier 2026-04-10 11:49:47 +02:00
  • d7ff074c87 common : enable reasoning budget sampler for gemma4 (#21697) b8744 Berk Idem 2026-04-10 05:49:14 -04:00
  • 3f8752b559 docs : fix broken link to ggml-openvino in OPENVINO.md (#21709) Belem Zhang 2026-04-10 15:50:08 +08:00
  • 7b69125331 vulkan: Support Q1_0 (#21539) b8742 Jeff Bolz 2026-04-10 01:35:27 -05:00
  • e095a482a0 common : add fluidity to the progress bar (#21671) b8741 Adrien Gallouët 2026-04-10 08:24:53 +02:00
  • e34f042154 CUDA: fuse muls (#21665) b8740 Aman Gupta 2026-04-10 10:24:09 +08:00
  • d132f22fc9 HIP: add CDNA4 (gfx950) architecture support for MI350X/MI355X (#21570) b8739 andyluo7 2026-04-09 22:13:32 +03:00
  • d6f3030047 ggml: backend-agnostic tensor parallelism (experimental) (#19378) b8738 Johannes Gäßler 2026-04-09 16:42:19 +02:00
  • 009a113326 ggml : check return value of CUB calls used in argsort and top-k (they all return cudaError_t) (#21676) b8737 fairydreaming 2026-04-09 15:17:11 +02:00
  • 4cabbe36e0 state 0cc4m/vulkan-async-p2p Ruben Ortlam 2026-04-09 13:00:31 +02:00
  • 9f001cae27 state Ruben Ortlam 2026-04-09 12:51:43 +02:00
  • 88335c0490 state Ruben Ortlam 2026-04-09 12:39:51 +02:00
  • c8ac02fa1b requirements : update transformers to 5.5.1 (#21617) Daniel Bevenius 2026-04-09 12:36:29 +02:00
  • 204023c897 state Ruben Ortlam 2026-04-09 12:36:15 +02:00
  • d88d722fc1 state Ruben Ortlam 2026-04-09 12:32:08 +02:00
  • 4ef9301e4d webui: add "Send message on Enter" setting (#21577) JvM 2026-04-09 12:26:27 +02:00
  • 96d9516329 state Ruben Ortlam 2026-04-09 12:25:27 +02:00
  • ddf03c6d9a common : fix ambiguous grammar rule in gemma4 (#21661) b8734 Aldehir Rojas 2026-04-09 05:25:07 -05:00
  • 26229755c5 common : simplify autoparser tagged parser rules (#21216) b8733 Aldehir Rojas 2026-04-09 05:24:20 -05:00
  • 057dba336e model: fix multimodal padding token for gemma3n/gemma4 (#21625) b8732 Xuan-Son Nguyen 2026-04-09 12:18:23 +02:00
  • 501aeed18f mtmd: support dots.ocr (#17575) b8731 Xuan-Son Nguyen 2026-04-09 12:16:38 +02:00
  • 8a108eddb4 state Ruben Ortlam 2026-04-09 12:05:15 +02:00
  • 47dde34e00 state Ruben Ortlam 2026-04-09 11:58:46 +02:00