Commit Graph

  • 49c21f97cd llama: initialize pre-norm embedding mask flag (#23256) b9213 Andrei 2026-05-18 04:20:49 -07:00
  • 77e38d68f2 add myself to conversion (#23261) Sigbjørn Skjæret 2026-05-18 12:42:56 +02:00
  • 053e01dff6 ci : added kleidiai-server to server-self-hosted workflow (#22435) Martin Klacer 2026-05-18 10:14:57 +01:00
  • c3f95c1f06 scripts : allow wc2wt with an existing branch (#23189) Georgi Gerganov 2026-05-18 08:57:28 +03:00
  • 0caf2a1d48 sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (#22156) b9209 Intel AI Get-to Market Customer Success and Solutions 2026-05-17 22:12:21 -07:00
  • 5511965b19 sycl: route small f32 matmuls to oneMKL, bypass oneDNN (#22150) b9208 Intel AI Get-to Market Customer Success and Solutions 2026-05-17 22:11:51 -07:00
  • e98bcfec28 sycl : fix error when use -mg 1 error (#23140) Neo Zhang 2026-05-18 13:11:19 +08:00
  • 1867a0c692 update bid to match each layers MTP source (#23237) Incarnas 2026-05-17 21:37:12 -07:00
  • dd7cad7197 cmake : do not check for bin install dir (#23234) Sigbjørn Skjæret 2026-05-18 02:33:14 +02:00
  • 726704a160 feat: Support d_conv=15 for ssm-conv.cu (#23017) b9204 Gabe Goodhart 2026-05-17 15:05:11 -06:00
  • 87589042ca cmake : fix LLAMA_BUILD_UI logic (#23190) b9203 Aldehir Rojas 2026-05-17 14:42:26 -04:00
  • e0de4c2419 cmake : do not install conversion script (#23204) b9202 Sigbjørn Skjæret 2026-05-17 18:07:21 +02:00
  • 84c678242a CUDA: Continue directly including cuda/iterator (#23102) Oliver Simons 2026-05-17 18:00:10 +02:00
  • 3e12fbdea5 llama: avoid copying logits during prompt decode in MTP (#23198) b9200 Aman Gupta 2026-05-17 23:30:25 +08:00
  • 39cf5d6191 common : delegate assistant continuation to underlying template handlers (#23089) Aldehir Rojas 2026-05-17 07:36:05 -04:00
  • a6d6183dbc ggml-vulkan/CMakeLists: add a check for SPIRV-Headers (#22009) b9198 Jan Ekström 2026-05-17 14:12:11 +03:00
  • fcae601e44 vulkan: add cpy bf16 -> f32 pipelines (#22677) b9197 Pascal 2026-05-17 11:31:20 +02:00
  • 7ba22c6a09 vulkan: Support unaligned tensors for ROPE (#22637) b9196 Jeff Bolz 2026-05-17 04:30:16 -05:00
  • f4cc787b9f common : enable streaming JSON argument values (#23173) Aldehir Rojas 2026-05-17 04:44:34 -04:00
  • 3fbadb06dc vulkan: fuse SSM_CONV + BIAS + SILU (#22653) b9194 Jeff Bolz 2026-05-17 03:25:50 -05:00
  • 1a68ec9378 server : honor --embd-normalize CLI arg (#23125) b9193 Rares Vernica 2026-05-16 23:39:04 -07:00
  • a16cce81d3 ngram : reduce noisy logs (#23185) b9192 ddh0 2026-05-17 01:38:17 -05:00
  • 4f13cb7424 webui: support video files as input (#22830) b9191 Judd 2026-05-17 08:13:44 +08:00
  • b64739ea39 server: (router) alloc tmp buffer on heap (#23159) b9190 Xuan-Son Nguyen 2026-05-16 23:42:16 +02:00
  • 64b38b561b server: skip device enumeration in router mode to avoid creating CUDA primary context (#23137) b9189 Pascal 2026-05-16 21:21:06 +02:00
  • 6049906133 vulkan: removed duplicate #include <memory> in headers (#23144) Winston Ma 2026-05-17 01:57:35 +08:00
  • 0253fb21f5 ui: Add request timeout for MCP tool calls (#23138) Aleksander Grygier 2026-05-16 15:20:27 +02:00
  • 3a92bc99db sync : ggml b9186 Georgi Gerganov 2026-05-16 15:59:45 +03:00
  • e6c37a1adc ggml : bump version to 0.12.0 (ggml/1494) Georgi Gerganov 2026-05-16 15:59:09 +03:00
  • 560445bf34 metal : tighten input-position loop in kernel_conv_transpose_1d (ggml/1477) CrispStrobe 2026-05-10 16:45:00 +02:00
  • 2eb3e6b242 ggml: install ggml.pc in <libdir>/pkgconfig (ggml/1480) Steve Lhomme 2026-05-10 16:35:38 +02:00
  • 25b1bc9c2f ui: Correct links in tools/ui/README.md [no ci] (#23139) Holger Voormann 2026-05-16 14:42:38 +02:00
  • 18675b6bbc vendor : update cpp-httplib to 0.45.0 (#23103) b9181 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-16 09:25:21 -03:00
  • 255582687b llama + spec: MTP Support (#22673) b9180 Aman Gupta 2026-05-16 20:06:23 +08:00
  • b81c2cdd74 ui: Fix handling of MCP resource template parameters (#23117) kubawoo 2026-05-16 13:25:41 +02:00
  • 1428004808 webui : [ChatFormActionAdd][a11y] fix accessibility issues in add menu trigger and items (#22736) viggy 2026-05-16 03:00:46 -07:00
  • 366c5e2a3b ui: untrack settings sync in props effect to prevent reactive loop (#23127) Pascal 2026-05-16 11:25:34 +02:00
  • 1d9f99aa75 fix: Add build step using build workflow to publish workflow (#23134) Aleksander Grygier 2026-05-16 11:22:59 +02:00
  • 42928bc14d model : NvFP4 quantized LM head support (#23046) ynankani 2026-05-16 09:09:27 +00:00
  • 59778f0196 ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming (#23064) b9174 Aleksander Grygier 2026-05-16 02:02:40 +02:00
  • 49d1701bd2 ci : fix release symlinks (#23119) b9173 Sigbjørn Skjæret 2026-05-16 01:09:28 +02:00
  • 1348f67c58 webui: Use lowercase hash for HF checksum check (#23107) b9172 Omer Ozarslan 2026-05-15 10:38:16 -07:00
  • cfabeb1bad tests: add BF16 non-contig coverage for MUL_MAT permutations (#22689) Pascal 2026-05-15 19:35:05 +02:00
  • 6831fe470c docs: document usage object in server timings response (#23110) Julien Chaumond 2026-05-15 19:33:12 +02:00
  • 72e60f500d mtmd: add chunks and fix preproc for qwen3a (#23073) b9169 Xuan-Son Nguyen 2026-05-15 19:32:47 +02:00
  • 8be1786707 webui: fix theme from --webui-config-file not applied on first load (fresh localStorage) (#22902) Pascal 2026-05-15 19:25:38 +02:00
  • 18d1717d62 convert : fix Qwen3 ASR conversion (#23081) Sigbjørn Skjæret 2026-05-15 18:38:39 +02:00
  • 938872e93f fix partial writes 0cc4m/vulkan-repack Ruben Ortlam 2026-05-15 16:00:57 +02:00
  • ff6ad60994 wider loads Ruben Ortlam 2026-05-15 15:22:57 +02:00
  • cc7200bf12 Refactor: convert_hf_to_gguf.py (#17114) Piotr Wilkin (ilintar) 2026-05-15 15:18:12 +02:00
  • 13a55c8e50 deduplicate repacking code Ruben Ortlam 2026-05-15 13:25:49 +02:00
  • 769cc93a43 ci : fix transform of top . entry in release archive (#23080) b9165 Sigbjørn Skjæret 2026-05-15 13:13:16 +02:00
  • 57fb74fba3 add q4_1, q8_0, iq4_nl repacking Ruben Ortlam 2026-05-15 13:10:19 +02:00
  • d5dc2e0a02 llama-eval : add AIME 2026 dataset support (#23058) Georgi Gerganov 2026-05-15 13:58:30 +03:00
  • 6906f78189 replace malloc/free with thread_local memory Ruben Ortlam 2026-05-15 12:11:01 +02:00
  • b64f294cbf add missing repacking functions Ruben Ortlam 2026-05-15 12:04:08 +02:00
  • ac33f032ac reasoning-budget: clone should do a deep-copy (#23095) b9163 Aman Gupta 2026-05-15 17:59:07 +08:00
  • b4e2621de8 add mxfp4 repacking Ruben Ortlam 2026-05-15 11:58:13 +02:00
  • b1243aa933 fix double semicolon Ruben Ortlam 2026-03-24 13:57:56 +01:00
  • 5c1e95c901 add coopmat2 support Ruben Ortlam 2026-03-24 13:57:40 +01:00
  • c285bb9838 vulkan: repack q4_0 into aligned arrays Ruben Ortlam 2026-03-23 14:57:06 +01:00
  • d528444580 webui: preserve partial response on streaming error (#23090) Pascal 2026-05-15 11:18:11 +02:00
  • 91e84fed64 Support for Codex CLI by skipping unsupported Responses tools (#23041) b9161 Sid Shaytay 2026-05-15 00:03:24 -07:00
  • 7155a49771 readme : update bindings (#23063) KITAITI Makoto 2026-05-15 14:41:24 +09:00
  • 5c0e946837 ggml-hexagon: cpy: add contiguous fast-path in reshape copy (#23076) b9159 Pranav Dhinakar 2026-05-14 16:55:54 -07:00
  • 3e037f313c HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (#22880) b9158 Johannes Gäßler 2026-05-14 22:58:58 +02:00
  • d81e63dcfd CI : support IOT device (IQ9) (#22987) Zack Li 2026-05-14 13:58:34 -07:00
  • 834a243664 ggml-webgpu: Enable NVIDIA self-hosted CI (#22976) b9156 Reese Levine 2026-05-14 09:41:32 -07:00
  • 5ec717d125 ggml-webgpu: makes the flash attn vec path subgroup-aware (#23040) Zheyuan Chen 2026-05-14 09:31:36 -07:00
  • 0c3e4fccca fix: Propagate version tag to WebUI asset download in self-hosted CI (#23051) Aleksander Grygier 2026-05-14 17:57:20 +02:00
  • 97b658cee8 contributing: new contributors should not submit trivial fixes (#23045) Aman Gupta 2026-05-14 23:55:24 +08:00
  • 253ba110bc webui: Move static build output from repo code to HF Bucket (#22937) Aleksander Grygier 2026-05-14 13:21:41 +02:00
  • 67b2b7f2f2 logs : reduce (#23021) b9151 Georgi Gerganov 2026-05-14 13:05:52 +03:00
  • 81b0d882ae ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (#22863) b9150 alex-spacemit 2026-05-14 17:39:30 +08:00
  • 0f45f1a35c docker : revert stable version of intel compute-runtime (#22968) Neo Zhang 2026-05-14 17:30:40 +08:00
  • 42532afff4 unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110) b9148 Kabir Potdar 2026-05-14 09:03:40 +00:00
  • dbe7901ca6 vulkan: fix matmul integer pipeline selection (#23005) Ruben Ortlam 2026-05-14 10:36:54 +02:00
  • 6eb6d84e46 metal: add GDN partial rollback gg/metal-gdn-partial-rollback Georgi Gerganov 2026-05-14 10:24:09 +03:00
  • 8c05923630 vulkan: add GDN partial rollback Aman Gupta 2026-05-14 08:41:23 +02:00
  • 9a3a48722a fix pending state Aman Gupta 2026-05-14 14:32:42 +08:00
  • 320a6a44a5 fix: Autoscroll detection (#23026) Aleksander Grygier 2026-05-14 08:09:29 +02:00
  • 9ed6e19b9d SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations (#21597) b9145 Katostrofik 2026-05-14 01:39:14 -04:00
  • 4e732e0a6c llama: allow partial seq_rm for GDN models for speculative decoding Aman Gupta 2026-04-26 00:42:04 +08:00
  • 4c1c3ac09d ggml-webgpu: only use subgroup-matrix path when head dims are divisible by sg_mat_k / sg_mat_n (#23020) b9144 Zheyuan Chen 2026-05-13 15:12:40 -07:00
  • 7f3f843c31 Fix for issue #22974. Cast intermediate results to float before adding and casting the result to the destination type. Avoids half+half operator ambiguity. (#22994) b9143 scutler-nv 2026-05-13 13:36:14 -07:00
  • ec562eb673 opencl: add q5_0 and q5_1 MoE for Adreno (#22985) b9142 shaofeiqi 2026-05-13 11:57:31 -07:00
  • 95d469a915 server, webui: accept continue_final_message flag for vLLM API compat (#23012) b9141 Pascal 2026-05-13 20:47:58 +02:00
  • 1e4579fbb8 opencl: fix crash when warming up MoE on Adreno (#22876) b9140 lhez 2026-05-13 11:24:33 -07:00
  • 527045bfb0 flush the gpu profile timestamp before the queryset is overflowed (#22995) b9139 Masashi Yoshimura 2026-05-14 02:22:44 +09:00
  • 2dfeca31cc webui: Deduplicate model aliases in data + handle single/multiple aliases in UI (#22979) Aleksander Grygier 2026-05-13 16:39:36 +02:00
  • 46be24d121 webui: preserve system message on edit cancel (#22911) Pascal 2026-05-13 16:16:02 +02:00
  • 7e16646015 docs : Update OPENVINO.md (#22959) Ravi Panchumarthy 2026-05-13 07:12:15 -07:00
  • ad96bb8c0c hexagon: add unary tanh op (#22999) Max Krasnyansky 2026-05-13 06:59:28 -07:00
  • e75cd5efb5 download: do not exit() on error (#23008) b9134 Xuan-Son Nguyen 2026-05-13 15:14:58 +02:00
  • 5d44db6008 server, webui: support continue generation on reasoning models (#22727) b9133 Pascal 2026-05-13 11:09:51 +02:00
  • 3796c94bad ci: validate model naming convention (#22680) Xuan-Son Nguyen 2026-05-13 10:59:37 +02:00
  • e7b4848151 add need_embd in speculative Aman Gupta 2026-05-13 15:09:00 +08:00
  • 19dd00b0e4 remove unused llama_arch Aman Gupta 2026-05-13 15:00:30 +08:00
  • f2200a3a77 mtp -> draft-mtp Aman Gupta 2026-05-13 14:43:12 +08:00
  • 3c3aebaaa0 MTP: clean-up (#9) Aman Gupta 2026-05-13 11:12:20 +08:00