Commit Graph

  • e7b4848151 add need_embd in speculative Aman Gupta 2026-05-13 15:09:00 +08:00
  • 19dd00b0e4 remove unused llama_arch Aman Gupta 2026-05-13 15:00:30 +08:00
  • f2200a3a77 mtp -> draft-mtp Aman Gupta 2026-05-13 14:43:12 +08:00
  • 3c3aebaaa0 MTP: clean-up (#9) Aman Gupta 2026-05-13 11:12:20 +08:00
  • 5c58cc5bdd cont : simplify (#7) Georgi Gerganov 2026-05-11 17:26:03 +03:00
  • 180f8ff346 rename files Aman Gupta 2026-05-11 17:09:05 +08:00
  • edcaf120d9 fix batch size Aman Gupta 2026-05-11 12:22:37 +08:00
  • a55493bbda spec: support MTP Aman Gupta 2026-05-11 11:18:17 +08:00
  • 634275fbbb spec : update CLI arguments for better consistency (#22964) b9131 Georgi Gerganov 2026-05-13 09:15:39 +03:00
  • bcfe63fc53 llama-eval : enable type check (#22988) Sigbjørn Skjæret 2026-05-13 08:14:24 +02:00
  • 61af07c22d ggml-zendnn : adaptive fallback to CPU backend for small batch sizes (#22681) b9129 Sachin Sharma 2026-05-13 11:43:47 +05:30
  • 856c3adac1 hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993) b9128 Trivikram Reddy 2026-05-12 19:28:02 -05:00
  • a9883db8ee opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill (#22755) b9127 yzyyzyhhh 2026-05-13 04:10:37 +08:00
  • cce09f0b2b convert : fix Pixtral 12B --mistral-format conversion (3 bugs) (#22981) fredzillman 2026-05-12 17:16:01 -02:30
  • dded58b450 webui: Fix Chat Screen Form box disappearing + autoscroll issues on WebKit (#22977) Aleksander Grygier 2026-05-12 20:41:11 +02:00
  • 7bfe120c21 mtmd, server, common: expose modalities to /v1/models (#22952) b9124 Xuan-Son Nguyen 2026-05-12 19:08:07 +02:00
  • 927dada6c9 ggml-webgpu: Enables running gpt-oss-20b (#22906) b9123 Masashi Yoshimura 2026-05-12 23:27:40 +09:00
  • 239a497e5f ggml-webgpu: address precision issues for multimodal (#22808) b9122 Chen Yuan 2026-05-12 10:27:04 -04:00
  • 89730c8d26 model-conversion : add causal-convert-mmproj target [no ci] (#22969) Daniel Bevenius 2026-05-12 15:15:40 +02:00
  • fde69a3607 examples : add llama-eval (#21152) Georgi Gerganov 2026-05-12 15:07:00 +03:00
  • ef93e98d01 vulkan: Fix Windows performance regression on Intel GPU BF16 workloads for Xe2 and newer (#22461) b9119 Masato Nakasaka 2026-05-12 03:15:34 -07:00
  • 706fbd8ab6 vulkan: Check shared memory size for mmq shaders (#22693) b9118 Jeff Bolz 2026-05-12 04:41:58 -05:00
  • fa62042af9 ci : bump ty to 0.0.35 (#22961) Sigbjørn Skjæret 2026-05-12 11:34:10 +02:00
  • 4178259130 mtmd: add MiMo v2.5 vision (#22883) b9116 AesSedai 2026-05-12 02:11:14 -07:00
  • 78fbbc2c07 convert : add split() to LoraTorchTensor in LoRA converter (#22832) b9115 Jesus Talavera 2026-05-12 07:17:04 +02:00
  • da44953329 metal : promote mul_mv/mul_mm batch divisors to function constants (#22711) b9114 guyfischman 2026-05-12 07:15:02 +02:00
  • 1ec7ba0c14 opencl: add q4_1 MoE for Adreno (#22856) b9113 Shawn Gu 2026-05-11 11:57:26 -07:00
  • 8e1f9d0834 CUDA: handle OW > 65535 in im2col (2D and 3D) (#22944) b9112 CrispStrobe 2026-05-11 19:48:29 +02:00
  • e936660760 Ggml/cuda snake fusion hardening (#22912) Pascal 2026-05-11 18:42:08 +02:00
  • ef22b3e4ac docs: fix metrics endpoint description in server README (#22879) b9110 willjoha 2026-05-11 18:32:26 +02:00
  • 68e7ea3eab spec : parallel drafting support (#22838) b9109 Georgi Gerganov 2026-05-11 19:09:43 +03:00
  • 928b486b0c ggml-virtgpu: Add a GHA build check (#22943) Kevin Pouget 2026-05-11 15:38:22 +02:00
  • 7dbb0e998a examples : update args speculative-simple README.md [no ci] (#22938) Daniel Bevenius 2026-05-11 13:00:57 +02:00
  • dd9280a664 vulkan: Support asymmetric FA in scalar/mmq/coopmat1 paths (#22589) b9106 Jeff Bolz 2026-05-11 05:49:03 -05:00
  • 8cef8201a1 CUDA: directly include cuda/iterator (#22936) b9105 Oliver Simons 2026-05-11 12:16:38 +02:00
  • f5636f8fc7 convert : add image break token fallback (#22914) Daniel Bevenius 2026-05-11 12:07:17 +02:00
  • c8f8e2364c cont : simplify gg/spec-mtp-experiments Georgi Gerganov 2026-05-11 09:41:00 +03:00
  • 838374375c vendor : update cpp-httplib to 0.44.0 (#22919) b9103 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-11 03:47:13 -03:00
  • 7d442abf5c [SYCL] Add OP im2col_3d (#22903) b9102 Neo Zhang 2026-05-11 13:01:47 +08:00
  • c417ddfc74 fix batch size Aman Gupta 2026-05-11 12:22:37 +08:00
  • a428b010ab spec: support MTP Aman Gupta 2026-05-11 11:18:17 +08:00
  • 389ff61d77 server : print warning when HTTP timeout exceeded (#22907) b9101 Georgi Gerganov 2026-05-10 22:00:18 +03:00
  • 2e97c5f96f backend sampling: support returning post-sampling probs (#22622) b9100 Tim Neumann 2026-05-10 19:12:02 +02:00
  • 5d5d2e15d2 vendor : update cpp-httplib to 0.43.4 (#22888) b9099 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-10 13:46:54 -03:00
  • 2b2babd124 ggml-virtgpu : include missing mutex header (#22810) Oliver Walsh 2026-05-10 16:32:41 +01:00
  • 0b047287fe sync : ggml b9097 Georgi Gerganov 2026-05-10 16:59:29 +03:00
  • efbada936f ggml : bump version to 0.11.1 (ggml/1484) Georgi Gerganov 2026-05-10 16:57:19 +03:00
  • f3c3e0e9a0 internal AllReduce kernel for CUDA provider (#22299) b9095 scutler-nv 2026-05-10 02:05:22 -07:00
  • 5755a100cd model : fix model type check for granite/llama3 and deepseek2/glm4.7 lite (#22870) b9094 Sigbjørn Skjæret 2026-05-10 08:44:29 +02:00
  • 1e5ad35d56 model : add sarvam_moe architecture support (#20275) b9093 Sumit Chatterjee 2026-05-10 00:31:50 +10:00
  • 65d7a8bbf0 devops : updated Nix systems (#22869) Yuannan 2026-05-09 14:15:03 +00:00
  • db8e326913 spec : introduce common_speculative_process() Georgi Gerganov 2026-05-09 17:12:24 +03:00
  • 0d5dd61d66 spec : reset drafting flag at the end Georgi Gerganov 2026-05-09 17:12:06 +03:00
  • ec8bc44854 cont : minor Georgi Gerganov 2026-05-09 15:28:29 +03:00
  • b3bd3bd4cc cont : clean-up Georgi Gerganov 2026-05-09 14:09:45 +03:00
  • 00d56b11c3 docker : upgraded the default intel compute-runtime version (#22567) Davi Henrique Linhares 2026-05-09 05:22:23 -03:00
  • 5757c4dcb1 cmake : update BoringSSL to 0.20260508.0 (#22839) b9090 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-09 04:26:33 -03:00
  • ce0acf03ea server, spec : clean-up Georgi Gerganov 2026-05-09 10:21:57 +03:00
  • e20b83930c SYCL: reduce allocation overhead during flash attention (#22732) b9089 Alexey Kopytko 2026-05-09 15:30:39 +09:00
  • fd89556567 [SYCL] Add BF16 support to GET_ROWS operation (#21391) b9088 Devedse 2026-05-09 07:50:24 +02:00
  • 60489932ec sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path (#22152) b9087 Intel AI Get-to Market Customer Success and Solutions 2026-05-08 22:48:07 -07:00
  • 4a4f819cb6 sycl: Battlemage AOT build via spir64_gen + MMQ subgroup annotations (#22147) Intel AI Get-to Market Customer Success and Solutions 2026-05-08 22:42:40 -07:00
  • 046e284437 Add flash attention MMA / Tiles to support MiMo-V2.5 (#22812) b9085 AesSedai 2026-05-08 20:28:29 -07:00
  • 66001722aa hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (#22837) b9084 Yanzhao Wang 2026-05-08 17:12:04 -07:00
  • c5703e03a5 sycl: support non-contiguous input in PAD op (#22148) Intel AI Get-to Market Customer Success and Solutions 2026-05-08 17:05:22 -07:00
  • b46812de78 Feature hexagon l2 norm (#22816) b9082 Pranav Dhinakar 2026-05-08 13:41:40 -07:00
  • 49956041ee common : do not wrap raw strings in schema parser for tagged parsers (#22827) b9081 Aldehir Rojas 2026-05-08 15:33:17 -05:00
  • 9f5f0e689c model : support Gemma4_26B_A4B_NVFP4 (#22804) b9080 ynankani 2026-05-08 18:42:09 +00:00
  • 55b62bce15 llama : reuse device buffers when possible Georgi Gerganov 2026-05-08 20:42:56 +03:00
  • f1652197dd server : support parallel drafting Georgi Gerganov 2026-05-08 19:30:31 +03:00
  • f88c942861 spec : support parallel drafts Georgi Gerganov 2026-05-08 18:53:33 +03:00
  • f9cd456ea5 common : revert reasoning budget +inf logit bias (#22740) b9079 Aldehir Rojas 2026-05-08 10:46:43 -05:00
  • 927d6635d3 cont : prepare params Georgi Gerganov 2026-05-08 17:50:20 +03:00
  • 5d6f18a638 webui: fix LLM title generation for agentic conversations (#22840) smugman-dot 2026-05-08 15:36:04 +01:00
  • 8822c122be cont : prepare params Georgi Gerganov 2026-05-08 16:59:48 +03:00
  • 29debb3a6a server: support Vertex AI compatible API (#22545) b9077 Xuan-Son Nguyen 2026-05-08 15:23:04 +02:00
  • 6582523eaa spec : refactor for multi-sequence speculative context Georgi Gerganov 2026-05-08 15:43:36 +03:00
  • 9dcf835528 server: (router) expose child model info from router's /v1/models (#22683) b9076 Xuan-Son Nguyen 2026-05-08 14:42:15 +02:00
  • 58e68df0f9 cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667) b9075 Pascal 2026-05-08 11:44:09 +02:00
  • 9b2925e1e0 webui: Add Import/Export of Settings configuration + improve architecture (#22803) Aleksander Grygier 2026-05-08 11:26:04 +02:00
  • efa2f8e5a7 naming : improve consistency gg/spec-refactor-ctx Georgi Gerganov 2026-05-08 12:24:57 +03:00
  • 778f9e247e tools : update readme Georgi Gerganov 2026-05-08 11:55:16 +03:00
  • 1dbc054da5 server : fix slot ctx_drft ptr Georgi Gerganov 2026-05-08 11:55:05 +03:00
  • 161eae0adf spec : fix n_past type Georgi Gerganov 2026-05-08 11:54:32 +03:00
  • a8fd165fec CUDA: lower-case PCI bus id, standardize for ggml (#22820) b9073 Johannes Gäßler 2026-05-08 10:09:38 +02:00
  • e5b1401318 speculative-simple : update Georgi Gerganov 2026-05-08 11:09:34 +03:00
  • 6d57a49a70 vulkan: fix spv shadowing (#22760) b9072 miyan 2026-05-08 15:35:22 +08:00
  • 3b1a8df8fd server : clean-up + dry Georgi Gerganov 2026-05-08 10:20:01 +03:00
  • 233d1aee69 server : add comment Georgi Gerganov 2026-05-08 08:50:23 +03:00
  • 3e941b813b ggml: update SCHED_DEBUG output to use ggml_op_desc() (#22825) b9071 Max Krasnyansky 2026-05-07 22:43:04 -07:00
  • 12c7cfbe83 server : fix URL for draft model Georgi Gerganov 2026-05-08 08:03:49 +03:00
  • 6a4b05a030 server : fix mtmd draft processing Georgi Gerganov 2026-05-08 08:02:11 +03:00
  • f3e8d149ce opencl: add q4_0 MoE GEMM for Adreno (#22731) b9070 Shawn Gu 2026-05-07 21:17:07 -07:00
  • 8be14e40de spec : handle draft running out of context Georgi Gerganov 2026-05-08 07:11:51 +03:00
  • 1d72d87349 convert : fix RuntimeError when stripping FP8 KV-cache scales (#22818) Michał Piszczek 2026-05-08 05:55:48 +02:00
  • 6a2a2513dc sycl : fix script error (#22795) Neo Zhang 2026-05-08 11:54:57 +08:00
  • ba72d4d287 ggml: update SCHED_DEBUG output to use ggml_op_desc() maxk/sched-debug-use-op-desc Max Krasnyansky 2026-05-06 11:27:12 -07:00
  • 44dbe8c521 model: Support sarashina2.2-vision-3b model (#22103) samuraieng 2026-05-08 06:10:29 +09:00
  • 05ff59cb57 CUDA: batch out_prod inner loop with cublasSgemmStridedBatched (#22651) b9066 leonardHONG 2026-05-08 03:59:29 +08:00
  • aaf4a4d5e0 webui: add option for LLM title generation (#22265) smugman-dot 2026-05-07 20:14:03 +01:00