Commit Graph

  • fbb7ceff1d fix builds, integrate vulkan profiler, fix copy events, fix export Piotr Wilkin 2026-03-29 16:52:50 +02:00
  • 2895925203 Fix more missing backend stuff (and Python errors) Piotr Wilkin 2026-03-29 01:57:02 +01:00
  • 4e927afd4c add second dimension to reported tensors, fix Mac build, add missing initializer to all backends Piotr Wilkin 2026-03-29 01:49:52 +01:00
  • 893aa72363 feat: cool profiler thingy Piotr Wilkin 2026-03-29 01:14:09 +01:00
  • 31e82494c0 mtmd: support "frame merge" for qwen-vl-based models (#21858) b9543 Xuan-Son Nguyen 2026-06-06 21:17:25 +02:00
  • 37c56c245e wip gg/pr/23398-save Georgi Gerganov 2026-06-06 16:30:41 +03:00
  • 6b80c74f28 completion : remove useless statics (#24226) b9542 Adrien Gallouët 2026-06-06 12:16:16 +02:00
  • 588f0dc2ce completion : fix format specifier in LOG_INF (#24213) b9541 Adrien Gallouët 2026-06-06 11:24:27 +02:00
  • f5c6ae1827 mtmd, server: add "placeholder bitmap" for counting tokens, add */input_tokens API (#23913) Xuan-Son Nguyen 2026-06-06 11:06:51 +02:00
  • 1c4a91c0f3 wip Georgi Gerganov 2026-06-06 10:48:36 +03:00
  • 5a69c97439 vulkan: check coopmat2 features before reporting support (#24186) Ruben Ortlam 2026-06-06 09:11:35 +02:00
  • 5343f4502a model : rename local n_layer_all variable (#24209) b9538 Sigbjørn Skjæret 2026-06-06 06:07:20 +02:00
  • 603300b008 context : fix off-by-one comparisons to n_gpu_layers (#24208) b9537 Sigbjørn Skjæret 2026-06-06 06:06:47 +02:00
  • 308f61c31f opencl: improve get_rows, cpy, concat and q6_k flat gemv (#24160) b9536 lhez 2026-06-05 13:45:25 -07:00
  • da87e9b612 common/chat : unify and fix LFM2/LFM2.5 tool parser (#24178) b9535 Tarek Dakhran 2026-06-05 21:31:56 +02:00
  • e82beaa60d vulkan: add fwht support for Intel with shmem reduction (#23964) b9534 Ruben Ortlam 2026-06-05 19:44:40 +02:00
  • c4a278d68e model: fix build failed (#24193) b9533 Xuan-Son Nguyen 2026-06-05 18:12:27 +02:00
  • 64086f2b2f model, mtmd: Granite4 Vision (#23545) Gabe Goodhart 2026-06-05 09:44:59 -06:00
  • 6effcecd0b TP: round up granularity to 128 (#24180) b9531 Johannes Gäßler 2026-06-05 17:35:13 +02:00
  • 86591c7536 cli: fix model params not propagated (#23893) b9530 therealkenc 2026-06-05 08:29:41 -07:00
  • 65eef9549c Merge branch 'master' into pr/23398 Georgi Gerganov 2026-06-05 17:47:19 +03:00
  • 96fbe00393 model : fix llama_model::n_gpu_layers() (#24188) b9529 Georgi Gerganov 2026-06-05 17:11:42 +03:00
  • 2016bf2b3b ui: run npm install when package-lock.json is newer than node_modules (#24171) b9528 Pascal 2026-06-05 14:57:32 +02:00
  • 9c955c48b0 Fix link to available UI settings (#24169) Mario 2026-06-05 13:39:32 +01:00
  • cc7bef34e2 ui: add ignore-scripts=true to npmrc (#24149) Xuan-Son Nguyen 2026-06-05 14:31:03 +02:00
  • f0438b1b15 cont : avoid computations on the CPU Georgi Gerganov 2026-06-05 14:39:03 +03:00
  • d78a3864f0 cont : adjust to hparams changes Georgi Gerganov 2026-06-05 14:38:41 +03:00
  • 5954f196ed Merge branch 'master' into pr/23398 Georgi Gerganov 2026-06-05 14:02:53 +03:00
  • ad1b88ca0d docs: Update quantization readme (#24133) Pedro Cuenca 2026-06-05 12:21:26 +02:00
  • 59917d3922 minor : fix lint issues (#24165) b9524 Georgi Gerganov 2026-06-05 11:17:54 +03:00
  • 7acb4e8cd2 hparams : refactor hparams.n_layer (#24060) b9523 Georgi Gerganov 2026-06-05 11:09:36 +03:00
  • 3ecfb150a4 kleidiai : dynamic chunk-based scheduling for hybrid execution (#23819) b9522 Charles Xu 2026-06-05 09:11:47 +02:00
  • 4eaa3cee66 add unified assistant Aman Gupta 2026-06-05 14:59:44 +08:00
  • 2154a0fdcf CUDA: enroll mul_mat_vec_q_moe into pdl (#24087) b9521 Oliver Simons 2026-06-05 08:37:34 +02:00
  • 46fa662b1f ci : build-msys job slimming [no ci] (#24157) Daniel Bevenius 2026-06-05 07:57:36 +02:00
  • 7fe2ae45ab sycl : port multi-column MMVQ from CUDA backend (#21845) b9519 Mason Milburn 2026-06-05 01:10:31 -04:00
  • 7c158fbb4a server : disable on-device spec checkpoints (#24108) b9518 Georgi Gerganov 2026-06-04 19:30:59 +03:00
  • 260862b8ca arg: fix double mtp downloads (#24128) Xuan-Son Nguyen 2026-06-04 18:23:48 +02:00
  • 42b2d60e57 webui: [a11y] fix keyboard navigation issues in chat interface and sidebar (#23132) viggy 2026-06-04 08:59:00 -07:00
  • e7bcf1c3a8 Move duplicated imatrix code into single common imatrix-loader.cpp (#22445) b9515 Bartowski 2026-06-04 11:45:40 -04:00
  • 21444c822e ui: Fixed packages (#24119) Aleksander Grygier 2026-06-04 16:23:08 +02:00
  • 526977068f ui: added single line reasoning preview (#23601) MagicExists 2026-06-04 21:09:43 +07:00
  • 0dbfa66a1f return filter to save memory (#24125) b9512 forforever73 2026-06-04 21:56:33 +08:00
  • e8023568d0 convert: Fix Gemma 4 Unified conversion (#24118) Pedro Cuenca 2026-06-04 15:21:38 +02:00
  • 4c51309617 ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (#22209) b9510 Kartik Sirohi 2026-06-04 18:42:38 +05:30
  • 6f3a9f3dee server: avoid unnecessary checkpoint restore when new tokens are present (#24110) b9509 Yongyue Sun 2026-06-04 21:09:01 +08:00
  • a121232fdc agents: refactor, include more guidelines (#24111) Xuan-Son Nguyen 2026-06-04 13:40:23 +02:00
  • 4586479852 webui: fix tool selector toggle/counter, key tools by stable identity (#24065) Pascal 2026-06-04 13:09:49 +02:00
  • 4d742877b2 build : use umbrella Headers directory for XCFramework module map (#23974) Gerard Martinez 2026-06-04 03:58:25 -07:00
  • dd97604fc4 move assistant to separate file Aman Gupta 2026-05-28 14:12:23 +08:00
  • c0da00af04 add exception in test-llama-archs Aman Gupta 2026-05-28 13:41:39 +08:00
  • 777af6af54 add temp hack to not use fit with gemma4, rm later Aman Gupta 2026-05-28 12:53:08 +08:00
  • 27461cd888 add Q rot when cache is quantized Aman Gupta 2026-05-22 00:17:02 +08:00
  • 7b87cd3598 add assert that draft + shared kv should be on same device Aman Gupta 2026-05-20 23:41:33 +08:00
  • 9af0434d8c fix multi-seq Aman Gupta 2026-05-19 22:17:09 +08:00
  • f268966d49 llama: Gemma 4 MTP Aman Gupta 2026-05-19 20:18:00 +08:00
  • 0066404085 server : add header to tools/server/server-http.h (#24089) b9505 A B 2026-06-04 05:14:46 -05:00
  • 7ac5a4225e cmake: skip cvector-generator and export-lora when CPU backend is disabled (#24053) b9504 Andrea Richiardi 2026-06-04 04:13:19 -06:00
  • e3ba22d6cc fix(mtmd): handle Gemma 4 audio projector embedding size (#24091) b9503 Andrei 2026-06-04 02:51:23 -07:00
  • 6ddc9430b1 readme : add status badges (#24104) Georgi Gerganov 2026-06-04 10:58:13 +03:00
  • 65ef50a0a4 tests : refactor test-save-load-state to accept token input (#24073) b9501 Georgi Gerganov 2026-06-04 08:06:36 +03:00
  • 3d1998634e metal : reduce rset heartbeat from 500ms -> 5ms (#24074) b9500 Georgi Gerganov 2026-06-04 08:05:32 +03:00
  • e8c54893f2 ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834) b9499 Reese Levine 2026-06-03 22:05:04 -07:00
  • 3c7450cee1 ggml-cpu: extend RVV quantization vec dot to higher VLENs (#22754) b9498 rehan-10xengineer 2026-06-04 10:03:40 +05:00
  • f478f1b6d7 sycl : Improve SYCL doc (#23025) Todd Malsbary 2026-06-03 22:02:54 -07:00
  • 94a220cd67 mtmd: fix Gemma 4 unified FPE (#24088) b9496 Andrei 2026-06-03 12:51:18 -07:00
  • 166fe29492 qwen35: use post-norm hidden state for MTP (#24025) b9495 Aman Gupta 2026-06-04 01:29:09 +08:00
  • c8d6a00636 mtmd: enable non-causal vision for gemma 4 unified (#24082) b9494 Xuan-Son Nguyen 2026-06-03 19:05:17 +02:00
  • a731805ced mtmd, model: allow skip build_vit() (#24077) b9493 Xuan-Son Nguyen 2026-06-03 17:10:35 +02:00
  • ee4cf705bb ui: Mermaid Diagrams in chat + interactive preview (#24032) Aleksander Grygier 2026-06-03 16:55:36 +02:00
  • 9e58d4d692 Avoid PDL race conditions by disabling __restrict__ when PDL is used (#24030) b9491 Andreas Kieslinger 2026-06-03 13:56:42 +02:00
  • 3571fa5435 ggml-cpu: use runtime SVE width in FWHT (#24059) b9490 Charles Xu 2026-06-03 12:45:10 +02:00
  • f8f0a47a55 cuda: reserve space for quantize kv-cache at startup (#23907) b9489 Aman Gupta 2026-06-03 18:39:59 +08:00
  • 06938ac129 tests : add support for qwen3 SSM archs (#24031) b9488 Georgi Gerganov 2026-06-03 10:15:27 +03:00
  • d545a2a993 update BoringSSL to 0.20260526.0 (#23794) b9487 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-06-03 02:42:58 -03:00
  • 4da6370d43 ci : disable ccache for msvc windows release jobs (#23911) b9486 Georgi Gerganov 2026-06-03 08:05:21 +03:00
  • e3666269f9 arg : removed unnecessary mmproj download when users pass --no-mmproj (#23425) b9485 Ryan Mangeno 2026-06-02 22:04:46 -07:00
  • 63e66fdd23 opencl: use flat variants of q4_K and q6_K gemv for very large M (#24006) b9484 lhez 2026-06-02 14:16:17 -07:00
  • 5c394fdc8b hexagon: profiler output fix and script updates (#24042) b9483 Max Krasnyansky 2026-06-02 14:08:29 -07:00
  • 4fb16eccce model: add Mellum architecture (#23966) b9482 Mikhail Podvitskii 2026-06-02 21:11:12 +02:00
  • bfb4308b05 model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) (#22716) b9481 Hans Florian 2026-06-02 11:55:11 -04:00
  • 2187e00337 StepFun 3.5 MTP (#23274) b9480 Piotr Wilkin (ilintar) 2026-06-02 17:44:35 +02:00
  • 0b7154066e common : fix state save in common_prompt_batch_decode (#23468) b9479 Daniel Bevenius 2026-06-02 15:44:15 +02:00
  • 60130d18f9 server: add SSE ping interval (#24013) b9478 Xuan-Son Nguyen 2026-06-02 14:14:55 +02:00
  • a468b89018 ci : reduce self-hosted server workflow jobs (#24012) Georgi Gerganov 2026-06-02 13:17:59 +03:00
  • d5ab0834ab docs : update HOWTO-add-model.md (#23883) Mikhail Podvitskii 2026-06-02 11:40:22 +02:00
  • 69cea5b669 ui: simplify network error handling (#23431) Marcos Del Sol Vives 2026-06-02 10:45:25 +02:00
  • f8e67fc583 ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI (#23434) b9474 Aleksander Grygier 2026-06-02 10:23:19 +02:00
  • 2365315955 kv-cache : SWA checkpoints store only non-masked cells (#23981) b9473 Georgi Gerganov 2026-06-02 11:06:29 +03:00
  • f7a0777a5c convert : support Step3.7-Flash (#23845) forforever73 2026-06-02 15:54:49 +08:00
  • 4f3a4beb8d llama : deprecate llama_set_warmup (#24009) b9471 Georgi Gerganov 2026-06-02 10:30:38 +03:00
  • 8f7f3bf141 hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimizations for latest models (#23989) b9470 Max Krasnyansky 2026-06-01 23:40:08 -07:00
  • d178a11818 hexagon: add gelu_quick (#24007) b9469 Todor Boinovski 2026-06-01 23:19:07 -07:00
  • 354ebac8cb server: real-time reasoning interruption via control endpoint (#23971) b9468 Pascal 2026-06-02 07:26:20 +02:00
  • 1fd5f48037 clean up unused variables warnings (#23975) b9467 Anav Prasad 2026-06-01 19:38:37 -07:00
  • 210a6570ce opencl: fix compiler warnings for non-adreno path (#23922) b9466 lhez 2026-06-01 19:15:09 -07:00
  • b8275a8acc revert to using global_invocation_id for cpy shader (#23955) Masashi Yoshimura 2026-06-02 08:59:06 +09:00
  • 5dcb711666 speculative : fix n_outputs_max and remove draft-simple auto-enable (#23988) b9464 Georgi Gerganov 2026-06-01 22:26:58 +03:00
  • 5aa3a64596 nix : add nix-nodejs facilities to build Web UI (#23846) Christian Hoener zu Siederdissen 2026-06-01 20:01:26 +02:00
  • 27d9ed8397 opencl: add basic support for q5_0 and q5_1 (#23548) shaofeiqi 2026-06-01 10:06:50 -07:00