Commit Graph

  • ba4dd0bc67 ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780) b9365 Georgi Gerganov 2026-05-27 17:22:20 +03:00
  • 617255d437 vendor : update cpp-httplib to 0.46.0 (#23650) Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-27 10:36:24 -03:00
  • 87b0a60cdd pyproject : add conversion folder and update dependencies (#23746) Sigbjørn Skjæret 2026-05-27 15:06:18 +02:00
  • fda8528aa8 CUDA: restrict PDL to CTK >= 12.3 due to MSVC issues (#23742) Oliver Simons 2026-05-27 14:21:04 +02:00
  • 2d0656fbdd ci : bump cuda release to 13.3 (#23749) Sigbjørn Skjæret 2026-05-27 14:06:08 +02:00
  • 6b4e4bd582 common : fix env names to all have LLAMA_ARG_ prefix (#23778) b9360 Georgi Gerganov 2026-05-27 14:52:47 +03:00
  • 9f0e4b14d2 ci : fix windows ccaches (#23777) Georgi Gerganov 2026-05-27 13:54:21 +03:00
  • b3a739c9b6 ci : remove wasm test (#23733) Sigbjørn Skjæret 2026-05-27 12:11:37 +02:00
  • 4d8cc0c56f vulkan: avoid preferring transfer queue on AMD UMA devices (#22455) b9357 Winston Ma 2026-05-27 17:48:40 +08:00
  • 0d227ec358 ci : add ccache to server builds + fix undefined sanitizer build (#23763) Georgi Gerganov 2026-05-27 11:45:12 +03:00
  • 1d971bba36 docs : fix duplicated "the" in granitevision and model-conversion docs (#23767) quyentonndbs 2026-05-27 15:34:06 +08:00
  • 9777256c31 convert: add MiniCPM5 tokenizer support (#23384) b9354 zhangtao2-1 2026-05-27 13:08:33 +08:00
  • 7085492c6f server : fix the log message when using SSL (#23393) b9353 Radoslav Gerganov 2026-05-27 08:06:30 +03:00
  • b4c0549a49 ggml-zendnn : fixed naming of matmul function (#20964) b9352 Vladislav 2026-05-27 01:59:35 +03:00
  • 0d18aaa9d1 ci : do not allocate ccache for 3rd-party hosted runners (#23730) b9351 Georgi Gerganov 2026-05-26 20:15:01 +03:00
  • 08bc21b459 ci : move [no release] check to dedicated check_release job (#23734) Georgi Gerganov 2026-05-26 19:49:41 +03:00
  • 35a74c8fb9 ci : add [no release] keyword + fix sanitizer builds (#23728) Georgi Gerganov 2026-05-26 19:05:48 +03:00
  • 5190c2ea8d ci : move macos jobs to the apple workflow + fix names (#23721) Georgi Gerganov 2026-05-26 16:57:55 +03:00
  • 7799d31e68 vulkan: optimize conv2d and implement coopmat1 support (#22620) Jeff Bolz 2026-05-26 08:48:05 -05:00
  • 3a3ed153d9 ci : remove vulkan SDK dep from webgpu job (#23718) Georgi Gerganov 2026-05-26 16:40:30 +03:00
  • ef66bfab68 hexagon: add support for CONCAT op (#23648) Max Krasnyansky 2026-05-26 06:20:05 -07:00
  • 678d43d720 ci : move more CPU jobs to self-hosted runners (#23715) Georgi Gerganov 2026-05-26 15:37:40 +03:00
  • ef41a69179 ci : move sanitizer jobs to self-hosted runners (#23713) Georgi Gerganov 2026-05-26 15:22:09 +03:00
  • 3dc7684f39 ci : reduce (disable SYCL and CANN builds/releases) (#23705) Georgi Gerganov 2026-05-26 15:21:21 +03:00
  • dbe9c0c8ce convert : support Gemma4ForCausalLM architecture (#23682) b9341 ghleg 2026-05-26 07:00:31 +02:00
  • 6fe90deffa models : Attach Mistral3 NVFP4 weight scales (#23629) Michael Wand 2026-05-26 00:59:59 -04:00
  • 581d020b12 SYCL: implement ggml_sycl_pool_vmm (#22862) Alexey Kopytko 2026-05-26 13:59:00 +09:00
  • 7623de11d9 tests: test-backend-ops -j <N> to run tests in parallel (#23637) Jeff Bolz 2026-05-25 23:57:56 -05:00
  • c9d98295a3 model : add support for talkie-1930-13b (#22596) Niklas Sheth 2026-05-26 00:57:38 -04:00
  • 1506d39e76 ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MUL_MAT pipeline (#23594) Masashi Yoshimura 2026-05-26 12:42:49 +09:00
  • 54121f7325 [WebGPU] Check batch_compute_passes before sending passes when not doing GPU profiling (#23457) Nikhil Jain 2026-05-25 20:32:49 -07:00
  • 192d8ae8b8 CUDA: missing PDL sync for FWHT, better fallback (#23690) b9334 Johannes Gäßler 2026-05-26 05:05:51 +02:00
  • 35c9b1f39e metal : add apple device id (#23566) b9333 forforever73 2026-05-26 02:05:16 +08:00
  • 4bead4e30d snapdragon: bump toolchain docker to v0.7 to fix ui build issues (#23680) Max Krasnyansky 2026-05-25 10:57:43 -07:00
  • 302e2c2652 ci : reduce PR jobs by matching backend paths (#23675) b9331 Georgi Gerganov 2026-05-25 20:54:54 +03:00
  • 328874d054 model: tag ffn_latent as MUL_MAT to fix buft probe (#23664) b9330 Pascal 2026-05-25 16:05:04 +02:00
  • c1f1e28d29 CUDA: add fast walsh-hadamard transform (#23615) b9329 Aman Gupta 2026-05-25 21:12:10 +08:00
  • 5a4126adc1 ui: fix stop/continue during an agentic loop (#23356) Pascal 2026-05-25 14:18:59 +02:00
  • a4d2d4ae41 convert : add compressed-tensors NVFP4 support (#21095) Michael Wand 2026-05-25 08:16:11 -04:00
  • d161ea7071 sync : ggml b9326 Georgi Gerganov 2026-05-25 12:42:28 +03:00
  • 45158f460e ggml : bump version to 0.13.0 (ggml/1510) Georgi Gerganov 2026-05-25 12:40:17 +03:00
  • 22307b3e8b sync : ggml Georgi Gerganov 2026-05-25 12:33:22 +03:00
  • ce5890b5f7 ggml : bump version to 0.12.1 (ggml/1508) Georgi Gerganov 2026-05-25 12:13:21 +03:00
  • b251f74f49 ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500) Ori Pekelman 2026-05-21 12:00:16 +00:00
  • fa97041524 ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (ggml/1492) Dev-X25874 2026-05-21 17:28:08 +05:30
  • ae251b5ff2 TP: fix ggml context size calculation (#22616) b9320 Johannes Gäßler 2026-05-25 11:37:25 +02:00
  • 66efd13375 ggml: gguf_init_from_callback and gguf_init_from_buffer (#22341) b9319 Gilad S. 2026-05-25 11:33:29 +02:00
  • 6c4cbdc70b server: MTP layer kv-cache should respect draft type ctk (#23646) b9318 Aman Gupta 2026-05-25 16:46:23 +08:00
  • 5fdf07e33b ci : update spacemit toolchain url and enhance curl command (#23642) alex-spacemit 2026-05-25 16:43:24 +08:00
  • 062d3115aa ci : fix pre-tokenizer-hashes check (#23651) Sigbjørn Skjæret 2026-05-25 10:41:25 +02:00
  • 314e729347 llama : document that only one on-device state can be saved per sequence (#23520) b9315 Tim Neumann 2026-05-25 09:29:28 +02:00
  • d55fb97174 ci : install host compiler on android-ndk build (#23630) Aldehir Rojas 2026-05-25 03:18:08 -04:00
  • 826539ce59 ggml : Parallelize quant LUT init (#23595) b9313 Jeff Bolz 2026-05-25 02:15:46 -05:00
  • f3ba33ec35 address feedback 0cc4m/cuda-get-memory-contextless Ruben Ortlam 2026-05-25 08:52:58 +02:00
  • b96487645c ui: media attachments before text (#23467) Saba Fallah 2026-05-25 08:50:41 +02:00
  • 9627d0f540 vendor : update cpp-httplib to 0.45.1 (#23639) b9311 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-25 03:45:22 -03:00
  • baac31998a cont : another try gg/ui-fix-ci-errors Georgi Gerganov 2026-05-25 09:14:48 +03:00
  • e2ef8fe42c server: fix checkpoints creation (#22929) b9310 jacekpoplawski 2026-05-25 07:56:18 +02:00
  • 3021f0f4c5 ui : try to fix e2e demo test Georgi Gerganov 2026-05-25 08:32:44 +03:00
  • b02a677519 ui : run prettier Georgi Gerganov 2026-05-25 08:28:25 +03:00
  • 6d57c26ef8 perplexity : fix even more integer overflows (#23623) b9309 fairydreaming 2026-05-25 07:12:39 +02:00
  • 28123a3937 ci : move most slim jobs to self-hosted runners (#23619) Georgi Gerganov 2026-05-25 08:11:19 +03:00
  • 87f18f760e ci : add self-hosted ui workflow gg/ci-ui-test Georgi Gerganov 2026-05-24 22:18:31 +03:00
  • 16b648c897 ci : try ui SH gg/ci-ui-sh Georgi Gerganov 2026-05-24 21:09:13 +03:00
  • cf285e195e ci : move python requirements check to CPU runners Georgi Gerganov 2026-05-24 20:16:00 +03:00
  • 07ec9fd8d9 ci : add comment about UI jobs Georgi Gerganov 2026-05-24 20:10:36 +03:00
  • 36aa88a853 cont : move e2e to SH gg/ci-ui-self-hosted Georgi Gerganov 2026-05-24 20:00:15 +03:00
  • a85051e51c ci : try to move UI to self hosted runner Georgi Gerganov 2026-05-24 19:56:32 +03:00
  • 5a2e768430 ci : back to 3.11 Georgi Gerganov 2026-05-24 19:39:33 +03:00
  • 5a727def3d ci : move lint back to 3.11 Georgi Gerganov 2026-05-24 19:35:39 +03:00
  • f0bbb1a9ea ci : try to bump 3.11 -> 3.13 Georgi Gerganov 2026-05-24 19:24:35 +03:00
  • a0a98e702c ci : prevent cmake pkg to run on dedicated fast runners Georgi Gerganov 2026-05-24 18:44:10 +03:00
  • 8c75e6ee7e ci : prevent heavy CPU jobs from running on fast runners Georgi Gerganov 2026-05-24 18:37:20 +03:00
  • 651afdb47d ci : slim -> self-hosted Georgi Gerganov 2026-05-24 18:27:58 +03:00
  • 5f0e5348ba ci : remove tag from build-self-hosted.yml Georgi Gerganov 2026-05-24 18:08:11 +03:00
  • 549b9d8433 ci : update build-self-hosted.yml (#23616) Georgi Gerganov 2026-05-24 18:20:10 +03:00
  • ced88c03cb ci : remove tag from build-self-hosted.yml gg/ci-remove-tag Georgi Gerganov 2026-05-24 18:08:11 +03:00
  • bb69b8f87b ci : update build-self-hosted.yml Georgi Gerganov 2026-05-24 12:18:17 +03:00
  • 3c4d2b759f cuda: read memory through NVML if available to avoid initializing a context Ruben Ortlam 2026-05-24 11:51:36 +02:00
  • 5d246a792d convert : minor fixes for numpy 2.x (#23571) Sigbjørn Skjæret 2026-05-24 09:51:31 +02:00
  • 63248fc3e3 cmake : fix ui build (#23592) b9305 Aldehir Rojas 2026-05-24 03:37:28 -04:00
  • 83eebe9d08 server: add margin for draft model for fit (#23485) Aman Gupta 2026-05-24 14:43:08 +08:00
  • fff63b5108 TP: fix entirely zero-sized slices per device (#23525) Johannes Gäßler 2026-05-24 08:19:33 +02:00
  • f3061116ff opencl: batch profiling to improve speed and prevent memory leaks (#23495) shaofeiqi 2026-05-23 23:11:43 -07:00
  • 1c0f6db545 hexagon: apply repl optimization in flash attn softmax as #22993 (#23455) b9301 Yiwei Shao 2026-05-23 19:56:59 -07:00
  • cec51c7a7d snapdragon: update windows toolchain to use hsdk v6.6.0.0 (#23552) Aparna M P 2026-05-24 08:26:41 +05:30
  • b22ff4b7b4 cmake/ui : refactor the build (#23352) Aldehir Rojas 2026-05-23 17:08:22 -04:00
  • c0c7e147e7 requirements : bump torch to 2.11.0 (#23503) Aditya Singh 2026-05-23 09:24:39 -07:00
  • b0df4c0cfd model : add NVFP4 MTP scale tensors (#23563) b9297 Michael Wand 2026-05-23 07:30:31 -04:00
  • a497476330 ggml : Check the right iface method before using the fallback 2d get (#23514) b9296 dskwe 2026-05-23 18:49:24 +08:00
  • 95405ac65f vulkan: fix windows find_package of SPIRV-Headers (#23215) b9295 Jeff Bolz 2026-05-23 02:44:46 -05:00
  • 0f3cb3fc8b opencl: generalize Adreno MoE kernels on M (#23449) b9294 Shawn Gu 2026-05-22 17:08:41 -07:00
  • 1acee6bf89 server: only parse empty msg if continuing an assistant msg (#23506) Aldehir Rojas 2026-05-22 11:58:15 -04:00
  • ef570f6308 perplexity : fix integer overflow (#23496) b9292 fairydreaming 2026-05-22 14:50:44 +02:00
  • cc9e331213 SYCL: improve MoE prefill throughput (#23142) b9291 Alexey Kopytko 2026-05-22 21:50:17 +09:00
  • bcfd1989e9 sycl : Level Zero detection in ggml_sycl_init (#23097) b9290 Alexey Kopytko 2026-05-22 21:49:45 +09:00
  • 56f16f235c SYCL : gated_delta_net K>1 (#23174) b9289 karavayev 2026-05-22 08:48:56 -04:00
  • 8cc67efcd4 SYCL: add BF16 to DMMV kernel path (~4x tg speedup on Intel Arc) (#21580) Katostrofik 2026-05-22 08:48:24 -04:00
  • 95feeab52e docs: Update documentation with Granite 4.0/4.1 (#23404) Jesus Talavera 2026-05-22 14:35:46 +02:00
  • 99d4026b11 ggml-zendnn : add Q8_0 quantization support (#23414) b9286 Sachin Sharma 2026-05-22 16:46:55 +05:30