Commit Graph

  • 50d2b21d7c metal : add comments Georgi Gerganov 2025-09-28 18:08:13 +03:00
  • d9e0e7c819 ci : fix musa docker build (#16306) R0CKSTAR 2025-09-28 22:38:15 +08:00
  • 0629437601 cuda : add TODO about KV padding requirement Georgi Gerganov 2025-09-28 17:25:37 +03:00
  • 66692977e8 cont : simplify Georgi Gerganov 2025-09-28 17:21:51 +03:00
  • 5d0d2d2289 metal : pad K, V and Mask when needed Georgi Gerganov 2025-09-21 17:59:31 +03:00
  • 0124ac989f devops: switch to using ubuntu-22.04-s390x image (#16302) b6617 Aaron Teo 2025-09-28 19:25:58 +08:00
  • 2811c65286 Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] (#16297) Imad Saddik 2025-09-28 12:04:46 +01:00
  • d8359f5fde vulkan: 64-bit im2col (#16135) b6615 Jeff Bolz 2025-09-28 01:38:37 -05:00
  • 6a2c6145a0 metal : extend mat-mat multiplication support (#16225) Georgi Gerganov 2025-09-28 09:34:44 +03:00
  • 3b53634fe3 metal : fuse non-sequential nodes (#16102) b6613 Georgi Gerganov 2025-09-28 09:34:05 +03:00
  • 1384abf8b8 vulkan: handle mat_mul with A matrix > 4GB (#16176) b6612 Jeff Bolz 2025-09-27 20:36:34 -05:00
  • e6d65fb02d vulkan: support arbitrary KV dimension in flash attention (#16160) b6611 Jeff Bolz 2025-09-27 16:43:39 -04:00
  • 8656f5de68 vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16224) b6610 Acly 2025-09-27 22:41:03 +02:00
  • 4807e8f96a Show message actions by default (#16289) Aleksander Grygier 2025-09-27 19:56:40 +02:00
  • c0bfc57af4 CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#16277) b6608 Aman Gupta 2025-09-28 00:49:32 +08:00
  • 75a3a6c2cd CUDA: refactor and deduplicate vector FA kernels (#16208) b6607 Johannes Gäßler 2025-09-27 18:45:07 +02:00
  • 0499b29c6f vulkan: throw system error instead of SIGABRT during init on older devices (#16156) b6606 Dmytro Minochkin 2025-09-27 19:26:46 +03:00
  • 234e2ff8ed server : remove old LLAMA_SERVER_SSL (#16290) b6605 Adrien Gallouët 2025-09-27 18:17:08 +02:00
  • 3f81b4e91c vulkan: support GET_ROWS for k-quants (#16235) b6604 Jeff Bolz 2025-09-27 06:36:11 -04:00
  • ace6a54565 build : add LLAMA_OPENSSL option (#16287) b6603 Adrien Gallouët 2025-09-27 11:12:46 +02:00
  • 72b24d96c6 model : make minicpm embedding_scale, residual_scale and logit_scale optional with legacy defaults (#16273) b6602 Vinkal 2025-09-27 02:58:29 +05:30
  • 624207e676 devops: add s390x & ppc64le CI (#15925) b6601 Aaron Teo 2025-09-27 02:03:33 +08:00
  • 807e8c6d31 Enhance text file detection logic for file attachments (#16199) Aleksander Grygier 2025-09-26 19:25:29 +02:00
  • 1a18927894 Allow viewing conversations even when llama server is down (#16255) Aleksander Grygier 2025-09-26 18:35:42 +02:00
  • e0539eb6ae webui: switch to hash-based routing (alternative of #16079) (#16157) b6598 Isaac McFadyen 2025-09-26 11:36:48 -04:00
  • 5d0a40f390 Always show message actions for mobile UI + improvements for user message sizing (#16076) Aleksander Grygier 2025-09-26 15:59:07 +02:00
  • d12a983659 codeowners : add rgerganov as owner of RPC [no ci] (#16279) Radoslav Gerganov 2025-09-26 16:09:34 +03:00
  • cc1cfa277b mtmd : fix uninitialized variable in bicubic_resize (#16275) b6595 Aleksei Nikiforov 2025-09-26 15:00:44 +02:00
  • 54dbc37053 metal : report OOM errors (#16274) b6594 Georgi Gerganov 2025-09-26 14:14:28 +03:00
  • b995a10760 common : use cpp-httplib as a cURL alternative for downloads (#16185) b6593 Adrien Gallouët 2025-09-26 13:12:19 +02:00
  • 4710dd31bb build : fix build-ios-device (#16257) Adrien Gallouët 2025-09-26 12:39:35 +02:00
  • 9b26511857 ggml-cpu: implement MXFP4 SIMD for s390x (#16193) b6591 Aaron Teo 2025-09-26 18:27:25 +08:00
  • 00217cd413 ci : create git tags for released docker images (#16008) Radoslav Gerganov 2025-09-26 13:19:23 +03:00
  • 3b337b01a1 codeowners : add danbev as owner of build-xcframework.sh [no ci] (#16268) Daniel Bevenius 2025-09-26 07:53:36 +02:00
  • a86a580a66 musa: upgrade musa sdk to 4.3.0 (#16240) R0CKSTAR 2025-09-26 08:56:38 +08:00
  • 0f7c69689f musa: fix build warnings (#15611) b6587 R0CKSTAR 2025-09-26 08:56:10 +08:00
  • 835b2b915c model : add GroveMoE support (#15510) b6586 Sigbjørn Skjæret 2025-09-25 19:50:28 +02:00
  • b05a9d650f vendors: update miniaudio version (#16212) b6585 Aaron Teo 2025-09-25 23:38:10 +08:00
  • 27052978e4 readme : update bindings (#16144) rtaluyev 2025-09-25 18:20:34 +03:00
  • 077c94d0ca CUDA: add a fused top-K MoE kernel (#16130) b6583 Aman Gupta 2025-09-25 22:35:05 +08:00
  • aa3ee0eb0b model-conversion : add embedding prompt file support (#15871) b6582 Daniel Bevenius 2025-09-25 12:02:36 +02:00
  • d0991da39d server : add support for external server for tests (#16243) Daniel Bevenius 2025-09-25 11:36:47 +02:00
  • aa719c2f88 ggml : fix loongarch lsx compilation error (#15864) b6580 junchao-zhao 2025-09-25 17:22:55 +08:00
  • 4cdd0bb453 docs: fix typo [no ci] (#16244) Johannes Gäßler 2025-09-25 11:12:27 +02:00
  • b5bd037832 llama : add support for qwen3 reranker (#15824) b6578 Douglas Hanley 2025-09-25 03:53:09 -05:00
  • dfcd53f7ec metal : fuse NORM + MUL + ADD, support non-multiples of 4 (#16220) Georgi Gerganov 2025-09-25 11:30:16 +03:00
  • 4ea00794b8 metal : relax reorder conditions (#16216) b6576 Georgi Gerganov 2025-09-25 11:29:42 +03:00
  • 02a6a82ae7 metal : restore im2col perf (#16219) b6575 Georgi Gerganov 2025-09-25 11:29:08 +03:00
  • c498fc82fe rpc : use ggml logging facilities b6574 Radoslav Gerganov 2025-09-25 10:20:02 +03:00
  • e7a5130a20 codeowners: add ownership of zdnn backend [no ci] (#16232) Aaron Teo 2025-09-25 13:06:30 +08:00
  • bee378e098 ci: run the x64 and arm ci on the github machines instead (#16183) b6572 Eve 2025-09-25 05:06:06 +00:00
  • 5fb557653b devops: fix s390x docker release failure (#16231) Aaron Teo 2025-09-25 11:36:30 +08:00
  • 4ae88d07d0 codeowners: add ownership of zdnn backend [no ci] (#16229) Aaron Teo 2025-09-25 00:25:04 +08:00
  • e789095502 llama: print memory breakdown on exit (#15860) b6569 Johannes Gäßler 2025-09-24 16:53:48 +02:00
  • f2a789e334 ggml : split graph allocations according to backend max buffer size (#15815) b6568 Acly 2025-09-24 16:17:49 +02:00
  • 3a59971967 model : add label for LiquidAI LFM2-2.6B model (#16204) b6567 Tarek Dakhran 2025-09-24 13:42:26 +02:00
  • 63b54c81a6 model-conversion : make causal-verify-logits fails with model names containing "." (#16215) Jie Fu (傅杰) 2025-09-24 16:25:26 +08:00
  • 152729f884 common : add missing chrono header for common.cpp (#16211) b6565 Uilian Ries 2025-09-24 08:53:47 +02:00
  • c0c59c1157 codeowners : match all requirements files (#16214) Sigbjørn Skjæret 2025-09-24 08:53:20 +02:00
  • 7735706b93 model-conversion : run-org-model.py fails to run on mac m1 (#16213) Jie Fu (傅杰) 2025-09-24 14:46:52 +08:00
  • 4d9ea03d17 codeowners : use slash prefix for root files [no ci] (#16210) Daniel Bevenius 2025-09-24 08:10:09 +02:00
  • 8ba548dae2 model-conversion : fix the make targets in the README.md (#16209) Jie Fu (傅杰) 2025-09-24 12:19:23 +08:00
  • f505bd83ca ci : disable AMD workflows + update NVIDIA workflows (#16200) Georgi Gerganov 2025-09-23 20:41:40 +03:00
  • 0889589dbe ci : enable Vulkan workflow on Mac (#16194) Georgi Gerganov 2025-09-23 13:44:25 +03:00
  • 4e29084ba4 ggml-cpu: Respect cpumask settings (#16164) b6558 Xiangyan Sun 2025-09-23 01:58:12 -07:00
  • f6b4af3d04 ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15928) b6557 Sigbjørn Skjæret 2025-09-23 10:25:20 +02:00
  • 264f1b5187 zdnn: refactor codebase + add docs (#16178) b6556 Aaron Teo 2025-09-23 14:53:05 +08:00
  • 0bc7cc7154 codeowners : add @danbev to model-conversion example [no ci] (#16190) Daniel Bevenius 2025-09-23 08:13:22 +02:00
  • 4b9f4cb0f8 devops: add s390x containers (#15915) Aaron Teo 2025-09-23 13:59:34 +08:00
  • 85e72271ba ggml-cpu : fix typo in gemm comments [no ci] (#16189) Daniel Bevenius 2025-09-23 05:59:03 +02:00
  • 1d0125bcf1 feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (#16177) Gabe Goodhart 2025-09-22 12:40:10 -06:00
  • 351f3da39c clang-tidy : disable warning about performance enum size (#16127) Haiyue Wang 2025-09-23 01:57:46 +08:00
  • 3ecb2f671a ggml : implement set_rows with i32 index (#16159) b6550 Sigbjørn Skjæret 2025-09-22 19:13:00 +02:00
  • 432cf4304c codeowners : update + cleanup (#16174) b6549 Georgi Gerganov 2025-09-22 18:20:21 +03:00
  • 37a23c17bd common : enable --offline mode without curl support (#16137) b6548 Adrien Gallouët 2025-09-22 14:13:51 +02:00
  • 138c87ce8b webui : fix handling incomplete chunks (#16107) Quentin Bramas 2025-09-22 10:53:13 +02:00
  • c6db9a1027 embedding : fix typos in README (#16171) GideonSerf 2025-09-22 10:49:58 +02:00
  • d05affbab7 common : remove unused local variables (#16140) b6545 Haiyue Wang 2025-09-22 16:48:42 +08:00
  • 4f324a556c ggml : extend ggml_can_fuse to work with non-sequential nodes (#16123) b6544 Georgi Gerganov 2025-09-22 11:12:37 +03:00
  • a71ae3ba7a ggml : add ggml_op_is_empty (#16122) b6543 Georgi Gerganov 2025-09-22 11:12:09 +03:00
  • 05a2458121 codeowners : update ownership for @ngxson and @allozuar (#16128) Xuan-Son Nguyen 2025-09-22 15:10:58 +07:00
  • 96fdca043b Vulkan: add conv_transpose_2d operation (#16022) b6541 Shin-myoung-serp 2025-09-22 17:04:01 +09:00
  • b2d980fce0 codeowners : claim responsibility for ci, models, gguf-py and convert (#16124) Sigbjørn Skjæret 2025-09-22 09:59:05 +02:00
  • 5c6106a696 contrib : update roles (#16113) Georgi Gerganov 2025-09-22 10:58:02 +03:00
  • ec65fb52f0 ci : remove vulkaninfo calls (#16169) Georgi Gerganov 2025-09-22 10:16:05 +03:00
  • 1d660d2fae ci : use smaller model (#16168) Georgi Gerganov 2025-09-22 09:11:39 +03:00
  • a20d810d79 vulkan: add RTE variants of exp shader (#16165) b6536 Jeff Bolz 2025-09-22 00:37:17 -05:00
  • 4d0a7cbc61 ci : adjust params for less runtime (#16167) b6535 Georgi Gerganov 2025-09-22 08:31:40 +03:00
  • 9073a73d82 vulkan: vec dot matrix multiplication fix (#16151) b6534 Ruben Ortlam 2025-09-22 07:22:43 +02:00
  • 51f5a45fbe opencl: fix concat crash on win arm64 with Adreno (#15944) b6533 lhez 2025-09-21 16:42:10 -07:00
  • c4510dc937 opencl: initial q8_0 mv support (#15732) b6532 lhez 2025-09-21 14:48:44 -07:00
  • da30ab5f86 ci : add label for the RISC-V runner (#16150) Georgi Gerganov 2025-09-21 19:00:27 +03:00
  • 28baac9c9f ci : migrate ggml ci to self-hosted runners (#16116) Georgi Gerganov 2025-09-21 16:50:45 +03:00
  • 1eeb523c3e vulkan: optimize UMA buffer operations and fix driver hangs (#16059) b6529 Giuseppe Scrivano 2025-09-21 08:31:55 +02:00
  • 5bb4a3edec vulkan: fix validation error about VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR (#16086) b6528 Jeff Bolz 2025-09-21 01:23:37 -05:00
  • 17ca6ed540 Implement llama-pull tool llama-pull Eric Curtin 2025-09-20 17:24:35 +01:00
  • 7f766929ca sync : ggml b6527 Georgi Gerganov 2025-09-20 12:55:47 +03:00
  • 405921dcef ggml : introduce semantic versioning (ggml/1336) Daniel Bevenius 2025-09-16 06:16:52 +02:00
  • fa6383ca7e CUDA : conditionally add cuda architectures (ggml/1341) Gregor Jasny 2025-09-10 17:21:11 +02:00
  • 803dac2e48 vulkan: use vec dot for matrix matrix multiplications (#16056) b6524 Ruben Ortlam 2025-09-20 10:42:56 +02:00