Commit Graph

  • 3e4bb29666 vulkan: Check maxStorageBufferRange in supports_op (#18709) b7735 Jeff Bolz 2026-01-14 03:59:05 -06:00
  • 47f9612492 llama-model: fix unfortunate typo (#18832) Aman Gupta 2026-01-14 17:55:15 +08:00
  • 01cbdfd7eb CUDA : fix typo in clang pragma comment [no ci] (#18830) Daniel Bevenius 2026-01-14 10:31:49 +01:00
  • 635ef78ec5 vulkan: work around Intel fp16 bug in mmq (#18814) Ruben Ortlam 2026-01-14 09:41:23 +01:00
  • 7d587e5544 ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705) b7731 Perry Naseck 2026-01-14 02:22:25 -05:00
  • d34aa07193 mmap: add Haiku support by skipping RLIMIT_MEMLOCK check (#18819) b7730 Daniel Benjaminsson 2026-01-14 08:11:05 +01:00
  • f709c7a33f ci, tests : use cmake to download models and remove libcurl dependency (#18791) b7729 Adrien Gallouët 2026-01-14 07:46:27 +01:00
  • 6e36299b47 llama : print_info alignment fix (#18708) b7728 ddh0 2026-01-13 17:05:11 -06:00
  • 60591f01d4 model : add EXAONE MoE (#18543) b7727 Junwon Hwang 2026-01-14 07:28:38 +09:00
  • e4832e3ae4 vocab : fix attribute overrides for harmony (#18806) b7726 Georgi Gerganov 2026-01-13 17:40:13 +02:00
  • 960e5e3b46 llama-mmap: fix direct-io loading fallback EOF exception (#18801) b7725 Ruben Ortlam 2026-01-13 15:57:07 +01:00
  • 20ca2e12c4 model-conversion : remove -c 0 from model card template [no ci] (#18807) Daniel Bevenius 2026-01-13 14:13:10 +01:00
  • ea4a321f2a HIP: add fattn-mma-f16 for RDNA4 (#18481) b7723 yulo 2026-01-13 20:52:16 +08:00
  • c1e79e610f doc: ban AI-generated PR descriptions [no ci] (#18765) Johannes Gäßler 2026-01-13 13:43:12 +01:00
  • e047f9ee9d mtmd: fix use_non_causal being reported incorrectly (#18793) b7721 Xuan-Son Nguyen 2026-01-13 12:19:38 +01:00
  • 0a57271ab6 CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (#18800) b7720 Georgi Gerganov 2026-01-13 12:25:53 +02:00
  • 076b0faf7d graph : clean up t5 input builders (#18795) b7719 Gabe Goodhart 2026-01-13 01:43:51 -07:00
  • db79dc06b1 llama-bench: add direct_io parameter (#18778) b7718 Ruben Ortlam 2026-01-13 08:49:10 +01:00
  • 537d4240d4 ci : remove libcurl in releases (#18775) b7717 Adrien Gallouët 2026-01-12 21:43:02 +01:00
  • bcf7546160 server : add arg for disabling prompt caching (#18776) b7716 Radoslav Gerganov 2026-01-12 19:21:34 +02:00
  • 36c5913c45 ci : use openssl for openEuler-latest-cmake-cann (#18779) Adrien Gallouët 2026-01-12 17:29:00 +01:00
  • 8e649571cd vendor : update cpp-httplib to 0.30.1 (#18771) b7714 Adrien Gallouët 2026-01-12 15:58:52 +01:00
  • 4150da9a95 examples : add --kv-unified to batched example (#18774) b7713 Daniel Bevenius 2026-01-12 13:47:58 +01:00
  • 8e2da778da vulkan: change memory_logger to be controlled by an env var (#18769) b7712 Jeff Bolz 2026-01-12 06:32:55 -06:00
  • ce3bf9b1a4 server: update docs for sleeping [no ci] (#18777) Xuan-Son Nguyen 2026-01-12 13:01:24 +01:00
  • 08b5d956fc minor : std::unordered_set over std::set pr/18490 Georgi Gerganov 2026-01-12 13:35:25 +02:00
  • 2bbe4c2cf8 vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (#18678) b7710 Jeff Bolz 2026-01-12 05:32:13 -06:00
  • 1051ecd289 vulkan: Disable large coopmat matmul configuration on proprietary AMD driver (#18763) b7709 Ruben Ortlam 2026-01-12 07:29:35 +01:00
  • 0c3b7a9efe model: fix qwen3next broken due to #18683 (#18762) b7708 Xuan-Son Nguyen 2026-01-11 21:00:10 +01:00
  • 0e76501e1d Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support (#18749) b7707 Ruben Ortlam 2026-01-11 17:33:33 +01:00
  • 4b060bf240 security: make it clear about subtopics in server (#18754) Xuan-Son Nguyen 2026-01-11 16:51:03 +01:00
  • 9789e28459 debug : include LLAMA_POOLING_TYPE_UNSPECIFIED in pooling check (#18692) b7705 Daniel Bevenius 2026-01-11 16:34:41 +01:00
  • 84ae04f163 tests : refactor test-backend-sampler (#18753) b7704 Georgi Gerganov 2026-01-11 17:31:03 +02:00
  • 506bb6e010 model: try to improve Qwen3 Next (#18683) b7703 Xuan-Son Nguyen 2026-01-11 12:53:33 +01:00
  • 79456a690a readme : update UIs (#18751) thom-dev-fr 2026-01-11 12:46:50 +01:00
  • 28068af789 security: narrow down the scope of what we consider a vulnerability (#18752) Xuan-Son Nguyen 2026-01-11 12:23:36 +01:00
  • 707cbafcaa opencl: add SOFTPLUS op support (#18726) b7700 shaofeiqi 2026-01-10 21:57:44 -08:00
  • 75883cde73 eagle3: add support for gpt-oss-120B eagle3 ruixiangw 2026-01-10 18:33:41 +00:00
  • 13a9f31de3 eagle3: make d2t mapping optional ruixiangw 2026-01-10 18:30:19 +00:00
  • b137718878 test-backend-ops: fix mxfp4 tests on blackwell (#18736) b7699 Aman Gupta 2026-01-11 01:12:57 +08:00
  • d2ff4e23ac HIP: adjust RDNA3.5 MMQ kernel selction logic (#18666) b7698 Johannes Gäßler 2026-01-10 17:19:01 +01:00
  • 657a2e644b cmake : update blas logic (#18205) b7697 Perry Naseck 2026-01-10 11:00:54 -05:00
  • f307926482 server : adjust unified KV cache tests (#18716) Georgi Gerganov 2026-01-10 17:51:56 +02:00
  • 7fdc8c893d scripts : follow api redirects in pr2wt.sh (#18739) Sigbjørn Skjæret 2026-01-10 16:04:05 +01:00
  • 23f82f2420 preset: allow named remote preset (#18728) b7694 Xuan-Son Nguyen 2026-01-10 15:12:29 +01:00
  • 3da288d78d eagle3: load lm_head from target model if not in draft model when convert GGUF ruixiangw 2026-01-10 14:09:50 +00:00
  • 2656c0d265 docs(ggml): update backend ops (#18734) Aaron Teo 2026-01-10 18:48:17 +08:00
  • 600a366478 Corrected: changed s13 = src1->nb[3] instead of nb[2] (#18724) b7692 Michael Wand 2026-01-10 01:16:07 -08:00
  • ea23c15990 common : add --license to display embedded licenses (#18696) b7691 Adrien Gallouët 2026-01-10 09:46:24 +01:00
  • 9ac2693a30 server: fix n_cmpl not skipping processing prompt (#18663) b7690 Xuan-Son Nguyen 2026-01-10 00:00:41 +01:00
  • a61c8bc3bf mtmd: Add Gemma3n multimodal support with MobileNetV5 vision encoder (#18256) b7689 Simranjeet Singh 2026-01-09 22:42:38 +00:00
  • 593da7fa49 opencl: add EXPM1 op (#18704) b7688 shaofeiqi 2026-01-09 10:13:13 -08:00
  • 9e41884dce Updates to webgpu get_memory (#18707) b7687 Reese Levine 2026-01-09 08:17:18 -08:00
  • 4a2751258a server : simplify prompt state transition branches gg/server-refactor Georgi Gerganov 2026-01-09 17:46:03 +02:00
  • ec8fd7876b Webui/file upload (#18694) Pascal 2026-01-09 16:45:32 +01:00
  • a180ba78c7 cmake: only build cli when server is enabled (#18670) b7685 Asbjørn Olling 2026-01-09 16:43:26 +01:00
  • cc5cafecf4 fix : nullptr task dereference Georgi Gerganov 2026-01-09 17:32:39 +02:00
  • aef22e7afc cont : reduce parent checks Georgi Gerganov 2026-01-09 16:44:07 +02:00
  • 9ceb268ee1 cont : remove redundant function Georgi Gerganov 2026-01-09 16:42:29 +02:00
  • a4854f0349 cont : improve n_cmpl logic Georgi Gerganov 2026-01-09 15:30:39 +02:00
  • caff0fd247 server : adjust unified KV cache tests gg/server-test-fix-race Georgi Gerganov 2026-01-09 14:26:14 +02:00
  • 71ba283a65 add eagle3 support for Qwen3 MoE models ruixiangw 2026-01-09 11:54:28 +00:00
  • f2d988db55 cont : cleanup Georgi Gerganov 2026-01-09 13:22:04 +02:00
  • 91fd50be1b Merge branch 'master' into pr/18663 Georgi Gerganov 2026-01-09 13:05:16 +02:00
  • 53eb9435da server : fix timing of prompt/generation (#18713) b7684 Georgi Gerganov 2026-01-09 12:59:50 +02:00
  • d3435efc8a scripts : pr2wt.sh reset to remote head (#18695) Georgi Gerganov 2026-01-09 12:16:40 +02:00
  • 439c3b5021 cont : init child samplers + modify child logic Georgi Gerganov 2026-01-09 10:52:10 +02:00
  • 59dda88aae Merge branch 'master' into HEAD Georgi Gerganov 2026-01-09 09:35:12 +02:00
  • f5f8812f7c server : use different seeds for child completions (#18700) b7682 Georgi Gerganov 2026-01-09 09:33:50 +02:00
  • c0d99e65d2 add eagle3 support for Qwen3 series models ruixiangw 2026-01-08 23:49:06 +00:00
  • 8ece3836b4 common: support remote preset (#18520) b7681 Xuan-Son Nguyen 2026-01-08 22:35:40 +01:00
  • 046d5fd44e llama: use host memory if device reports 0 memory (#18587) b7680 Aaron Teo 2026-01-09 05:34:56 +08:00
  • 480160d472 ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. (#18628) b7679 Masashi Yoshimura 2026-01-09 01:36:42 +09:00
  • 15bff84bf5 ggml webgpu: initial flashattention implementation (#18610) b7678 Reese Levine 2026-01-08 08:23:39 -08:00
  • 0fca4308f7 Initial plan copilot/sub-pr-18695 copilot-swe-agent[bot] 2026-01-08 15:16:59 +00:00
  • 2524c26164 vulkan: fix push constant size for quantize_q8_1 (#18687) b7677 Jeff Bolz 2026-01-08 08:40:58 -06:00
  • cb14b06995 vulkan: optimize ssm_scan (#18630) b7676 Jeff Bolz 2026-01-08 08:16:54 -06:00
  • 5eb799a6c0 scripts : pr2wt.sh reset to remote head Georgi Gerganov 2026-01-08 16:04:19 +02:00
  • 55abc39355 vendor : update cpp-httplib to 0.30.0 (#18660) b7675 Adrien Gallouët 2026-01-08 13:53:54 +01:00
  • f2f6c88067 scripts : support chaining commands in pr2wt.sh (#18671) Georgi Gerganov 2026-01-08 13:40:23 +02:00
  • 945bf10627 metal : add MoE kernel specialization for ne20=5 (#18667) b7673 도로로도로또 2026-01-08 19:37:45 +09:00
  • 64848deb18 llama-fit-params: free memory target per device (#18679) b7672 Johannes Gäßler 2026-01-08 10:07:58 +01:00
  • 9a5724dee2 ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535) Doctor Shotgun 2026-01-08 01:03:21 -08:00
  • 9c142e3a2a model-conversion : add warn about transformers mismatch (#18691) Daniel Bevenius 2026-01-08 09:29:53 +01:00
  • df7fb92170 model-conversion : remove -st targets for converted model (#18689) Daniel Bevenius 2026-01-08 09:29:15 +01:00
  • 2038101bd9 llama : add use_direct_io flag for model loading (#18166) b7668 Julius Tischbein 2026-01-08 07:35:30 +01:00
  • 568371a726 opencl: add FILL op support (#18682) b7667 shaofeiqi 2026-01-07 22:04:50 -08:00
  • 5b8844ae53 scripts : fix repos cloned with .git extension (#18669) b7666 Sigbjørn Skjæret 2026-01-07 22:35:34 +01:00
  • 7e16fef085 convert : more variants of rope_theta config entries (#18668) Sigbjørn Skjæret 2026-01-07 22:34:51 +01:00
  • f5245b5e4e cuda : fix build on cuda 12.8 (#18672) b7664 Oliver Walsh 2026-01-07 21:32:44 +00:00
  • ae9f8df778 fix(docker): add missing libglvnd libraries to Vulkan image (#18664) R 2026-01-07 16:57:42 +01:00
  • 56d2fed2b3 tools : remove llama-run (#18661) b7662 Adrien Gallouët 2026-01-07 16:18:26 +01:00
  • 56426673cb scripts : add pr2wt.sh (#18644) Georgi Gerganov 2026-01-07 15:16:20 +02:00
  • d7c27d4964 fix infinite loop on empty batch Xuan Son Nguyen 2026-01-07 14:08:05 +01:00
  • bb77764c2d convert : clarify sentence-transformers-dense-modules help [no ci] (#18662) Daniel Bevenius 2026-01-07 13:18:53 +01:00
  • a9d7bcb7fc server: fix n_cmpl not skipping processing Xuan Son Nguyen 2026-01-07 13:13:53 +01:00
  • 9dfa8ee950 ci : run cann build unconditionally [no ci] (#18659) Sigbjørn Skjæret 2026-01-07 13:07:08 +01:00
  • ca4a8370bc vulkan: reject ops when a tensor is too large to allocate (#18646) b7658 Jeff Bolz 2026-01-07 05:03:32 -06:00
  • 03023296cf vulkan: Warptile tuning for Intel Xe2/Xe3 (#18178) b7657 virajwad 2026-01-07 02:59:47 -08:00
  • 8c77a04cc7 vulkan: more mul mat optimizations (#18533) b7656 Eve 2026-01-07 10:13:17 +00:00