Commit Graph

  • 6efcd65945 vulkan: optimize flash attention split_k_reduce (#14554) b5849 Jeff Bolz 2025-07-08 13:11:42 -05:00
  • 699f4392a3 model : fix hunyuan moe chat template (#14584) b5848 stevenkuang 2025-07-09 00:29:29 +08:00
  • 08382869a2 model : add SmolLM3 (#14581) b5847 Xuan-Son Nguyen 2025-07-08 18:07:01 +02:00
  • bb4f7a9e4e memory : fix broken batch splits for recurrent cache (#14575) b5846 compilade 2025-07-08 11:37:47 -04:00
  • b8eeb8741d vulkan : fix rope with partial rotation and non-cont src (#14582) b5845 Jeff Bolz 2025-07-08 08:21:21 -05:00
  • 17a1f0d2d4 server: Add ability to mount server at prefix (#14544) b5844 Alawode Oluwandabira 2025-07-08 11:47:33 +03:00
  • 8f22dc0a53 model : add hunyuan moe (#14425) b5843 Xuan-Son Nguyen 2025-07-08 10:24:06 +02:00
  • 53903ae6fa vulkan: increase timeout for CI (#14574) Jeff Bolz 2025-07-08 02:38:31 -05:00
  • 4d0dcd4a06 cuda : fix rope with partial rotation and non-cont src (#14580) b5841 Georgi Gerganov 2025-07-08 10:15:21 +03:00
  • 75c91de6e9 CUDA: add bilinear interpolation for upscale (#14563) b5840 Aman Gupta 2025-07-08 10:11:18 +08:00
  • 6b38c7a04c memory : fix broken batch splits for recurrent cache Francis Couture-Harpin 2025-07-07 21:39:54 -04:00
  • 2ff3354c33 memory : fix broken batch splits for recurrent cache compilade/fix-recurrent-batch-init Francis Couture-Harpin 2025-07-07 21:19:12 -04:00
  • 985cda6c7b test-model-random : add Mamba2 Francis Couture-Harpin 2025-07-07 21:07:46 -04:00
  • 68155c66f0 musa: fix build warnings (unused variable) (#14561) b5839 R0CKSTAR 2025-07-08 07:58:30 +08:00
  • 48a5eba586 Merge branch 'master' into compilade/test-model-random Francis Couture-Harpin 2025-07-07 19:53:49 -04:00
  • 996195299e up. vb/add-smollm3 Vaibhavs10 2025-07-07 23:42:40 +02:00
  • e1a7059053 llama : fix incorrect minicpm3 v_states shape (#14571) b5838 Sigbjørn Skjæret 2025-07-07 23:35:35 +02:00
  • 12f55c302b llama : remove ggml_cont where possible (#14568) b5837 Sigbjørn Skjæret 2025-07-07 21:35:08 +02:00
  • f71635824b Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2025-07-07 14:57:56 -04:00
  • bf8b39015f metal : reuse graphs gg/metal-reuse-graphs Georgi Gerganov 2025-07-07 21:19:58 +03:00
  • b9c3eefde1 CUDA: add bf16 and i32 to getrows (#14529) b5836 Aman Gupta 2025-07-07 21:45:43 +08:00
  • 0d2038f90a llama-bench : add graph reuse parameter Georgi Gerganov 2025-07-07 09:07:15 +03:00
  • 6491d6e4f1 vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485) b5835 Eve 2025-07-06 10:29:36 +00:00
  • e592be1575 vulkan: fix rms_norm+mul fusion (#14545) b5834 Jeff Bolz 2025-07-06 03:08:16 -05:00
  • 76681e3c73 llama : reuse compute graphs Georgi Gerganov 2025-07-01 15:59:43 +03:00
  • a0374a67e2 vulkan: Handle updated FA dim2/3 definition (#14518) b5833 Jeff Bolz 2025-07-05 02:26:04 -05:00
  • ddef99522d server : fix assistant prefilling when content is an array (#14360) b5832 Sigbjørn Skjæret 2025-07-05 09:17:14 +02:00
  • 6681688146 opencl: add GELU_ERF (#14476) b5831 Sigbjørn Skjæret 2025-07-05 08:24:56 +02:00
  • bac8bed248 eval-callback : check for empty input (#14539) b5830 Georgi Gerganov 2025-07-05 07:18:09 +03:00
  • b81510a7b7 test-backend-ops: add support for specifying output format (#14368) b5829 R0CKSTAR 2025-07-05 12:10:53 +08:00
  • ef797db357 metal : disable fast math in all quantize kernels (#14528) b5828 Georgi Gerganov 2025-07-04 19:19:09 +03:00
  • 97c64a0974 up. Vaibhavs10 2025-07-04 14:15:34 +02:00
  • 886da0a2c5 kv-cache : prepare K/V buffers for separation gg/kv-cache-prepare-separation Georgi Gerganov 2025-07-03 15:05:27 +03:00
  • 67d1ef23c6 batch : add optional for sequential equal split (#14511) b5827 Georgi Gerganov 2025-07-04 09:08:59 +03:00
  • 7b50f7c025 graph : prepare for 4D mask (#14515) b5826 Georgi Gerganov 2025-07-04 09:05:36 +03:00
  • c79184d2d1 batch : add n_used count (#14512) b5825 Georgi Gerganov 2025-07-04 09:04:59 +03:00
  • 499a8f5a78 CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002) b5824 luyhcsu 2025-07-04 11:50:07 +08:00
  • 07c252f038 model : add Jamba to Mamba-specific hparams printing Francis Couture-Harpin 2025-07-03 17:10:18 -04:00
  • 20f8e43e63 graph : add back hybrid memory graph input Francis Couture-Harpin 2025-07-03 17:07:46 -04:00
  • 28657a8229 ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445) b5823 Sigbjørn Skjæret 2025-07-03 23:07:22 +02:00
  • 4682e21c46 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2025-07-03 16:03:56 -04:00
  • bee28421be opencl : broadcast for soft_max (#14510) b5822 lhez 2025-07-03 11:22:24 -07:00
  • 2b72bedec1 vulkan: support mixed/deepseekR1 FA head sizes (#14509) b5821 Jeff Bolz 2025-07-03 13:21:14 -05:00
  • c8c4495b8d ggml: backward pass for split swiglu (#14483) b5820 Johannes Gäßler 2025-07-03 17:05:18 +02:00
  • 7b63a71a6b Fix conditional enabling following arch checks for ggml-sycl (#14504) b5819 Nicolò Scipione 2025-07-03 11:00:03 +02:00
  • 0c2ee38ab7 convert : correct gemma 3n conversion (#14450) Xuan-Son Nguyen 2025-07-03 10:03:06 +02:00
  • a70c8a0c4b kv-cache : use ggml_set_rows (#14285) b5817 Georgi Gerganov 2025-07-03 10:53:35 +03:00
  • 9067487c44 ggml : fix FA mask dim 2 and 3 (#14505) b5816 Georgi Gerganov 2025-07-03 10:46:57 +03:00
  • d4cdd9c1c3 ggml : remove kompute backend (#14501) b5815 Georgi Gerganov 2025-07-03 07:48:32 +03:00
  • 908e6559d6 convert : fix jamba conv1d shape squeezing Francis Couture-Harpin 2025-07-02 23:49:12 -04:00
  • 2bcaf64e8e Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2025-07-02 21:41:39 -04:00
  • 55c2646b45 CUDA: add dynamic shared mem to softmax, refactor general usage (#14497) b5814 Aman Gupta 2025-07-03 07:45:11 +08:00
  • e75ba4c043 gguf-py : add support for chat template jinja files (#14508) Sigbjørn Skjæret 2025-07-02 21:02:35 +02:00
  • dfceb012ee llama : add "virtual sequences" gg/llama-high-throughput-rebase Georgi Gerganov 2025-06-23 16:29:02 +03:00
  • 5d46babdc2 llama : initial Mamba-2 support (#9126) b5812 compilade 2025-07-02 13:10:24 -04:00
  • e17991c466 sync : ggml b5811 Georgi Gerganov 2025-07-02 19:35:47 +03:00
  • c46944aa25 ggml : add version function to get lib version (ggml/1286) Daniel Bevenius 2025-07-02 13:55:32 +02:00
  • f3ed38d793 Set RPATH to "@loader_path" / "$ORIGIN" to ensure executables and dynamic libraries search for dependencies in their origin directory. (#14309) b5809 Rotem Dan 2025-07-02 19:37:16 +03:00
  • 30b4d4e1b3 ggml : add TODOs for adding GGML_OP_SET_ROWS support in the backends Georgi Gerganov 2025-07-02 13:45:00 +03:00
  • 5495ea96ba kv-cache : add comments Georgi Gerganov 2025-07-02 13:39:48 +03:00
  • f3da97e61b kv-cache : bounds-check when accessing slot_info indices Georgi Gerganov 2025-07-02 13:39:10 +03:00
  • a70293bc25 kv-cache : improve find_slot impl Georgi Gerganov 2025-07-02 13:37:12 +03:00
  • 2ac5be3a58 cont : remove redundant ifs Georgi Gerganov 2025-06-30 14:44:09 +03:00
  • ac8f3474c8 graph : separate k and v indices Georgi Gerganov 2025-06-27 17:27:52 +03:00
  • cd811b7a9d kv-cache : use ggml_set_rows Georgi Gerganov 2025-06-19 19:26:47 +03:00
  • 55a1c5a5fd CUDA: add softmax broadcast (#14475) b5808 Aman Gupta 2025-07-02 20:34:24 +08:00
  • 12a81af45f CUDA: broadcasting for FlashAttention mask (#14500) Johannes Gäßler 2025-07-02 13:42:12 +02:00
  • 8875523eb3 vulkan: support softmax/FA batch and broadcast (#14449) Jeff Bolz 2025-07-01 03:32:56 -05:00
  • ec68e84c32 ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (#14435) Georgi Gerganov 2025-06-27 21:50:57 +03:00
  • 307e79d33d opencl : fix possible buffer overflow in dump_tensor (#14490) b5804 zhouwg 2025-07-02 20:38:10 +08:00
  • d7f5f4e578 simple-chat : fix context-exceeded condition (#14494) b5803 Georgi Gerganov 2025-07-02 14:12:07 +03:00
  • c8a4e470f6 opencl : skip empty nodes on cgraph compute (#14491) b5802 Eric Zhang 2025-07-02 19:00:04 +08:00
  • 71bef66591 cuda : graceful fallback for Mamba-1 models with weird embd size compilade/mamba2 Francis Couture-Harpin 2025-07-02 02:56:42 -04:00
  • 603e43dc91 opencl : update upscale to support align corners (#14488) b5801 lhez 2025-07-02 00:07:42 -07:00
  • 611ba4b264 ci : add OpenCL to labeler workflow (#14496) Sigbjørn Skjæret 2025-07-02 09:02:51 +02:00
  • 73de1fd170 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2025-07-02 02:39:04 -04:00
  • 85841e121d github : add OpenCL backend to issue templates (#14492) Eric Zhang 2025-07-02 13:41:35 +08:00
  • 68b3cd6514 ggml : Callback before abort (#14481) b5798 Björn Ganster 2025-07-02 07:19:31 +02:00
  • de56944147 ci : disable fast-math for Metal GHA CI (#14478) b5797 Georgi Gerganov 2025-07-01 18:04:08 +03:00
  • 1b2aaf28ac Add Vulkan images to docker.md (#14472) Grzegorz Grasza 2025-07-01 15:44:11 +02:00
  • 343b6e94b6 CANN: update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (#14411) b5795 Chenguang Li 2025-07-01 16:47:30 +08:00
  • 6a746cf9c4 vulkan: Split large mul_mat_id to fit in shared memory (#14451) b5794 Jeff Bolz 2025-07-01 03:43:08 -05:00
  • eff5e45443 add GELU_ERF (#14455) b5793 Sigbjørn Skjæret 2025-07-01 10:14:21 +02:00
  • a6a47958a1 ggml : remove trailing whitespace (#0) b5792 Georgi Gerganov 2025-07-01 11:05:48 +03:00
  • f61c05d4b1 sync : ggml Georgi Gerganov 2025-07-01 10:27:52 +03:00
  • 431b2c24f3 ggml-cpu : "align corners" for bilinear upscale/downscale (ggml/1285) Acly 2025-07-01 09:11:00 +02:00
  • 497be7c01d ggml-quants : rename best_mad to best_error (ggml/1283) Daniel Bevenius 2025-06-24 06:10:16 +02:00
  • 79b33b2317 opencl : add GEGLU, REGLU, SWIGLU (#14456) b5788 lhez 2025-07-01 00:19:16 -07:00
  • 0a5a3b5cdf Add Conv2d for CPU (#14388) b5787 Aman Gupta 2025-06-30 23:57:04 +08:00
  • 745f11fed0 memory : correctly handle failure in apply() (#14438) Georgi Gerganov 2025-06-30 18:03:03 +03:00
  • 5dd942de59 metal : disable fast-math for some cpy kernels (#14460) b5785 Georgi Gerganov 2025-06-30 17:04:05 +03:00
  • a7417f5594 ggml-cpu: sycl: Re-enable exp f16 (#14462) b5784 Romain Biessy 2025-06-30 14:52:02 +02:00
  • eb3fa2913e test-backend-ops : disable llama test (#14461) b5783 Diego Devesa 2025-06-30 03:43:15 -07:00
  • c839a2da1a cmake : Remove redundant include path in CMakeLists.txt (#14452) b5782 xiaobing318 2025-06-30 17:48:24 +08:00
  • e9b6350e61 scripts : make the shell scripts cross-platform (#14341) Vedran Miletić 2025-06-30 10:17:18 +02:00
  • caf5681fcb server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196) b5780 matteo 2025-06-29 20:02:53 +02:00
  • 83790b0e7e server : fix appearance of the chats list context menu for Safari (#14322) Renat 2025-06-29 19:29:57 +02:00
  • f47c1d7106 SYCL: disable faulty fp16 exp kernel (#14395) b5778 Akarshan Biswas 2025-06-29 21:07:58 +05:30
  • a5d1fb6212 ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443) b5777 Sigbjørn Skjæret 2025-06-29 14:38:10 +02:00
  • a0535ffa0d ggml : implement REGLU/GEGLU/SWIGLU ops (#14158) Sigbjørn Skjæret 2025-06-29 11:04:10 +02:00