Commit Graph

  • 5a54af4d4f sycl: Use syclcompat::dp4a (#10267) b4082 Romain Biessy 2024-11-15 04:09:12 +01:00
  • 1607a5e5b0 backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921) b4081 Charles Xu 2024-11-15 01:28:50 +01:00
  • ae8de6d50a ggml : build backends as libraries (#10256) b4080 Diego Devesa 2024-11-14 18:04:35 +01:00
  • 4a8ccb37ad CUDA: no -sm row for very small matrices (#10185) b4079 Johannes Gäßler 2024-11-14 13:00:15 +01:00
  • 2a82891a85 speculative : fix out-of-bounds access (#10289) b4078 Georgi Gerganov 2024-11-14 11:44:15 +02:00
  • 5e6dad9322 speculative : experimenting with Qwen2.5 gg/speculative-experiments Georgi Gerganov 2024-11-14 11:31:31 +02:00
  • 33bdee667e speculative : fix out-of-bounds access gg/speculative-fix-oob Georgi Gerganov 2024-11-14 11:23:45 +02:00
  • af148c9386 vulkan: Optimize binary ops (#10270) b4077 Jeff Bolz 2024-11-13 23:22:55 -06:00
  • 66798e42fb vulkan: Use macros to make the mat mul pipeline creation more concise (#10259) b4076 Jeff Bolz 2024-11-13 14:59:47 -06:00
  • fb4a0ec083 llama : propagate the results of graph_compute (#9525) b4075 Michael Podvitskiy 2024-11-13 20:00:35 +02:00
  • 5ea926dad7 sync : ggml Georgi Gerganov 2024-11-13 18:11:54 +02:00
  • 1ee9eea094 docs : update bindings list (#10261) b4073 Small Grass Forest 2024-11-13 19:17:10 +08:00
  • ff7fb670d0 server : add missing docs (#10269) Alexey Parfenov 2024-11-13 11:16:30 +00:00
  • 0e712a5acb server : fix incorrect res in validate_model_chat_template (#10272) b4071 Jhen-Jie Hong 2024-11-13 19:15:23 +08:00
  • a0ec17b32e metadata: Detailed Dataset Authorship Metadata (#8875) Brian 2024-11-13 21:10:38 +11:00
  • 2e82ffa4af sycl : Fixes to broken builds and test-backend-ops (#10257) b4069 Alberto Cabrera Pérez 2024-11-13 09:40:57 +00:00
  • 80dd7ff22f vulkan: Optimize contiguous copies (#10254) b4068 Jeff Bolz 2024-11-13 00:58:57 -06:00
  • 8c1b186cb5 metal : minor Q4_0 optimization gg/metal-q4_0-opt Georgi Gerganov 2024-11-12 15:30:51 +02:00
  • 86ed72d20c ggml : add ggml-metal-impl.h Georgi Gerganov 2024-11-10 18:29:09 +02:00
  • 63bab93c48 metal : add TODOs for rest of ops Georgi Gerganov 2024-11-10 17:56:12 +02:00
  • 964206a780 metal : GGML_OP_NORM Georgi Gerganov 2024-11-10 17:17:18 +02:00
  • e9ecd5d4de metal : GGML_OP_RMS_NORM Georgi Gerganov 2024-11-10 15:31:43 +02:00
  • 647a7044f5 metal : GGML_OP_CPY Georgi Gerganov 2024-11-10 13:55:26 +02:00
  • f46f710ca6 metal : GGML_OP_REPEAT Georgi Gerganov 2024-11-10 13:21:59 +02:00
  • 3250c98bf6 metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV Georgi Gerganov 2024-11-10 13:16:54 +02:00
  • 9058c51d9d metal : GGML_OP_CONCAT Georgi Gerganov 2024-11-10 13:03:25 +02:00
  • bb821e4854 cont : int safety + register optimizations Georgi Gerganov 2024-11-10 11:05:10 +02:00
  • c5cf1d74f0 cont : mul mm id Georgi Gerganov 2024-11-10 10:32:15 +02:00
  • 15a7105967 cont : thread counters style Georgi Gerganov 2024-11-10 09:57:41 +02:00
  • cacc4c225f cont : shmem style Georgi Gerganov 2024-11-10 09:45:06 +02:00
  • a1a201c1a9 cont : use char ptr Georgi Gerganov 2024-11-10 09:26:53 +02:00
  • c81640a5fc cont : args is first argument Georgi Gerganov 2024-11-10 08:47:30 +02:00
  • b65e4c1e10 cont : pass by reference Georgi Gerganov 2024-11-10 08:10:22 +02:00
  • c59a13d93f cont : mul mat vec Georgi Gerganov 2024-11-09 22:56:39 +02:00
  • 7670809ad4 metal : mul mat struct (wip) Georgi Gerganov 2024-11-09 17:54:40 +02:00
  • 4fd6fc5ab8 metal : cont + avoid potential int overflow [no ci] Georgi Gerganov 2024-11-09 16:39:36 +02:00
  • 9e07bcc06e metal : fattn args Georgi Gerganov 2024-11-09 16:09:31 +02:00
  • 1198ae7749 metal : add kernel arg structs (wip) Georgi Gerganov 2024-11-09 15:28:55 +02:00
  • 54ef9cfc72 vulkan: Throttle the number of shader compiles during the build step. (#10222) b4067 Jeff Bolz 2024-11-11 11:13:51 -06:00
  • b0cefea58a metal : more precise Q*K in FA vec kernel (#10247) b4066 Georgi Gerganov 2024-11-11 08:39:13 +02:00
  • b141e5f6ef server : enable KV cache defrag by default (#10233) b4065 Georgi Gerganov 2024-11-11 08:38:43 +02:00
  • 4b3a9212b6 flake.lock: Update (#10243) Georgi Gerganov 2024-11-10 21:45:25 +02:00
  • 505f33274d server : (web UI) Add back sampler settings (#10239) MaggotHATE 2024-11-11 00:42:25 +05:00
  • 160687b3ed vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226) b4062 Jeff Bolz 2024-11-10 05:37:56 -06:00
  • 6423c65aa8 metal : reorder write loop in mul mat kernel + style (#10231) b4061 Georgi Gerganov 2024-11-09 11:53:13 +02:00
  • 39a334a9aa metal : fix build and some more comments (#10229) b4060 Georgi Gerganov 2024-11-09 11:53:02 +02:00
  • bb38cdd8ba metal : fix F32 accumulation in FA vec kernel (#10232) b4059 Georgi Gerganov 2024-11-09 11:52:45 +02:00
  • f018acba22 llama : fix Qwen model type strings b4058 Georgi Gerganov 2024-11-09 11:26:34 +02:00
  • 46323fa9ef metal : hide debug messages from normal log b4057 Georgi Gerganov 2024-11-09 11:21:49 +02:00
  • 3d1fe1bb4d metal : int -> short, style gg/metal-mul-mat-write-opt Georgi Gerganov 2024-11-08 15:38:25 +02:00
  • 535050572a metal : reorder write loop Georgi Gerganov 2024-11-08 15:15:25 +02:00
  • bd1198a67a metal : fix build and some more comments gg/metal-fix-build Georgi Gerganov 2024-11-09 10:09:50 +02:00
  • 5b359bb1e3 ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) b4056 SXX 2024-11-09 15:35:46 +08:00
  • e89213492d ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156) b4055 amritahs-ibm 2024-11-09 12:47:50 +05:30
  • 8fc393f246 scripts : fix pattern and get n_tokens in one go (#10221) haopeng 2024-11-09 15:06:54 +08:00
  • ec450d3bbf metal : opt-in compile flag for BF16 (#10218) b4053 Georgi Gerganov 2024-11-08 21:59:46 +02:00
  • 695ad752b2 metal : improve clarity (minor) (#10171) b4052 Georgi Gerganov 2024-11-08 18:37:41 +02:00
  • 841f27abdb metal : optimize FA kernels (#10171) Georgi Gerganov 2024-11-08 13:47:22 +02:00
  • a2385da59c make : clean-up [no ci] gg/metal-fa-f16 Georgi Gerganov 2024-11-08 13:46:20 +02:00
  • d05b3127bd swift : exclude ggml-metal-embed.metal (#10211) b4050 Jhen-Jie Hong 2024-11-08 17:34:06 +08:00
  • b89e71b195 metal : fix BF16 requirement for FA kernels Georgi Gerganov 2024-11-08 11:28:04 +02:00
  • bc143ecf81 cuda : disable BF16 FA Georgi Gerganov 2024-11-08 10:27:43 +02:00
  • 5d1a10d275 metal : prevent int overflows [no ci] Georgi Gerganov 2024-11-07 22:11:24 +02:00
  • 486a5eb8c1 build : remove obsolete compile flag [no ci] Georgi Gerganov 2024-11-07 21:51:28 +02:00
  • 120d51285c metal : compile-guard bf16 FA kernels Georgi Gerganov 2024-11-07 21:38:37 +02:00
  • 2fccc8ac2d metal : minor clean-up Georgi Gerganov 2024-11-07 21:29:22 +02:00
  • 7facc29d69 metal : use F16 precision in FA kernels Georgi Gerganov 2024-11-06 15:33:30 +02:00
  • 25e877309a ggml : add ggml_flash_attn_ext_get_prec Georgi Gerganov 2024-11-06 15:09:47 +02:00
  • 76c6e7f105 server : minor UI fix (#10207) Xuan Son Nguyen 2024-11-07 18:44:38 -04:00
  • a71d81cf8c server : revamp chat UI with vuejs and daisyui (#10175) b4048 Xuan Son Nguyen 2024-11-07 17:31:10 -04:00
  • eec4d71737 scripts : add amx to sync-ggml.sh [no ci] Georgi Gerganov 2024-11-07 23:11:36 +02:00
  • 3b08828674 sync : ggml Georgi Gerganov 2024-11-07 23:08:24 +02:00
  • a2c6fd747c scripts : sync update Georgi Gerganov 2024-11-07 23:07:55 +02:00
  • 94accca4c2 vec move mask to shmem gg/metal-fa-f16-save Georgi Gerganov 2024-11-07 20:58:10 +02:00
  • 3b9625032c f16 vec Georgi Gerganov 2024-11-07 20:34:16 +02:00
  • 8f0ef15265 clean-up Georgi Gerganov 2024-11-07 20:02:31 +02:00
  • 022e5e90e9 remove compile flag Georgi Gerganov 2024-11-07 19:18:31 +02:00
  • 97404c4a03 ggml : add ggml-cpu.h to the public headers (#10204) b4044 Diego Devesa 2024-11-07 18:16:08 +01:00
  • 60e17ce23c Remove identical wte/etw logic for jais (#10203) Faisal Zaghloul 2024-11-07 11:46:12 -05:00
  • a6c8dbfa5d wip Georgi Gerganov 2024-11-07 18:20:25 +02:00
  • 5107e8cea3 DRY: Fixes clone functionality (#10192) b4042 wwoodsTM 2024-11-07 08:20:25 -07:00
  • 4abeb60a1a int64 dst Georgi Gerganov 2024-11-07 17:17:29 +02:00
  • 3ab47eb746 float -> half regs Georgi Gerganov 2024-11-07 17:06:34 +02:00
  • e121d82f6a 64-bit -> 32-bit Georgi Gerganov 2024-11-07 17:00:06 +02:00
  • a75cdcca60 remove inner if mask Georgi Gerganov 2024-11-07 16:40:29 +02:00
  • 61d05b57d9 remove ms array Georgi Gerganov 2024-11-07 13:35:33 +02:00
  • 984928109c move mask to shared mem Georgi Gerganov 2024-11-07 13:12:10 +02:00
  • 2aedbb354e wip 5 Georgi Gerganov 2024-11-07 12:32:59 +02:00
  • dc2a27f2a2 wip 4 Georgi Gerganov 2024-11-07 09:26:10 +02:00
  • 2319126a70 fix q4_0_8_8 format for corrupted tokens issue (#10198) b4041 snadampal 2024-11-07 02:02:08 -06:00
  • 3bcd40b3c5 Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) b4040 Zhiyuan Li 2024-11-07 18:19:10 +11:00
  • 9bd5ae09ae wip 3 Georgi Gerganov 2024-11-06 22:52:33 +02:00
  • 2335086fd3 wip2 Georgi Gerganov 2024-11-06 22:04:07 +02:00
  • 01c7f11224 wip Georgi Gerganov 2024-11-06 21:06:56 +02:00
  • 0f7e8f389d metal : add GGML_METAL_FORCE_FATTN_PREC_F16 Georgi Gerganov 2024-11-06 16:21:37 +02:00
  • eefc132bb7 metal : use F16 precision in FA kernel Georgi Gerganov 2024-11-06 15:33:30 +02:00
  • 22a9311a1a ggml : add ggml_flash_attn_ext_get_prec Georgi Gerganov 2024-11-06 15:09:47 +02:00
  • 5c333e0140 metal : add BF16 support (#8439) Georgi Gerganov 2024-11-06 19:53:51 +02:00
  • b11f9ba9b8 server : remove hack for extra parallel slot (#10187) b4038 Georgi Gerganov 2024-11-06 13:29:01 +02:00
  • 94d8cb8be1 metal : fix from ptr buffer name (#10189) b4037 Diego Devesa 2024-11-06 12:10:07 +01:00