Commit Graph

  • b83cae088c speculative : add infill mode gg/speculative-infill Georgi Gerganov 2024-11-26 11:14:17 +02:00
  • 0eb4e12bee vulkan: Fix a vulkan-shaders-gen arugment parsing error (#10484) b4174 Junil Kim 2024-11-26 10:47:20 +09:00
  • 0cc63754b8 Introduce llama-run (#10291) b4173 Eric Curtin 2024-11-25 16:56:24 -05:00
  • 50d5cecbda ci : build docker images only once daily (#10503) b4172 Diego Devesa 2024-11-25 22:05:39 +01:00
  • 9fd8c2687f server : add more information about error (#10455) b4171 Georgi Gerganov 2024-11-25 22:28:27 +02:00
  • 47f931c8f9 server : enable cache_prompt by default (#10501) b4170 Georgi Gerganov 2024-11-25 21:50:07 +02:00
  • 106964e3d2 metal : enable mat-vec kernels for bs <= 4 (#10491) b4169 Georgi Gerganov 2024-11-25 21:49:31 +02:00
  • 80acb7b430 Rename Olmo1124 to Olmo2 (#10500) b4168 Shane A 2024-11-25 10:36:09 -08:00
  • 10bce0450f llama : accept a list of devices to use to offload a model (#10497) b4167 Diego Devesa 2024-11-25 19:30:06 +01:00
  • 1f922254f0 Github: update issue templates [no ci] (#10489) Johannes Gäßler 2024-11-25 19:18:37 +01:00
  • 1ee6c482d0 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-11-25 12:04:23 -05:00
  • e3fe61203c llama : partially apply clang-format style Francis Couture-Harpin 2024-11-25 11:31:46 -05:00
  • 691698e152 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-11-25 10:40:20 -05:00
  • a9a678a6b2 Add download chat feature to server chat (#10481) brucepro 2024-11-25 08:11:55 -08:00
  • 9ca2e67762 server : add speculative decoding support (#10455) b4164 Georgi Gerganov 2024-11-25 16:31:38 +02:00
  • 5931c1f233 ggml : add support for dynamic loading of backends (#10469) b4163 Diego Devesa 2024-11-25 15:13:39 +01:00
  • f6d12e7df8 tests : fix compile warning b4162 Georgi Gerganov 2024-11-25 15:17:32 +02:00
  • 4ff0831ce6 metal : use F16 math in mul_mat kernels gg/metal-mul-mat-f16 Georgi Gerganov 2024-11-08 13:21:59 +02:00
  • b756441104 metal : minor code formatting b4161 Georgi Gerganov 2024-11-25 15:08:04 +02:00
  • 5a8987793f [SYCL] Fix building Win package for oneAPI 2025.0 update (#10483) b4160 Neo Zhang Jianyu 2024-11-25 17:31:10 +08:00
  • d9d54e498d speculative : refactor and add a simpler example (#10362) Georgi Gerganov 2024-11-25 09:58:41 +02:00
  • 8006f3b3c8 llama : remove implicit recurrent state rollbacks Francis Couture-Harpin 2024-11-24 20:35:30 -05:00
  • cce5a90075 flake.lock: Update (#10470) Georgi Gerganov 2024-11-24 18:03:25 +02:00
  • dc39012cba llama : fix op mul check with command-r-plus (#10476) b4157 Diego Devesa 2024-11-24 16:10:26 +01:00
  • 9336db462c convert : XLMRoberta Type Vocab Size (#10458) Gabe Goodhart 2024-11-24 02:02:34 -07:00
  • 96fa2c5e2d fix gguf-py: Conversion error when multiple licenses are configured (#9807) momonga 2024-11-24 09:09:22 +09:00
  • 55ed008b2d ggml : do not use ARM features not included in the build (#10457) b4154 Diego Devesa 2024-11-23 14:41:12 +01:00
  • 6dfcfef078 ci: Update oneAPI runtime dll packaging (#10428) b4153 蕭澧邦 2024-11-22 17:44:08 +08:00
  • 599b3e0cd4 GitHub: ask for more info in issue templates (#10426) Johannes Gäßler 2024-11-22 08:32:40 +01:00
  • c18610b4ee CANN: Support Ascend310P to accelerate F32 and F16 Model (#10216) b4151 leo-pony 2024-11-22 14:07:20 +08:00
  • a5e47592b6 cuda : optimize argmax (#10441) b4150 Diego Devesa 2024-11-21 18:18:50 +01:00
  • 1bb30bf28c llama : handle KV shift for recurrent models (#10402) b4149 Georgi Gerganov 2024-11-21 10:22:47 +02:00
  • 87a533be57 sync : ggml b4148 Georgi Gerganov 2024-11-21 09:22:11 +02:00
  • 59b9172822 ggml/sched : do not skip views in pre-assignments slaren 2024-11-20 13:25:08 +01:00
  • 02e4eaf22f ggml-opt: fix data corruption (ggml/1022) Johannes Gäßler 2024-11-20 14:56:04 +01:00
  • 9abe9eeae9 vulkan: predicate max operation in soft_max shaders/soft_max (#10437) Jeff Bolz 2024-11-20 13:47:36 -06:00
  • f95caa7954 cmake: add link dependencies to cmake find pkg (#10433) bandoti 2024-11-20 12:22:19 -04:00
  • fab5d30ff6 llama : add .clang-format file (#10415) b4143 Diego Devesa 2024-11-20 12:57:53 +01:00
  • 8fd4b7fa29 vulkan: copy iq4_nl LUT into shared memory (#10409) b4142 Jeff Bolz 2024-11-20 01:40:18 -06:00
  • 1bacb9f625 vulkan: further optimize mul_mat_vec using larger loads (#10387) b4141 Jeff Bolz 2024-11-20 01:11:00 -06:00
  • ad21c9e1f1 update rel to 4040 (#10395) Neo Zhang Jianyu 2024-11-20 13:54:25 +08:00
  • 3952a221af Fix missing file renames in Makefile due to changes in commit ae8de6d50a (#10413) b4139 Anthony Van de Gejuchte 2024-11-19 23:18:17 +01:00
  • 42ae10bbcd add cmake rvv support (#10411) b4138 haopeng 2024-11-20 04:10:31 +08:00
  • 9fe0fb0626 sync : ggml b4137 Georgi Gerganov 2024-11-19 19:15:50 +02:00
  • 611fabd792 metal : fox offset integer overflows in im2col (ggml/1015) Plamen Minev 2024-11-18 15:02:27 +02:00
  • 12b0ad953a metal : add GGML_UNARY_OP_ELU kernel (ggml/1018) PAB 2024-11-18 10:02:49 +01:00
  • 342397dc7e cmake: force MSVC compiler charset to utf-8 (#9989) b4134 蕭澧邦 2024-11-20 01:42:00 +08:00
  • 2a11b6b094 Add required ggml-base and backend libs to cmake pkg (#10407) b4133 bandoti 2024-11-19 12:10:30 -04:00
  • 3ee6382d48 cuda : fix CUDA_FLAGS not being applied (#10403) b4132 Diego Devesa 2024-11-19 14:29:38 +01:00
  • 8e752a777b llama : add check for KV cache shifts (#10401) b4131 Georgi Gerganov 2024-11-19 13:29:26 +02:00
  • a88ad007de llama : add OLMo November 2024 support (#10394) b4130 Shane A 2024-11-19 01:04:08 -08:00
  • 2a1507c162 sycl : Add option to set the SYCL architecture for all targets (#10266) b4129 Romain Biessy 2024-11-19 09:02:23 +01:00
  • b3e585988f vulkan: Optimize soft_max (#10301) b4128 Jeff Bolz 2024-11-19 01:25:17 -06:00
  • 557924f222 sycl: Revert MUL_MAT_OP support changes (#10385) b4127 Alberto Cabrera Pérez 2024-11-19 00:50:04 +00:00
  • d3481e6316 cuda : only use native when supported by cmake (#10389) b4126 Diego Devesa 2024-11-18 18:43:40 +01:00
  • 531cb1c233 Skip searching root path for cross-compile builds (#10383) bandoti 2024-11-18 11:23:58 -04:00
  • f139d2ea61 vulkan: remove use of null initializer (#10372) Jeff Bolz 2024-11-18 08:28:42 -06:00
  • 2eb76b2a5e flake.lock: Update (#10346) Georgi Gerganov 2024-11-18 16:08:20 +02:00
  • 9b75f03cd2 Vulkan: Fix device info output format specifiers (#10366) b4122 0cc4m 2024-11-18 11:02:43 +01:00
  • 75207b3a88 docker: use GGML_NATIVE=OFF (#10368) Johannes Gäßler 2024-11-18 00:21:53 +01:00
  • 76e9e58b78 CUDA: fix MMV kernel being used for FP16 src1 (#10357) b4120 Johannes Gäßler 2024-11-17 23:20:42 +01:00
  • ce2e59ba10 CMake: fix typo in comment [no ci] (#10360) Johannes Gäßler 2024-11-17 12:59:38 +01:00
  • be5caccef9 llama : only use default buffer types for the KV cache (#10358) b4118 Diego Devesa 2024-11-17 12:25:45 +01:00
  • 20a780c7b6 gitignore : ignore local run scripts [no ci] Georgi Gerganov 2024-11-17 13:12:22 +02:00
  • cf32a9b93a metal : refactor kernel args into structs (#10238) Georgi Gerganov 2024-11-17 11:23:01 +02:00
  • a43178299c ggml : fix undefined reference to 'getcpu' (#10354) b4115 FirstTimeEZ 2024-11-17 21:39:22 +13:00
  • c3ea58aca4 CUDA: remove DMMV, consolidate F16 mult mat vec (#10318) b4114 Johannes Gäßler 2024-11-17 09:09:55 +01:00
  • 467576b6cc CMake: default to -arch=native for CUDA build (#10320) b4113 Johannes Gäßler 2024-11-17 09:06:34 +01:00
  • eda7e1d4f5 ggml : fix possible buffer use after free in sched reserve (#9930) b4112 Diego Devesa 2024-11-17 07:31:17 +01:00
  • 24203e9dd7 ggml : inttypes.h -> cinttypes (#0) b4111 Georgi Gerganov 2024-11-16 23:40:39 +02:00
  • 5d9e59979c ggml : adapt AMX to tensor->grad removal (#0) Georgi Gerganov 2024-11-16 21:38:01 +02:00
  • a4200cafad make : add ggml-opt (#0) Georgi Gerganov 2024-11-16 21:35:31 +02:00
  • 84274a10c3 tests : remove test-grad0 Georgi Gerganov 2024-11-16 21:34:03 +02:00
  • 68fcb4759c ggml : fix compile warnings (#0) Georgi Gerganov 2024-11-16 21:32:41 +02:00
  • 8a43e940ab ggml: new optimization interface (ggml/988) Johannes Gäßler 2024-11-16 22:17:59 +02:00
  • 5c9a8b22b1 scripts : update sync Georgi Gerganov 2024-11-16 22:16:04 +02:00
  • 0fff7fd798 docs : vulkan build instructions to use git bash mingw64 (#10303) FirstTimeEZ 2024-11-17 12:29:18 +13:00
  • 4e54be0ec6 llama/ex: remove --logdir argument (#10339) b4103 Johannes Gäßler 2024-11-16 23:00:41 +01:00
  • db4cfd5dbc llamafile : fix include path (#0) b4102 Georgi Gerganov 2024-11-16 17:58:56 +02:00
  • 8ee0d09ae6 make : auto-determine dependencies (#0) Georgi Gerganov 2024-11-16 17:58:32 +02:00
  • bcdb7a2386 server: (web UI) Add samplers sequence customization (#10255) b4100 MaggotHATE 2024-11-16 18:26:54 +05:00
  • f7b0233eca wip gg/logits-slowdown Georgi Gerganov 2024-11-16 10:04:49 +02:00
  • f245cc28d4 scripts : fix missing key in compare-llama-bench.py (#10332) Georgi Gerganov 2024-11-16 10:32:50 +02:00
  • 772703c8ff vulkan: Optimize some mat-vec mul quant shaders (#10296) b4098 Jeff Bolz 2024-11-16 00:26:57 -06:00
  • dd3a6ce9f8 vulkan : add cmake preset debug/release (#10306) FirstTimeEZ 2024-11-16 14:59:33 +13:00
  • 1e58ee1318 ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324) b4096 Dan Johansson 2024-11-16 01:53:37 +01:00
  • 89e4caaaf0 llama : save number of parameters and the size in llama_model (#10286) b4095 FirstTimeEZ 2024-11-16 13:42:13 +13:00
  • 74d73dc85c Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314) b4094 Srihari-mcw 2024-11-16 02:57:00 +05:30
  • 4047be74da scripts: update compare-llama-bench.py (#10319) b4093 Johannes Gäßler 2024-11-15 21:19:03 +01:00
  • 883d206fbd ggml : fix some build issues b4092 slaren 2024-11-15 20:20:54 +01:00
  • 09ecbcb596 cmake : fix ppc64 check (whisper/0) b4091 Georgi Gerganov 2024-11-15 15:35:22 +02:00
  • 3225008973 ggml : vulkan logs (whisper/2547) thewh1teagle 2024-11-15 15:33:53 +02:00
  • cbf5541a82 sync : ggml Georgi Gerganov 2024-11-15 15:31:16 +02:00
  • 18429220bd AVX BF16 and single scale quant optimizations (#10212) b4088 Eve 2024-11-15 11:47:58 +00:00
  • f0204a0ec7 ci: build test musa with cmake (#10298) b4087 R0CKSTAR 2024-11-15 19:47:25 +08:00
  • 57f8355b29 sycl: Update Intel docker images to use DPC++ 2025.0 (#10305) Romain Biessy 2024-11-15 12:10:45 +01:00
  • 9901068ac7 server : (web UI) add copy button for code block, fix api key (#10242) b4085 Xuan Son Nguyen 2024-11-15 05:48:49 -04:00
  • 231f9360d9 cann: dockerfile and doc adjustment (#10302) Chenguang Li 2024-11-15 15:09:35 +08:00
  • 4802ad350b scripts : fix regex in sync [no ci] Georgi Gerganov 2024-11-15 08:38:43 +02:00
  • 5a54af4d4f sycl: Use syclcompat::dp4a (#10267) b4082 Romain Biessy 2024-11-15 04:09:12 +01:00