Commit Graph

  • d8c054517d Add preprocessor checks for Apple devices. Mathijs de Bruin 2024-02-06 14:39:22 +00:00
  • 42f664a382 Resolve ErrorIncompatibleDriver with Vulkan on MacOS. Mathijs de Bruin 2024-02-03 18:00:11 +00:00
  • 5dde540897 Allow for Vulkan build with Accelerate. Mathijs de Bruin 2024-02-03 17:56:46 +00:00
  • 40c3a6c1e1 cuda : ignore peer access already enabled errors (#5597) b2205 slaren 2024-02-19 23:40:26 +01:00
  • f24ed14ee0 make : pass CPPFLAGS directly to nvcc, not via -Xcompiler (#5598) b2204 Jared Van Bortel 2024-02-19 15:54:12 -05:00
  • 9d679f0fcc examples : support minItems/maxItems in JSON grammar converter (#5039) b2203 nopperl 2024-02-19 14:14:07 +00:00
  • 1387cf60f7 llava : remove extra cont (#5587) b2202 Georgi Gerganov 2024-02-19 15:23:17 +02:00
  • 6fd413791a llava : replace ggml_cpy with ggml_cont b2201 slaren 2024-02-19 14:02:36 +01:00
  • 337c9cbd52 sync : ggml Georgi Gerganov 2024-02-19 14:54:21 +02:00
  • a3145bdc30 ggml-alloc : apply ggml/731 Georgi Gerganov 2024-02-19 14:53:48 +02:00
  • 890559ab28 metal : option to embed MSL source into compiled binary (whisper/1842) Didzis Gosko 2024-02-11 16:41:41 +02:00
  • d0e3ce51f4 ci : enable -Werror for CUDA builds (#5579) b2197 Georgi Gerganov 2024-02-19 14:45:41 +02:00
  • 68a6b98b3c make : fix CUDA build (#5580) b2196 Georgi Gerganov 2024-02-19 13:41:51 +02:00
  • f249c997a8 llama : adapt to F16 KQ_pos gg/flash-attn-sync Georgi Gerganov 2024-02-19 13:10:24 +02:00
  • 31109ca00a Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-02-19 12:58:18 +02:00
  • 70d45af0ef readme : fix typo in README-sycl.md (#5353) valiray 2024-02-19 02:37:10 -08:00
  • 412735ec70 Merge branch 'master' into gg/metal-batched gg/metal-batched Georgi Gerganov 2024-02-19 11:25:24 +02:00
  • 13e2c771aa cmake : remove obsolete sycl compile flags (#5581) b2194 Abhilash Majumder 2024-02-19 14:45:18 +05:30
  • f53119cec4 minor : fix trailing whitespace (#5538) b2193 Georgi Gerganov 2024-02-19 10:34:10 +02:00
  • 7084755396 llava : avoid changing the original BakLLaVA model (#5577) Daniel Bevenius 2024-02-19 09:31:59 +01:00
  • 4480542b22 baby-llama : allocate graphs in ggml_context (#5573) b2191 NawafAlansari 2024-02-19 03:25:38 -05:00
  • 11b12de39b llama : add llama_chat_apply_template() (#5538) b2190 Xuan Son Nguyen 2024-02-19 09:23:37 +01:00
  • 3a9cb4ca64 cuda, metal : fix nans in soft_max (#5574) b2189 slaren 2024-02-19 09:04:45 +01:00
  • 769a716e30 readme : update (#5572) Mirko185 2024-02-19 08:39:31 +01:00
  • f0d1fafc02 ggml : android and old glibc NUMA incompatibility bugfixes (#5557) b2187 bmwl 2024-02-18 23:38:32 -08:00
  • a0c2dad9d4 build : pass all warning flags to nvcc via -Xcompiler (#5570) b2186 Jared Van Bortel 2024-02-18 16:21:52 -05:00
  • 14278f55d2 ggml : restore vec dot stride arg names (#5453) b2185 Georgi Gerganov 2024-02-18 22:58:57 +02:00
  • 47c662b0de fix some spaces added by IDE in math op gg/rename-n_ctx Pierrick HYMBERT 2024-02-18 21:04:04 +01:00
  • 606873401c rename n_ctx to kv_size Pierrick HYMBERT 2024-02-18 20:59:26 +01:00
  • ef96e8b1f7 server: document the --ctx-size deprecation in server README.md Pierrick HYMBERT 2024-02-18 11:20:34 +01:00
  • 9a0695671d server: rename legacy --ctx-size to --kv-size Pierrick HYMBERT 2024-02-17 11:49:42 +01:00
  • b1de96824b ci : fix wikitext url + compile warnings (#5569) b2184 Georgi Gerganov 2024-02-18 22:39:30 +02:00
  • 7ad554f90e metal : fix unused warnings (#0) Georgi Gerganov 2024-02-18 21:39:58 +02:00
  • 5ee99c32f5 common, server : surface min_keep as its own parameter (#5567) b2182 Robey Holderith 2024-02-18 11:11:16 -08:00
  • c145f8a132 server : slots monitoring endpoint (#5550) b2181 Pierrick Hymbert 2024-02-18 18:39:57 +01:00
  • 689a091bbe sampling : do not set min_keep to n_probs (#5564) b2180 Georgi Gerganov 2024-02-18 19:38:06 +02:00
  • f3f28c5395 cmake : fix GGML_USE_SYCL typo (#5555) b2179 Georgi Gerganov 2024-02-18 19:17:00 +02:00
  • e75c6279d1 server : enhanced health endpoint (#5548) b2178 Pierrick Hymbert 2024-02-18 17:31:28 +01:00
  • 36376abe05 server : --n-predict option document and cap to max value (#5549) b2177 Pierrick Hymbert 2024-02-18 17:30:09 +01:00
  • 66c1968f7a server : graceful server shutdown (#5244) b2176 Daniel Hiltgen 2024-02-18 08:23:16 -08:00
  • 1dcc3fde00 common : fix ub (#5530) b2175 Georgi Gerganov 2024-02-18 18:21:52 +02:00
  • 5d3de51f97 ggml, common, examples, tests : fixed type arguments in printf (#5528) b2174 Herman Semenov 2024-02-18 16:20:12 +00:00
  • fc0c8d286a llava : update surgery script to not remove tensors (#5536) Daniel Bevenius 2024-02-18 17:19:23 +01:00
  • bd2d4e393b 1.5 bit quantization (#5453) b2172 Kawrakow 2024-02-18 18:16:55 +02:00
  • c8e0d7efeb flake.lock: Update github-actions[bot] 2024-02-18 00:17:07 +00:00
  • 8f1be0d42f ggml : add ALiBi support for ggml_soft_max_ext (#5488) Georgi Gerganov 2024-02-17 23:04:16 +02:00
  • 6e4e973b26 ci : add an option to fail on compile warning (#3952) Ananta Bastola 2024-02-17 16:03:14 -05:00
  • d250c9d61d gitignore : update for CLion IDE (#5544) b2168 clibdev 2024-02-17 18:28:37 +02:00
  • 974e3cadff ggml : try another fix gg/fix-android Georgi Gerganov 2024-02-17 18:14:35 +02:00
  • e9caab61a2 ggml : no cpu_set_t on Android Georgi Gerganov 2024-02-17 17:50:39 +02:00
  • 5bf2b94dd4 cmake : fix VULKAN and ROCm builds (#5525) b2167 Georgi Gerganov 2024-02-16 19:05:56 +02:00
  • d2819d5577 scripts : add helpers script for bench comparing commits (#5521) Georgi Gerganov 2024-02-16 15:14:40 +02:00
  • 4cb0727698 llava : removed excess free(NULL) operation (#5531) Herman Semenov 2024-02-16 12:43:23 +00:00
  • 65085c713e llama : minor fixed return int value (#5529) Herman Semenov 2024-02-16 11:45:48 +00:00
  • 6dcc02d244 server : add "samplers" param to control the samplers order (#5494) Alexey Parfenov 2024-02-16 11:33:25 +00:00
  • 5f5808ca7b server : fix system prompt cli (#5516) Rőczey Barnabás 2024-02-16 11:00:56 +01:00
  • f486f6e1e5 ggml : add numa options (#5377) bmwl 2024-02-16 01:31:07 -08:00
  • 60ed04cf82 llava : fix clip-model-is-vision flag in README.md (#5509) Daniel Bevenius 2024-02-16 10:24:39 +01:00
  • 594845aab1 ci : fix BERT model download and convert Georgi Gerganov 2024-02-16 09:57:55 +02:00
  • 4524290e87 Use correct type of pooling for embedding models (#5500) Douglas Hanley 2024-02-15 11:21:49 -06:00
  • c06e45d729 clip : fix wrong loop condition Georgi Gerganov 2024-02-15 18:49:08 +02:00
  • 9060a1e9df cuda : print message when initialization fails (#5512) slaren 2024-02-15 16:49:01 +01:00
  • 9350a1cf21 scripts : add hf.sh helper script (#5501) Georgi Gerganov 2024-02-15 15:41:15 +02:00
  • 73122473ff fix(gguf-py): special tokens are no longer skipped when add_<token>_token is set to false (#5487) Michaël de Vries 2024-02-15 14:14:37 +01:00
  • e856bfed3b hf : add support for --repo and --file gg/hf Georgi Gerganov 2024-02-15 15:05:15 +02:00
  • e834aa1fd4 hf : add error logs Georgi Gerganov 2024-02-15 14:59:12 +02:00
  • 0d4177126b llava : fix memory management bug (#5491) Elbios 2024-02-15 09:01:57 +01:00
  • 7930a8a6e8 llaba : hotfix for llava-1.6 image number (#5495) John 2024-02-15 08:59:18 +01:00
  • 303da63442 scripts : add hf.sh helper scripts Georgi Gerganov 2024-02-15 09:54:20 +02:00
  • 704359e299 vulkan: Find optimal memory type but with fallback (#5381) Neuman Vong 2024-02-15 17:11:15 +11:00
  • 594fca3fef readme : fix typo (#5490) Rune 2024-02-14 16:15:49 +01:00
  • ccbb277f46 llava : update README.md (#5489) John 2024-02-14 15:49:42 +01:00
  • 8084d55440 cmake : ARM intrinsics detection for MSVC (#5401) Michael Podvitskiy 2024-02-14 11:49:01 +03:00
  • aa23412989 llava : support v1.6 (#5267) John 2024-02-14 08:38:35 +01:00
  • f5ca054855 Early return for zero size calls to get_tensor. (#5482) AT 2024-02-13 15:44:25 -06:00
  • 6c00a06692 gguf : add python reader example (#5216) John 2024-02-13 18:56:38 +01:00
  • ea9c8e1143 llama : add support for Nomic Embed (#5468) b2144 Jared Van Bortel 2024-02-13 12:03:53 -05:00
  • ccd757a174 convert : fix mistakes from refactoring ceb/nomic-bert Jared Van Bortel 2024-02-13 11:59:11 -05:00
  • c2f407e398 cleanup convert-hf-to-gguf.py Jared Van Bortel 2024-02-12 17:35:56 -05:00
  • b8ff85efe0 convert : pad vocab size to multiple of 64, not 8 Jared Van Bortel 2024-02-12 16:47:00 -05:00
  • 48a7ef6ebc Nomic BERT Jared Van Bortel 2024-02-08 18:00:44 -05:00
  • c4e6dd59e4 llama : allow raw byte in SPM vocabs; don't crash on nl 404 (#5478) b2143 Aarni Koskela 2024-02-13 18:18:16 +02:00
  • 037259be68 llama : make load error reporting more granular (#5477) b2142 Aarni Koskela 2024-02-13 15:24:50 +02:00
  • 5c977221d2 iq1_s: slightly faster dot product ik/iq1_s Iwan Kawrakow 2024-02-13 15:18:27 +02:00
  • 263978904c finetune : rename feed-forward tensors (w1/w2/w3) (#4839) b2141 Daniel Bevenius 2024-02-13 14:15:42 +01:00
  • cf45252a7c tests : multi-thread the tokenizer tests (#5474) b2140 Georgi Gerganov 2024-02-13 15:14:22 +02:00
  • f604a17994 iq1_s: Tests Iwan Kawrakow 2024-02-13 15:11:23 +02:00
  • 425c6bbb6c iq1_s: Metal works, but quite slow Iwan Kawrakow 2024-02-13 14:37:16 +02:00
  • 020b548ec3 iq1_s: Metal basics Iwan Kawrakow 2024-02-13 14:16:30 +02:00
  • 03bf161eb6 llama : support batched embeddings (#5466) b2139 Douglas Hanley 2024-02-13 06:06:58 -06:00
  • ad014bba97 make: add error message for bad CUDA version (#5444) b2138 Johannes Gäßler 2024-02-13 12:38:37 +01:00
  • 4be44b7c33 iq1_s: use IQ2_XXS for attn_output Iwan Kawrakow 2024-02-12 18:55:37 +02:00
  • 307c5f617a iq1_s: better grid Iwan Kawrakow 2024-02-12 13:58:16 +02:00
  • 773014926f iq1_s: ARM_NEON dot product. Works, but not very fast Iwan Kawrakow 2024-02-12 11:40:31 +02:00
  • 2ffb05acc8 iq1_s: AVX2 finally works Iwan Kawrakow 2024-02-12 08:29:54 +02:00
  • 67e7c4238e Fix after merge with latest master Iwan Kawrakow 2024-02-12 07:38:29 +02:00
  • dc0b14bebb Fix shadow warnings Iwan Kawrakow 2024-02-12 06:10:06 +02:00
  • 5574533a72 Fix tests Iwan Kawrakow 2024-02-11 18:50:50 +02:00
  • 592b3b26bb iq1_s: WIP AVX2 dot product - something is not right Iwan Kawrakow 2024-02-11 17:22:42 +02:00
  • d94139bf27 iq1_s: scalar CPU dot product Iwan Kawrakow 2024-02-11 14:07:19 +02:00