Commit Graph

  • d94139bf27 iq1_s: scalar CPU dot product Iwan Kawrakow 2024-02-11 14:07:19 +02:00
  • a9d48e9718 iq1_s: CUDA is working Iwan Kawrakow 2024-02-11 13:08:26 +02:00
  • 80cd5bae99 iq1_s: WIP basics Iwan Kawrakow 2024-02-11 11:15:31 +02:00
  • 49cc1f7d67 bert : add tests + fix quantization (#5475) b2137 Georgi Gerganov 2024-02-13 13:01:29 +02:00
  • 99b8b43d7b tests : disable moe test (#5473) b2136 Georgi Gerganov 2024-02-13 11:20:24 +02:00
  • 895407f31b ggml-quants : fix compiler warnings (shadow variable) (#5472) b2135 Kawrakow 2024-02-13 09:07:57 +02:00
  • 4246b71ad7 Fix compiler warnings (shadow variable) ik/fix_warnings Iwan Kawrakow 2024-02-13 08:44:56 +02:00
  • 5a668ea000 metal : trying bs = 512 performance (wip) Georgi Gerganov 2024-02-12 19:21:57 +02:00
  • e8b00e2941 metal : fix NSG1 > 1 Georgi Gerganov 2024-02-08 16:39:38 +02:00
  • 845876d012 metal : works with ne00 % 4 == 0 Georgi Gerganov 2024-02-08 13:26:50 +02:00
  • e68e32548f metal : opts Georgi Gerganov 2024-02-07 23:12:22 +02:00
  • 92a0c17474 metal : initial working version Georgi Gerganov 2024-02-07 11:20:04 +02:00
  • 6875997fd6 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-02-12 21:16:58 +02:00
  • 099afc6274 llama : fix quantization when tensors are missing (#5423) b2134 Georgi Gerganov 2024-02-12 20:14:39 +02:00
  • df334a1125 swift : package no longer use ggml dependency (#5465) b2133 Georgi Gerganov 2024-02-12 19:54:29 +02:00
  • dbd8828eb0 py : fix persimmon n_rot conversion (#5460) Lee 2024-02-13 01:29:57 +08:00
  • 43fe07c1a4 ggml-sycl: Replace 3d ops with macro (#5458) b2131 Abhilash Majumder 2024-02-12 20:22:05 +05:30
  • 4a46d2b792 llava : remove prog parameter from ArgumentParser (#5457) b2130 Daniel Bevenius 2024-02-12 09:38:44 +01:00
  • 3b169441df sync : ggml (#5452) b2129 Georgi Gerganov 2024-02-12 09:16:06 +02:00
  • 3bdc4cd0f5 CUDA: mul_mat_vec_q tiling, refactor mul mat logic (#5434) b2128 Johannes Gäßler 2024-02-11 19:08:39 +01:00
  • 2891c8aa9a Add support for BERT embedding models (#5423) b2127 Douglas Hanley 2024-02-11 10:21:38 -06:00
  • 97a336507e flake.lock: Update github-actions[bot] 2024-02-11 00:17:31 +00:00
  • c88c74f967 vulkan: only use M-sized matmul on Apple GPUs (#5412) b2125 Sergio López 2024-02-11 15:12:00 +01:00
  • a803333a4e common : use enums for sampler types (#5418) b2124 Alexey Parfenov 2024-02-11 13:43:31 +00:00
  • 684780141a server : allow to specify tokens as strings in logit_bias (#5003) b2123 Alexey Parfenov 2024-02-11 13:38:14 +00:00
  • 85910c5b30 main : ctrl+C print timing in non-interactive mode (#3873) b2122 Georgi Gerganov 2024-02-11 15:35:50 +02:00
  • 139b62a839 common : fix compile warning b2121 Georgi Gerganov 2024-02-11 15:33:43 +02:00
  • 0f2411f154 ggml : fix compile warnings (unused vars) (#4966) Georgi Gerganov 2024-02-11 15:33:01 +02:00
  • a07d0fee1f ggml : add mmla kernels for quantized GEMM (#4966) b2119 snadampal 2024-02-11 07:22:33 -06:00
  • e4640d8fdf lookup: add print for drafting performance (#5450) b2118 Johannes Gäßler 2024-02-11 12:44:51 +01:00
  • 907e08c110 server : add llama2 chat template (#5425) b2117 Xuan Son Nguyen 2024-02-11 11:16:22 +01:00
  • f026f8120f metal : use autoreleasepool to avoid memory leaks (#5437) b2116 Ian Bull 2024-02-10 02:53:28 -08:00
  • cd9aea63b5 scripts : update sync scripts with new backends Georgi Gerganov 2024-02-10 09:53:05 +02:00
  • 43b65f5eb8 sync : ggml b2114 Georgi Gerganov 2024-02-10 09:30:36 +02:00
  • 4633d93af0 ggml : add abort_callback for cpu backend (ggml/725) Michael Podvitskiy 2024-02-09 10:42:27 +01:00
  • 4b7b38bef5 vulkan: Set limit for task concurrency (#5427) b2112 Neuman Vong 2024-02-10 05:30:19 +11:00
  • e00d2a62dd llava : add requirements.txt and update README.md (#5428) Daniel Bevenius 2024-02-09 14:00:59 +01:00
  • 7c777fcd5d server : fix prompt caching for repeated prompts (#5420) b2110 Riley Stewart 2024-02-09 02:49:49 -08:00
  • e5ca3937c6 llama : do not cap thread count when MoE on CPU (#5419) b2109 Paul Tsochantaris 2024-02-09 10:48:06 +00:00
  • e4124c2477 readme : add JavaScript/Wasm repo (#5415) Marko Tasic 2024-02-09 11:17:00 +01:00
  • b2f87cb64d ggml : fix error C2078: too many initializers for MSVC ARM64 (#5404) b2107 Michael Podvitskiy 2024-02-09 10:56:43 +01:00
  • 44fbe34360 Fix Vulkan crash on APUs with very little device memory (#5424) b2106 0cc4m 2024-02-09 06:52:33 +01:00
  • 8e6a9d2de0 CUDA: more warps for mmvq on NVIDIA (#5394) b2105 Johannes Gäßler 2024-02-08 21:56:40 +01:00
  • 41f308f58e llama : do not print "offloading layers" message in CPU-only builds (#5416) b2104 slaren 2024-02-08 21:33:03 +01:00
  • 6e99f2a04f Fix f16_sycl cpy call from Arc (#5411) b2103 Abhilash Majumder 2024-02-08 22:39:10 +05:30
  • ff4ff05c5f llava : add missing .py, and fix paths in README.md (#5414) Daniel Bevenius 2024-02-08 15:20:03 +01:00
  • b7b74cef36 fix trailing whitespace (#5407) b2101 Johannes Gäßler 2024-02-08 11:36:54 +01:00
  • 4aa43fab56 llama : fix MiniCPM (#5392) b2100 runfuture 2024-02-08 18:36:19 +08:00
  • a6e514a85f llava: fix typo/formatting in README.md (#5405) Daniel Bevenius 2024-02-08 09:58:19 +01:00
  • 26d4efd11e sampling: fix top_k <= 0 (#5388) b2098 Johannes Gäßler 2024-02-08 09:46:30 +01:00
  • 8504d2d0da tests : .gitignore obj files Georgi Gerganov 2024-02-08 09:46:47 +02:00
  • c4fbb6717c CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393) b2096 Michael Podvitskiy 2024-02-07 22:39:23 +01:00
  • 8c933b70c2 fix typo in readme (#5399) Ebey Abraham 2024-02-07 21:11:30 +00:00
  • b906596bb7 Add Ava in the list of llama.cpp UIs (#4362) b2094 Kamil Tomšík 2024-02-07 19:44:52 +01:00
  • aa7ab99be2 CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386) b2093 Johannes Gäßler 2024-02-07 12:40:26 +01:00
  • 10afa6f1d1 [SYCL] update install make by w64devkit (#5297) Neo Zhang Jianyu 2024-02-07 18:16:55 +08:00
  • 0ef46da632 llava-cli : always tokenize special tokens (#5382) b2091 Xiao-Yong Jin 2024-02-07 02:17:25 -06:00
  • ee1628bdfe Basic Vulkan Multi-GPU implementation (#5321) b2090 0cc4m 2024-02-07 07:54:50 +01:00
  • ed0bf32290 readme : modernize (#5379) Eve 2024-02-07 06:21:30 +00:00
  • 9a697d842b readme : update ui list (#5354) Ben Williams 2024-02-06 22:16:48 -08:00
  • 316c7faf77 llama : add MiniCPM support (#5346) b2087 runfuture 2024-02-07 14:15:56 +08:00
  • f3e2b4fa3f server : update /props with "total_slots" value (#5373) b2086 Justin Parker 2024-02-07 01:15:19 -05:00
  • f68664ac24 convert : fix TypeError on GPT-2 vocab.json (#5288) Sang-Kil Park 2024-02-07 13:28:00 +09:00
  • 7286b83d3f BERT WIP ceb/bert Jared Van Bortel 2024-02-06 17:03:12 -05:00
  • 213d1439fa server : remove model.json endpoint (#5371) b2084 Alexey Parfenov 2024-02-06 18:08:38 +00:00
  • 17c97fb062 CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370) b2083 Johannes Gäßler 2024-02-06 18:43:06 +01:00
  • b08f22c882 Update README.md (#5366) b2082 Kawrakow 2024-02-06 19:00:16 +02:00
  • f57fadc009 Slight quantization improvement for Q4_K and Q5_K (#5361) b2081 Kawrakow 2024-02-06 17:28:02 +02:00
  • 2e9c0bd6b3 readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362) BarfingLemurs 2024-02-06 09:06:48 -05:00
  • 2c516611f1 CUDA: mul_mat_vec_q for batch sizes > 1 (#5351) b2079 Johannes Gäßler 2024-02-06 14:44:06 +01:00
  • 8a79c591de server : include total "num_slots" in props endpoint (#5349) b2078 Justin Parker 2024-02-06 04:20:59 -05:00
  • 31e7903221 server : add dynatemp_range and dynatemp_exponent (#5352) b2077 Michael Coppola 2024-02-06 04:20:00 -05:00
  • 4ffc7a17d4 server : various fixes for the prompt field in /completion (#5300) b2076 Niall Coates 2024-02-06 08:16:23 +00:00
  • 906cff55c2 py : handle byte tokens in get_token_type (#5341) Georgi Gerganov 2024-02-06 07:47:22 +02:00
  • 098f6d737b make: Use ccache for faster compilation (#5318) b2074 Johannes Gäßler 2024-02-05 19:33:00 +01:00
  • adcf16fd68 py : fix empty bytes arg gg/convert-fix-byte-tokens Georgi Gerganov 2024-02-05 19:53:07 +02:00
  • 78b00dda6c README: updated introduction (#5343) Johannes Gäßler 2024-02-05 15:55:10 +01:00
  • c6b395535a ggml : make use of ggml-quants.h possible in C++ code (#5338) b2072 Kawrakow 2024-02-05 14:09:47 +02:00
  • ded2ad5b88 py : handle byte tokens in get_token_type Georgi Gerganov 2024-02-05 13:42:54 +02:00
  • 91c453fb11 One cannot possibly be defining static_assert in a C++ compilation ik/ggml-quants-cpp Iwan Kawrakow 2024-02-05 13:20:49 +02:00
  • abb61944a5 ggml : avoid duplicating function calls using MIN/MAX macros (#5325) b2071 Dr. Tom Murphy VII Ph.D 2024-02-05 06:13:57 -05:00
  • 89503dcb5f iq3_xxs: quards for the no-imatrix situation (#5334) b2070 Kawrakow 2024-02-05 12:32:27 +02:00
  • 44bf949248 Make use of ggml-quants.h possible in C++ code Iwan Kawrakow 2024-02-05 11:22:10 +02:00
  • 7e1ae372f3 py : fix internlm2-hf convert to gguf (#5305) Guoteng 2024-02-05 17:04:06 +08:00
  • 6fdfa2ecc6 iq2_xxs: tune quantization (#5320) b2068 Kawrakow 2024-02-05 10:46:06 +02:00
  • a2d60c9158 server : allow to get default generation settings for completion (#5307) b2067 Alexey Parfenov 2024-02-05 08:10:22 +00:00
  • e6f8177532 common : add dynamic temperature parameters to main example cli (#5295) b2066 l3utterfly 2024-02-05 17:00:47 +09:00
  • 30679d438d scripts : fix typos, cleanup (#5303) Georgi Gerganov 2024-02-05 09:48:03 +02:00
  • 4be04c8965 scripts : add non-interactive server-llm.sh (#5303) Нияз Гарифзянов 2024-02-05 10:43:57 +03:00
  • 5d55b0cd82 readme : add CodeShell models to the supported models list (#5330) chiranko 2024-02-05 15:41:38 +08:00
  • 4833ac209d [SYCL] Fix cpy with dims of 3 (#5289) b2062 AidanBeltonS 2024-02-05 07:08:24 +00:00
  • 9392ebd49e flake.lock: Update b2061 github-actions[bot] 2024-02-04 00:17:24 +00:00
  • 49a483e0f2 wip gg/flash-attn-interleave-cc Georgi Gerganov 2024-02-04 12:34:36 +02:00
  • a647257b47 cuda : express strides with helper constants gg/flash-attn-32x8 Georgi Gerganov 2024-02-04 11:08:47 +02:00
  • 1846e92a90 cuda : minor Georgi Gerganov 2024-02-04 09:57:58 +02:00
  • 5ed26e1fc9 Adding some imatrix tools (#5302) b2060 Kawrakow 2024-02-04 10:39:58 +02:00
  • 277fad30c6 cmake : use set() for LLAMA_WIN_VER (#5298) b2059 Welby Seely 2024-02-03 23:18:51 -05:00
  • 3c0d25c475 make: add nvcc info print (#5310) b2058 Johannes Gäßler 2024-02-03 20:15:13 +01:00
  • 3cc5ed353c make: fix nvcc optimization flags for host code (#5309) b2057 Johannes Gäßler 2024-02-03 20:14:59 +01:00
  • 60ecf099ed add Vulkan support to Nix flake Martin Schwaighofer 2024-01-28 12:59:43 +01:00