Commit Graph

  • 60c2ef6d92 metal : utilize view_src to see if tensor is a view Georgi Gerganov 2023-09-04 20:49:09 +03:00
  • ebd3467cc8 metal : more readable kernel Georgi Gerganov 2023-09-04 20:48:46 +03:00
  • 7704db2521 ggml : just in case Georgi Gerganov 2023-09-04 20:48:25 +03:00
  • ad80e5a4a7 llama : add ggml_cont to trigger bug with Metal Georgi Gerganov 2023-09-04 19:46:52 +03:00
  • bd33e5ab92 ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994) b1176 slaren 2023-09-04 14:59:52 +02:00
  • c79d130f74 make : fix speculative build speculative-grammar Georgi Gerganov 2023-09-04 15:50:04 +03:00
  • 2db2471c13 speculative : avoid grammar_mem Georgi Gerganov 2023-09-04 15:42:54 +03:00
  • e7dc5b08ac speculative : reuse grammar parser + better logs and comments Georgi Gerganov 2023-09-04 15:18:38 +03:00
  • 6c150d763e speculative : print draft token pieces Georgi Gerganov 2023-09-04 12:54:38 +03:00
  • ebe41d49a6 common : warm-up with 2 tokens - seems to work better Georgi Gerganov 2023-09-03 21:07:01 +03:00
  • 013457885a grammar : remove one nested level Georgi Gerganov 2023-09-03 19:10:43 +03:00
  • 2d89da4f77 grammar : add comments to new grammar file Georgi Gerganov 2023-09-03 18:47:38 +03:00
  • e0a8658e7c grammars : add json_arr.gbnf Georgi Gerganov 2023-09-03 17:52:49 +03:00
  • 69f2fafebc speculative : add grammar support Georgi Gerganov 2023-09-03 15:25:53 +03:00
  • 3103568144 llama-bench : make cpp file non-executable (#2999) b1175 Cebtenzzre 2023-09-04 06:40:18 -04:00
  • 5b8530d88c make : add speculative example (#3003) b1174 Leng Yue 2023-09-04 03:39:57 -07:00
  • e4386f417f server : add a subtle loading animation to the edit box (#2466) b1173 Aarni Koskela 2023-09-04 10:28:55 +02:00
  • 35195689cd 2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985) b1172 Jiahao Li 2023-09-04 14:53:30 +08:00
  • cf9b08485c ggml-alloc : use virtual memory for measurement (#2973) b1171 slaren 2023-09-03 20:34:09 +02:00
  • 47068e5170 speculative : PoC for speeding-up inference via speculative sampling (#2926) b1170 Georgi Gerganov 2023-09-03 15:12:08 +03:00
  • 847896aba7 speculative : add --draft CLI arg speculative Georgi Gerganov 2023-09-03 13:51:07 +03:00
  • a15ca746c7 speculative : print encoding speed Georgi Gerganov 2023-09-03 13:40:42 +03:00
  • c82c808da0 speculative : initial example Georgi Gerganov 2023-09-03 13:34:50 +03:00
  • 8f429fa511 perplexity : fix ETA by warming up the model with an empty run b1169 Georgi Gerganov 2023-09-03 13:42:56 +03:00
  • 6519e9c99c gguf(python): Fix special vocab handling when id < 0 (#2984) Kerfuffle 2023-09-03 04:38:43 -06:00
  • b7f2aa9e51 metal : restore 363f0bf and fix reduce in F16_F32 kernels (#2986) Georgi Gerganov 2023-09-03 13:23:33 +03:00
  • 73a12a6344 cov : disable comment in PRs (#2989) Alon 2023-09-03 13:19:01 +03:00
  • 3730134776 llama : fix bpe tokenize from byte (#2889) b1165 opparco 2023-09-03 19:18:09 +09:00
  • d9151e6f57 metal : revert 6af0bab until we fix it Georgi Gerganov 2023-09-03 12:40:56 +03:00
  • afc43d5f82 cov : add Code Coverage and codecov.io integration (#2928) b1163 Alon 2023-09-03 11:48:49 +03:00
  • 6460f758db opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955) b1162 Wentai Zhang 2023-09-03 16:46:44 +08:00
  • ca82cf7bac metal : more optimizations (#2959) Kawrakow 2023-09-03 11:06:22 +03:00
  • 323a9d3b8c llama : fix vocab_only logic when GPU is enabled Georgi Gerganov 2023-09-03 10:39:15 +03:00
  • 99161230c4 llama : enable GPU inference by default with Metal Georgi Gerganov 2023-09-03 10:30:53 +03:00
  • 15f1790a75 make : fix target clean Georgi Gerganov 2023-09-03 10:13:44 +03:00
  • b59beebdbf make : move targets back to the top Georgi Gerganov 2023-09-03 10:04:51 +03:00
  • 4de22829d9 Merge branch 'master' into build-metal-default Georgi Gerganov 2023-09-03 10:03:59 +03:00
  • 6a31a3bd98 swift : add support for k-quants (#2983) kchro3 2023-09-02 23:21:05 -07:00
  • cff7b0bf07 convert.py : BPE fixes (#2938) Kerfuffle 2023-09-02 23:52:13 -06:00
  • 340af42f09 docs : add catai to README.md (#2967) Ido S 2023-09-03 08:50:51 +03:00
  • c42f0ec6b3 examples : fix gpt-neox (#2943) b1157 momonga 2023-09-03 14:36:28 +09:00
  • 2753415afd swift : add missing c file to Package.swift (#2978) kchro3 2023-09-02 22:27:25 -07:00
  • bc054af97a make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS (#2886) b1155 Cebtenzzre 2023-09-03 01:26:59 -04:00
  • 3358c381f6 logging: Fix creating empty file even when disabled (#2966) b1154 Kerfuffle 2023-09-02 11:53:55 -06:00
  • 52315a4216 readme : update clblast instructions (#2903) bandoti 2023-09-02 09:53:18 -03:00
  • 8b56b4f2c3 metal : show all Metal device instances in the system (#2952) Karsten Weiss 2023-09-02 14:29:09 +02:00
  • 21f3d1be86 k-quants : fix build on armv7 (android only) (#2920) b1151 Jhen-Jie Hong 2023-09-02 20:23:45 +08:00
  • 571083f508 server : avoid antiprompt in probabilities of final response (#2849) b1150 Jhen-Jie Hong 2023-09-02 08:31:46 +08:00
  • f04d002844 cuda : vsubss4 for older versions of ROCm/clang (#2942) b1149 Engininja2 2023-09-01 15:33:19 -06:00
  • bcf62ba7b4 make : try to fix build on Linux Georgi Gerganov 2023-09-01 17:42:32 +03:00
  • e966ae0574 build : on Mac OS enable Metal by default Georgi Gerganov 2023-08-30 13:11:42 +03:00
  • 69fdbb9abc readme : quick start command fix (#2908) ZHAOKAI WANG 2023-09-01 22:06:44 +08:00
  • 5d6f19f16b Allow quantize to only copy tensors, some other improvements (#2931) b1147 Kerfuffle 2023-09-01 08:02:48 -06:00
  • 0d58936686 llama2c : rename function b1146 Georgi Gerganov 2023-09-01 17:00:40 +03:00
  • 6c9c23429b make : use unaligned vector moves on MinGW (#2945) b1145 Cebtenzzre 2023-09-01 09:53:14 -04:00
  • ee8654bcd0 minor : add const qualifiers (#2853) b1144 m3ndax 2023-09-01 15:47:27 +02:00
  • 49bb9cbe0f docs : add java-llama.cpp to README.md (#2935) Konstantin Herud 2023-09-01 15:36:14 +02:00
  • ef15649972 build : fix most gcc and clang warnings (#2861) b1142 Cebtenzzre 2023-09-01 09:34:50 -04:00
  • d8d6977f48 examples : add C grammar (#2357) Ben Siraphob 2023-09-01 09:32:14 -04:00
  • 5aec2cfaac ggml : add RISC-V vector intrinsics support (#2929) b1140 Tameem 2023-09-01 18:27:40 +05:00
  • 13268c5331 metal : slight speed-up for add and mul kernels (#2917) Georgi Gerganov 2023-09-01 13:42:41 +03:00
  • 4dcd47d71d logs : fix mingw-like builds (fixes #2898) (#2911) b1138 staviq 2023-09-01 11:07:06 +02:00
  • 18705a30ef llama2c : fix segfault and alloc-dealloc-mismatch (#2913) b1137 Cebtenzzre 2023-09-01 05:03:49 -04:00
  • e8d9158925 metal: somewhat faster f16 x f32 matrix multiply kernel (#2951) Kawrakow 2023-09-01 11:15:57 +03:00
  • bce1fef328 convert : fix another python 3.8 issue (#2949) Cebtenzzre 2023-08-31 22:13:51 -04:00
  • 528134dd02 remove convert-llama-7b-pth-to-gguf.py and convert-llama-hf-to-gguf.py (#2906) slaren 2023-09-01 01:32:09 +02:00
  • aeefac4ff7 scripts: Use local gguf package when running from repo (#2927) Kerfuffle 2023-08-31 16:49:24 -06:00
  • e8422de39e @vxiiduu's fix for PrefetchVirtualMemory (#2930) b1132 DannyDaemonic 2023-08-31 04:21:45 -07:00
  • 92d0b751a7 convert : fix python 3.8 support, modernize type annotations (#2916) Cebtenzzre 2023-08-31 01:02:23 -04:00
  • 8afe228000 CUDA: mul_mat_q=true llama_context_params default (#2912) b1130 Johannes Gäßler 2023-08-30 21:46:19 +02:00
  • 8c2b881281 cuda : poc for norm quants (only -b 1 works) norm-quants Georgi Gerganov 2023-08-30 21:39:49 +03:00
  • ced231980e Remove warning which fails on windows. master-ced2319 Adam Treat 2023-08-30 14:33:31 -04:00
  • df54d2f1d4 ggml : use less ggml_mul tasks when src0 rows are few Georgi Gerganov 2023-08-30 19:37:26 +03:00
  • 71d6975559 [Docker] fix tools.sh argument passing. (#2884) Henri Vasserman 2023-08-30 19:14:53 +03:00
  • 253eab8ae1 ggml : poc for normalizing weights for better quantization (metal) Georgi Gerganov 2023-08-30 19:05:36 +03:00
  • b4e70822f6 metal : add poc for normalized Q4_0 and Q4_1 norm-quants-rebase Georgi Gerganov 2023-08-30 18:32:43 +03:00
  • 9ffe54ed10 Merge branch 'master' into norm-quants Georgi Gerganov 2023-08-30 16:26:59 +03:00
  • 4cdaa3c9cb Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. niansa 2023-06-22 12:58:07 +02:00
  • b532a69b2f convert.py : use dir name to name the llama Georgi Gerganov 2023-08-30 13:29:40 +03:00
  • c90d135eb4 examples : fix underscore in beam-search + .gitignore (close #2900) b1127 Georgi Gerganov 2023-08-30 12:52:46 +03:00
  • 0d1c706181 gguf : add workflow for Pypi publishing (#2896) b1126 M. Yusuf Sarıgöz 2023-08-30 12:47:40 +03:00
  • 9509294420 make : add test and update CI (#2897) b1125 alonfaraj 2023-08-30 12:42:51 +03:00
  • 35092fb547 docs : add node-llama-cpp to README.md (#2885) Gilad S 2023-08-30 11:40:12 +03:00
  • 488e03200e Merge branch 'master' into gguf-publish-ci gguf-publish-ci M. Yusuf Sarıgöz 2023-08-30 11:34:55 +03:00
  • dc07dc492e convert : various script cleanups/fixes + merges and special token handling (#2842) Kerfuffle 2023-08-30 02:25:50 -06:00
  • 3303f38f34 fix trailing whitespace M. Yusuf Sarıgöz 2023-08-30 11:00:45 +03:00
  • 4d277cb563 gguf : add workflow for Pypi publishing M. Yusuf Sarıgöz 2023-08-30 10:56:41 +03:00
  • b8e572f6d3 gguf : add workflow for Pypi publishing M. Yusuf Sarıgöz 2023-08-30 10:52:06 +03:00
  • ad9ddcff6e llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879) chaihahaha 2023-08-30 14:50:55 +08:00
  • 8341a25957 main : log file (#2748) b1121 staviq 2023-08-30 08:29:32 +02:00
  • 849408957c tests : add a C compliance test (#2848) b1120 Cebtenzzre 2023-08-30 02:20:26 -04:00
  • 06abf8eeba ggml : add view_src and view_offs to ggml_tensor for views (#2874) b1119 slaren 2023-08-29 23:24:42 +02:00
  • c03a243abf remove outdated references to -eps and -gqa from README (#2881) slaren 2023-08-29 23:17:34 +02:00
  • fa3582f509 Tell users attempting to run perplexity with too few tokens to use more (#2882) b1117 Kawrakow 2023-08-29 23:55:45 +03:00
  • e37e69dcc3 10X faster BPE tokenizer (#2876) b1116 Kawrakow 2023-08-29 23:55:03 +03:00
  • 53885d7256 py : fix "usage" messages (#2873) maddes8cht 2023-08-29 15:51:02 +02:00
  • bcce96ba4d convert.py : fix baichuan7B support (#2870) jameswu2014 2023-08-29 17:48:41 +08:00
  • 74e0caeb82 readme : add react-native binding (#2869) Jhen-Jie Hong 2023-08-29 17:30:10 +08:00
  • d4b5e16c32 make : fix clang tests build, add missing examples (#2859) b1112 Cebtenzzre 2023-08-29 04:42:41 -04:00
  • 3a007648f2 metal : add option to disable debug logs (close #2764) b1111 Georgi Gerganov 2023-08-29 11:33:46 +03:00