Commit Graph

  • 81844fbcfd tests : Fix compilation warnings (Linux/GCC) (#2451) master-81844fb Eve 2023-08-02 04:06:19 -04:00
  • a312193e18 readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475) Yiming Cui 2023-08-02 14:18:31 +08:00
  • 1b4f9c8eb9 convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens klosax 2023-08-01 23:40:50 +02:00
  • 49380a23a3 gguf.py : accumulate kv and tensor info data + special tokens klosax 2023-08-01 23:37:48 +02:00
  • ff1cb02397 constants.py : special tokens klosax 2023-08-01 23:17:21 +02:00
  • c574bddb36 fix a typo in examples/server/README.md (#2478) Bono Lv 2023-08-01 20:54:28 +08:00
  • 36a36c32a3 Update gptneox-main.cpp klosax 2023-08-01 14:44:28 +02:00
  • c77fabb1f9 gptneox-main.cpp : special tokens klosax 2023-08-01 14:32:53 +02:00
  • e7a741695c convert-gptneox-h5-to-gguf.py : Special tokens klosax 2023-08-01 14:30:00 +02:00
  • 86aeb27734 server : Support dark mode (#2414) master-86aeb27 ebraminio 2023-08-01 01:56:23 -07:00
  • 1873ff586b metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) Matteo Boschini 2023-08-01 09:43:12 +02:00
  • da4900e835 Update convert-llama-h5-to-gguf.py klosax 2023-07-31 23:04:03 +02:00
  • f3de876a12 fix : update convert-llama-h5-to-gguf.py M. Yusuf Sarıgöz 2023-07-31 23:58:29 +03:00
  • 49e7cb5bb1 CUDA: fixed LLAMA_FAST compilation option (#2473) master-49e7cb5 Johannes Gäßler 2023-07-31 21:02:19 +02:00
  • b772bba42e CUDA: fixed cmake F16 option (#2471) master-b772bba Johannes Gäßler 2023-07-31 19:52:22 +02:00
  • bb42aefaeb gguf : mmap tensor data example M. Yusuf Sarıgöz 2023-07-31 17:46:12 +03:00
  • 0728c5a8b9 CUDA: mmq CLI option, fixed mmq build issues (#2453) master-0728c5a Johannes Gäßler 2023-07-31 15:44:35 +02:00
  • b26f5b2e43 gguf : fix typo in function call M. Yusuf Sarıgöz 2023-07-31 16:23:54 +03:00
  • 1215ed7d5c CUDA: Implemented row flattening for non-glm RoPE (#2468) master-1215ed7 Johannes Gäßler 2023-07-31 14:32:30 +02:00
  • 2dbf518911 CUDA: fewer memory bank conflicts for mul_mat_q (#2458) master-2dbf518 Johannes Gäßler 2023-07-31 13:18:51 +02:00
  • 9d2382b3e4 Fix Metal backend broken from the allocator changes (#2455) master-9d2382b slaren 2023-07-31 11:02:53 +02:00
  • 7aa0a0e7f7 gguf : support custom alignment value M. Yusuf Sarıgöz 2023-07-31 09:59:36 +03:00
  • 6b3a7b9f4f Update convert-llama-h5-to-gguf.py klosax 2023-07-31 03:02:00 +02:00
  • 4f5b6224be Update convert-gptneox-h5-to-gguf.py klosax 2023-07-31 03:00:20 +02:00
  • 2a0914673c Update convert-gptneox-h5-to-gguf.py klosax 2023-07-30 17:31:11 +02:00
  • 068a8e0fbe Update convert-llama-h5-to-gguf.py klosax 2023-07-30 17:29:56 +02:00
  • 30c4ea47e6 add gptneox gguf example klosax 2023-07-30 16:59:26 +02:00
  • 2fabc176ce Update convert-llama-h5-to-gguf.py klosax 2023-07-30 16:28:08 +02:00
  • a113689571 ggml : add graph tensor allocator (#2411) master-a113689 slaren 2023-07-30 15:58:01 +02:00
  • f175b05872 Makefile : add gptneox gguf example klosax 2023-07-30 15:08:37 +02:00
  • e9192b0135 add gptneox gguf example klosax 2023-07-30 15:05:37 +02:00
  • 4ed98bf1ab Update convert-llama-h5-to-gguf.py klosax 2023-07-30 15:01:47 +02:00
  • b19c11750b ggml.c : add gguf_get_arr_n klosax 2023-07-30 14:58:50 +02:00
  • b4676ee447 ggml.h : increase GGML_MAX_NAME to 64 klosax 2023-07-30 14:51:37 +02:00
  • ccd81a751b gguf.py : add layer norm eps and merges klosax 2023-07-30 14:48:14 +02:00
  • 0790c121aa constants.py : add layer norm eps klosax 2023-07-30 14:46:36 +02:00
  • 87c34e4dd4 gguf : update convert-llama-h5-to-gguf.py M. Yusuf Sarıgöz 2023-07-30 01:09:22 +03:00
  • 32e037ffbe gguf : fix set is not subscriptable M. Yusuf Sarıgöz 2023-07-30 01:01:13 +03:00
  • 11f3ca06b8 CUDA: Quantized matrix matrix multiplication (#2160) master-11f3ca0 Johannes Gäßler 2023-07-29 23:04:44 +02:00
  • 9baf9ef304 CUDA: faster multi GPU synchronization (#2448) master-9baf9ef Johannes Gäßler 2023-07-29 23:04:10 +02:00
  • 06c3e4a1a7 Update convert-llama-h5-to-gguf.py klosax 2023-07-29 21:38:01 +02:00
  • 9577821487 gguf.py : support any type klosax 2023-07-29 21:29:07 +02:00
  • 2c22e3bcdb ggml.c : get arr str and f32 klosax 2023-07-29 20:37:47 +02:00
  • 34469b9ea7 ggml.h : get array str and f32 klosax 2023-07-29 20:36:06 +02:00
  • 0f5e57f01d gguf : handle already encoded string M. Yusuf Sarıgöz 2023-07-29 19:56:06 +03:00
  • 8ad7cd49fb Update convert-llama-h5-to-gguf.py klosax 2023-07-29 16:47:00 +02:00
  • 0317c41d98 gguf : upd gguf conversion script M. Yusuf Sarıgöz 2023-07-29 13:31:07 +03:00
  • cc3dd7f042 gguf : write tokenizer data M. Yusuf Sarıgöz 2023-07-29 13:30:22 +03:00
  • 8a76dd8a85 gguf : write tensors one by one M. Yusuf Sarıgöz 2023-07-29 13:17:28 +03:00
  • c861e234f4 gguf : write tensors one by one M. Yusuf Sarıgöz 2023-07-29 12:49:01 +03:00
  • 0c219fb5b5 gguf : fix writing gguf arrays M. Yusuf Sarıgöz 2023-07-29 12:42:54 +03:00
  • 93f7f7aef7 gguf : write tensors one by one and code reuse M. Yusuf Sarıgöz 2023-07-29 12:34:35 +03:00
  • aa99562d70 Merge branch 'gguf' of https://github.com//ggerganov/llama.cpp into gguf M. Yusuf Sarıgöz 2023-07-29 12:26:11 +03:00
  • ea5f9ad2ca gguf : fix writing gguf arrays M. Yusuf Sarıgöz 2023-07-29 12:25:43 +03:00
  • 999431c4b6 quick and dirty conversion example klosax 2023-07-29 11:20:05 +02:00
  • d54f53ca51 gguf : add tokenization constants M. Yusuf Sarıgöz 2023-07-29 12:04:45 +03:00
  • 06f423a8e1 gguf : write sample tensors to read M. Yusuf Sarıgöz 2023-07-29 10:26:26 +03:00
  • 08dc8fd884 gguf : do not hardcode tensor names to read M. Yusuf Sarıgöz 2023-07-29 10:24:46 +03:00
  • 9475cdb7a3 Merge branch 'gguf-write-tokenization' into gguf M. Yusuf Sarıgöz 2023-07-29 00:36:35 +03:00
  • 1495735aac gguf : fix writing tensors M. Yusuf Sarıgöz 2023-07-29 00:26:22 +03:00
  • 3492f848d7 gguf : add gguf_find_key (#2438) klosax 2023-07-28 22:45:24 +02:00
  • 8a88e5855c perplexity : add Hellaswag calculation (#2389) master-8a88e58 klosax 2023-07-28 20:25:36 +02:00
  • a9559bf77b ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405) master-a9559bf Lee 2023-07-29 02:17:45 +08:00
  • ee1b497c98 llama : support more diverse tokenizers? (#2420) master-ee1b497 eric8607242 2023-07-29 02:10:05 +08:00
  • d73b8d48b4 examples : fix whitespace Georgi Gerganov 2023-07-28 21:05:08 +03:00
  • 34ae1caf7f examples : server chat mode with llama2 (#2400) nhamanasu 2023-07-29 03:02:10 +09:00
  • dead8f4b5b Fix misaligned memory access in Q4_1 kernel Iwan Kawrakow 2023-07-28 17:27:01 +03:00
  • 72af25998c Fix misaligned memory access in Q4_1 kernel Iwan Kawrakow 2023-07-28 17:12:27 +03:00
  • e5d23f2e7e ggml : fix ARM build + speed-up ggml_mul Georgi Gerganov 2023-07-28 16:31:59 +03:00
  • a4d1eb72c6 ggml : add q4_1 normalized quants Georgi Gerganov 2023-07-28 14:37:52 +03:00
  • d91f3f0c55 readme : fix the description of the Tail free sampling (TFS) method (#2431) Weird Constructor 2023-07-28 10:44:43 +02:00
  • 65cdf34bdc llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) Rand Xie 2023-07-28 01:42:53 -07:00
  • 11ef380c2a GGUF : write tensor (#2426) M. Yusuf Sarıgöz 2023-07-28 11:34:16 +03:00
  • 675425563c ggml : poc for normalizing weights for better quantization Georgi Gerganov 2023-07-27 21:16:10 +03:00
  • 511055722e undo formatting gguf-write-tensor M. Yusuf Sarıgöz 2023-07-28 09:09:14 +03:00
  • edcc7ae7d2 Obtaining LLaMA 2 instructions (#2308) niansa/tuxifan 2023-07-28 03:14:11 +02:00
  • 0c43a3b7d8 gitignore *.gguf M. Yusuf Sarıgöz 2023-07-28 00:07:28 +03:00
  • 8e62d2b214 rm example.gguf M. Yusuf Sarıgöz 2023-07-28 00:06:47 +03:00
  • 62f4926bde fix : fix errors upd writing example M. Yusuf Sarıgöz 2023-07-28 00:04:19 +03:00
  • 7c529cede6 convert.py : Update to support 70B HF format model files (#2427) mj-shifu 2023-07-27 22:39:17 +02:00
  • 9411250564 refactor : rm unused import and upd todos M. Yusuf Sarıgöz 2023-07-27 23:25:47 +03:00
  • bb54d1700e GGUF : Support writing tensors in Python M. Yusuf Sarıgöz 2023-07-27 23:09:53 +03:00
  • 464192b9be WIP: Write tensor M. Yusuf Sarıgöz 2023-07-27 22:25:04 +03:00
  • d2bb3ac10b convert.py : remove GGML vocab + other obsolete stuff Georgi Gerganov 2023-07-27 16:36:35 +03:00
  • 68f53485e4 convert.py : start a new simplified implementation by removing old stuff Georgi Gerganov 2023-07-27 15:56:53 +03:00
  • 158be8f7f4 gguf.py : some code style changes Georgi Gerganov 2023-07-27 15:37:06 +03:00
  • d2b6ca13ad gguf : add array support Georgi Gerganov 2023-07-27 14:53:07 +03:00
  • d89533dff6 gguf : expose the gguf_type enum through the API for now Georgi Gerganov 2023-07-27 11:10:34 +03:00
  • 1a941869cb metal : disable graph concurrency optimization due to bug (#2413) master-1a94186 Georgi Gerganov 2023-07-27 11:00:54 +03:00
  • af1c9966c8 gguf : start write tensor info gguf-python M. Yusuf Sarıgöz 2023-07-27 10:32:31 +03:00
  • c85d3178b3 refactor : reduce code duplication and better API (#2415) M. Yusuf Sarıgöz 2023-07-27 10:29:29 +03:00
  • 8332d26123 refactor: reduce code duplication and better API M. Yusuf Sarıgöz 2023-07-27 09:48:08 +03:00
  • b5472ea0ad ggml : fix assert in ggml_set_unary_op (#2410) master-b5472ea slaren 2023-07-26 23:57:23 +02:00
  • d8491fc7e3 gguf : add comments Georgi Gerganov 2023-07-26 22:56:26 +03:00
  • 5628ec7163 gguf : read / write sample models Georgi Gerganov 2023-07-26 20:04:22 +03:00
  • 6df1f5940f make : build with -Wmissing-prototypes (#2394) master-6df1f59 Cebtenzzre 2023-07-26 14:00:04 -04:00
  • e46870f5af gguf : gguf.c is now part of ggml.c Georgi Gerganov 2023-07-26 18:55:32 +03:00
  • d313c0fa33 gguf : simplify gguf_get_val Georgi Gerganov 2023-07-26 18:53:57 +03:00
  • cb871fa022 gguf : do not support passing existing ggml_context to gguf_init Georgi Gerganov 2023-07-26 18:48:52 +03:00
  • 860c9c63ce gguf : add gguf_get_tensor_name() Georgi Gerganov 2023-07-26 16:36:03 +03:00