Commit Graph

  • 9e2d4dd48e convert-llama-hf-to-gguf.py : fixes klosax 2023-08-17 21:43:48 +02:00
  • 640ddc4259 gguf.py : gptneox mapping klosax 2023-08-17 21:43:10 +02:00
  • b668cd3296 convert-gptneox-hf-to-gguf.py : fixes klosax 2023-08-17 21:42:26 +02:00
  • fc3a523211 gguf.py : write tensors in a single pass (#2644) M. Yusuf Sarıgöz 2023-08-17 21:57:39 +03:00
  • 6a9e6375b5 gguf.py : indentation gguf-write-single-pass Georgi Gerganov 2023-08-17 21:53:15 +03:00
  • 307e09cd85 Merge branch 'gguf' into gguf-write-single-pass Georgi Gerganov 2023-08-17 21:51:15 +03:00
  • e426b3cfc8 gguf.py : fix vertical alignment Georgi Gerganov 2023-08-17 21:50:01 +03:00
  • 5484737d58 llama : fix tensor name grepping during quantization Georgi Gerganov 2023-08-17 21:40:51 +03:00
  • 57eaadb853 llama : throw error if gguf fails to init from file Georgi Gerganov 2023-08-17 21:31:52 +03:00
  • b3cc182990 llama.cpp : typo klosax 2023-08-17 20:27:50 +02:00
  • acaa98234a convert.py : fix HF tensor permuting / unpacking Georgi Gerganov 2023-08-17 21:06:45 +03:00
  • 78e1e57862 quantize-stats.cpp : .bin --> .gguf klosax 2023-08-17 19:18:24 +02:00
  • fb11dd3f92 common.h : .bin --> .gguf klosax 2023-08-17 19:16:35 +02:00
  • e72c8c2124 ggml : fix bug in gguf_set_kv Georgi Gerganov 2023-08-17 20:13:12 +03:00
  • 4dbce7d009 gguf : rm file_type key and method M. Yusuf Sarıgöz 2023-08-17 20:02:38 +03:00
  • 1d93d04ce2 gguf : refactor pth to gguf conversion script M. Yusuf Sarıgöz 2023-08-17 19:58:27 +03:00
  • 899f9a5350 llama : fix lambda capture Georgi Gerganov 2023-08-17 19:49:21 +03:00
  • 93f285bdf1 gptneox : move as a WIP example Georgi Gerganov 2023-08-17 19:38:48 +03:00
  • f71704177f gguf : rename h5 to hf (for HuggingFace) M. Yusuf Sarıgöz 2023-08-17 19:49:15 +03:00
  • 81a2c2a6f4 llama : fix llama_model_loader memory leak Georgi Gerganov 2023-08-17 19:49:02 +03:00
  • 9f02694c91 gguf : refactor gptneox conversion script M. Yusuf Sarıgöz 2023-08-17 19:45:06 +03:00
  • dd9e2fc988 ci : update ".bin" to ".gguf" extension Georgi Gerganov 2023-08-17 19:32:14 +03:00
  • c3b739374e editorconfig : ignore models folder Georgi Gerganov 2023-08-17 19:17:25 +03:00
  • 22c61c5b45 gguf : style fixes in simple conversion script M. Yusuf Sarıgöz 2023-08-17 19:05:43 +03:00
  • 6d66ef96eb Merge branch 'master' into gguf Georgi Gerganov 2023-08-17 19:04:59 +03:00
  • 11bf4366c2 llama : sync with recent PRs on master Georgi Gerganov 2023-08-17 19:03:15 +03:00
  • 2f8fc92d86 gguf : fix conflicts M. Yusuf Sarıgöz 2023-08-17 18:51:14 +03:00
  • 8ace03ad3d convert.py : better always have n_head_kv and default it to n_head Georgi Gerganov 2023-08-17 18:47:06 +03:00
  • d646c4efce convert.py : n_head_kv optional and .gguf file extension klosax 2023-08-17 17:20:36 +02:00
  • dd016cc246 Revert "ci : disable CI temporary to not waste energy" Georgi Gerganov 2023-08-17 17:23:16 +03:00
  • 2ddd9681d6 convert.py : update to support GGUF output Georgi Gerganov 2023-08-17 17:22:43 +03:00
  • e0429d38e4 convert-new.py : output gguf (#2635) Georgi Gerganov 2023-08-17 17:19:52 +03:00
  • 5f97a48fc1 gguf : single pass for writing tensors + refactoring writer M. Yusuf Sarıgöz 2023-08-17 16:57:50 +03:00
  • dce07c3121 gguf : single pass for writing tensors + refactoring writer M. Yusuf Sarıgöz 2023-08-17 16:48:49 +03:00
  • 8dae7ce684 Add --cfg-negative-prompt-file option for examples (#2591) master-8dae7ce Kerfuffle 2023-08-17 07:29:44 -06:00
  • d6fd53afd6 llama.cpp : use ggml_elements() klosax 2023-08-17 15:24:35 +02:00
  • 5a0a2c5685 llama.cpp : print actual model size klosax 2023-08-17 15:18:16 +02:00
  • f31e9230ad gguf : single pass for writing tensors + refactoring writer M. Yusuf Sarıgöz 2023-08-17 15:19:30 +03:00
  • a73ccf1aa3 llama : replace (permute + reshape + view_1d) with (view_3d) (#2538) master-a73ccf1 Georgi Gerganov 2023-08-17 10:47:09 +03:00
  • 7cf54e1f74 tests : adds simple llama grammar tests (#2618) master-7cf54e1 drbh 2023-08-17 03:41:01 -04:00
  • a872a2b28e ggml-alloc : fix discrepency between measure&eval (#2639) master-a872a2b Shouzheng Liu 2023-08-17 03:35:53 -04:00
  • 42f8fe1927 examples/gguf : no need to keep q option for quantization any more M. Yusuf Sarıgöz 2023-08-17 08:56:42 +03:00
  • 0919a0f73d cmake : install ggml-meta.metal if LLAMA_METAL (#2449) master-0919a0f Kolen Cheung 2023-08-16 21:09:49 +01:00
  • ed53db86c3 metal : print error of load pipeline state (#2564) Jhen-Jie Hong 2023-08-17 04:09:03 +08:00
  • fc8ef549e5 metal : enable ggml-alloc (#2627) master-fc8ef54 Shouzheng Liu 2023-08-16 16:08:28 -04:00
  • bf83bff674 metal : matrix-matrix multiplication kernel (#2615) master-bf83bff Shouzheng Liu 2023-08-16 16:07:04 -04:00
  • 5ec18934ad convert-new.py : pick #2427 for HF 70B support Georgi Gerganov 2023-08-16 20:16:15 +03:00
  • c8ee87f141 gguf.py : merge all files in gguf.py Georgi Gerganov 2023-08-16 19:55:49 +03:00
  • 88b5769487 gguf : deduplicate (#2629) Georgi Gerganov 2023-08-16 19:25:29 +03:00
  • 758ff1bbb5 llama : refactor model loading code (#2620) Georgi Gerganov 2023-08-16 14:34:03 +03:00
  • ea5615a03a convert-llama-h5-to-gguf.py : clarify the reverse permute klosax 2023-08-16 11:23:15 +02:00
  • 4a1741aa2d gptneox-main.cpp : add tensor data layout klosax 2023-08-15 19:56:19 +02:00
  • 2ae0e985b3 convert-llama-7b-pth-to-gguf.py : add tensor data layout klosax 2023-08-15 19:55:13 +02:00
  • 66756c82af convert-llama-h5-to-gguf.py : add tensor data layout klosax 2023-08-15 19:54:33 +02:00
  • b6056c3db8 gguf.py : add tensor data layout klosax 2023-08-15 19:53:44 +02:00
  • b5ffb2849d scripts : add helper script to get wikitext Georgi Gerganov 2023-08-15 10:04:58 +03:00
  • 2dd5d2c92c convert-llama-h5-to-gguf.py : add 70b gqa support klosax 2023-08-15 00:43:10 +02:00
  • 3ebb00935f server : add missing /json-schema-to-grammar.mjs (#2616) master-3ebb009 Jhen-Jie Hong 2023-08-15 06:14:14 +08:00
  • ca4758290c gguf-llama.cpp : fix n_head_kv klosax 2023-08-14 23:18:41 +02:00
  • ab2cbd03ca convert-llama-7b-pth-to-gguf.py : add token types klosax 2023-08-14 22:10:50 +02:00
  • cedb4870c6 gguf.py : add token types klosax 2023-08-14 22:08:40 +02:00
  • 5d518d421f constants.py : add token types klosax 2023-08-14 22:07:53 +02:00
  • 7ec125b1dc convert-llama-h5-to-gguf.py : add token types klosax 2023-08-14 22:06:33 +02:00
  • 6c63550f63 llama : update tokenizer style Georgi Gerganov 2023-08-14 22:10:19 +03:00
  • 7494c78428 llama : sync gguf-llama with llama (#2613) Georgi Gerganov 2023-08-14 21:33:33 +03:00
  • afc4ca2889 convert : update convert-new.py with tokenizer fixes (#2614) goerch 2023-08-14 19:20:04 +02:00
  • ec1b100720 llama : tokenizer fixes (#2549) goerch 2023-08-14 18:30:28 +02:00
  • 8af3a99ff1 Merge branch 'master' into gguf Georgi Gerganov 2023-08-14 16:39:18 +03:00
  • 6f14854880 gitignore : add gptneox-main Georgi Gerganov 2023-08-14 16:39:02 +03:00
  • d783f7982e metal : return null instead of exit(1) (#2573) master-d783f79 Jhen-Jie Hong 2023-08-14 21:37:39 +08:00
  • d75561df20 server : add --numa support (#2524) master-d75561d Cheng Shao 2023-08-14 15:36:42 +02:00
  • 348acf188c llama : add missing enum keyword in function signatures (#2610) master-348acf1 Kamil Tomšík 2023-08-14 15:35:16 +02:00
  • f00780b2ee llama : sync gguf-llama.cpp with latest llama.cpp (#2608) Georgi Gerganov 2023-08-14 16:28:44 +03:00
  • 6f64b6c0f8 Create convert-llama-7b-pth-to-gguf.py klosax 2023-08-14 13:51:09 +02:00
  • 62490f1380 gguf : use UNIX line ending Georgi Gerganov 2023-08-14 13:04:35 +03:00
  • 0c19ae70d5 simple : minor style changes Georgi Gerganov 2023-08-14 12:56:48 +03:00
  • 5c5a95ba2d gguf.py : dont add empty strings klosax 2023-08-14 11:22:06 +02:00
  • a7d226f871 convert-llama-h5-to-gguf.py : fixes klosax 2023-08-14 11:14:24 +02:00
  • d753dfbcc8 gptneox-main.cpp : tensor name map changes klosax 2023-08-14 10:59:18 +02:00
  • 806a15749d Delete gguf_tensor_map.py klosax 2023-08-14 10:57:19 +02:00
  • 51939d7d1b Create gguf_namemap.py : tensor name map changes klosax 2023-08-14 10:56:59 +02:00
  • 5d22a9db13 convert-gptneox-h5-to-gguf.py : tensor name map changes klosax 2023-08-14 10:55:44 +02:00
  • 1cd06fa25e CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596) master-1cd06fa Johannes Gäßler 2023-08-14 10:41:22 +02:00
  • 2feb8934eb server : fix default grammar by use empty string in the UI (#2604) master-2feb893 Jhen-Jie Hong 2023-08-14 16:20:17 +08:00
  • 5517d6e692 server : implement json-schema-to-grammar.mjs & add grammar param in the UI (#2588) master-5517d6e Jhen-Jie Hong 2023-08-14 15:16:54 +08:00
  • 56a1f32072 Merge branch 'master' into gguf Georgi Gerganov 2023-08-14 10:14:05 +03:00
  • 196b50fee7 gguf : add todos and comments M. Yusuf Sarıgöz 2023-08-14 08:50:47 +03:00
  • f31b539714 Enhance Windows 7 and below compatibility. (#2592) master-f31b539 vxiiduu 2023-08-14 13:59:16 +10:00
  • ee77efea2a test : add simple grammar parsing tests (#2594) master-ee77efe drbh 2023-08-13 10:00:48 -04:00
  • 24f48833ab fix conflicts M. Yusuf Sarıgöz 2023-08-13 16:55:42 +03:00
  • 6beebf3fd9 gptneox-main.cpp : add file_type key klosax 2023-08-13 14:11:01 +02:00
  • 2827b840e4 convert-gptneox-h5-to-gguf.py : add file_type key klosax 2023-08-13 13:54:10 +02:00
  • bf2dad3100 convert : rm quantization version M. Yusuf Sarıgöz 2023-08-13 14:38:53 +03:00
  • 1d60468eee fix conflicts M. Yusuf Sarıgöz 2023-08-13 13:35:40 +03:00
  • 91d4bfd536 convert : write more metadata for LLaMA M. Yusuf Sarıgöz 2023-08-13 13:29:46 +03:00
  • 17800cd80f convert-llama-h5-to-gguf.py : load model in parts to save memory klosax 2023-08-13 12:20:02 +02:00
  • e3d1f07eb1 convert-gptneox-h5-to-gguf.py : load model in parts to save memory klosax 2023-08-13 12:18:34 +02:00
  • 9bf5a7efcb Update gguf_tensor_map.py klosax 2023-08-13 01:27:38 +02:00
  • f64d44a9b9 CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590) master-f64d44a Johannes Gäßler 2023-08-13 00:24:45 +02:00
  • c7bd8c147c gptneox-main.cpp : n_layer --> n_block klosax 2023-08-13 00:03:32 +02:00