Commit Graph

  • e91a2224e4 convert-llama-h5-to-gguf.py : n_layer --> n_block klosax 2023-08-13 00:02:44 +02:00
  • 489616e126 convert-gptneox-h5-to-gguf.py : n_layer --> n_block klosax 2023-08-13 00:02:04 +02:00
  • d2ce9cfe8d gguf.py : n_layer --> n_block klosax 2023-08-13 00:01:20 +02:00
  • 8b5f0c5067 constants.py : n_layer --> n_block klosax 2023-08-13 00:00:32 +02:00
  • 5e58ffa1ed gptneox-main.cpp : n_layer --> n_block klosax 2023-08-12 23:50:58 +02:00
  • e606ffeaee convert-llama-h5-to-gguf.py : simplify nbytes klosax 2023-08-12 22:30:35 +02:00
  • f8218477b3 convert-gptneox-h5-to-gguf.py : simplify nbytes klosax 2023-08-12 22:29:35 +02:00
  • 4cef57c81a convert-llama-h5-to-gguf.py : no need to convert tensors twice klosax 2023-08-12 21:50:24 +02:00
  • 8f09157ec9 convert-gptneox-h5-to-gguf.py : no need to convert tensors twice klosax 2023-08-12 21:48:58 +02:00
  • 5d81a715d4 gguf.py : no need to convert tensors twice klosax 2023-08-12 21:45:45 +02:00
  • 60d540831b gguf : proper closing of file M. Yusuf Sarıgöz 2023-08-12 21:42:31 +03:00
  • 202eab04d3 gguf : quantization is working M. Yusuf Sarıgöz 2023-08-12 16:39:05 +03:00
  • 1fc3d30b71 gguf : start implementing quantization (WIP) M. Yusuf Sarıgöz 2023-08-12 16:09:47 +03:00
  • fa7c39540c gguf : start implementing quantization (WIP) M. Yusuf Sarıgöz 2023-08-12 15:55:58 +03:00
  • b2571af255 gguf : start implementing quantization (WIP) M. Yusuf Sarıgöz 2023-08-12 14:28:17 +03:00
  • c4f02b4f74 gguf : start implementing quantization (WIP) M. Yusuf Sarıgöz 2023-08-12 12:01:17 +03:00
  • 0e1a3c7e7d gguf : start implementing quantization (WIP) M. Yusuf Sarıgöz 2023-08-12 11:32:34 +03:00
  • 4fa017a1f9 gguf : start implementing quantization (WIP) M. Yusuf Sarıgöz 2023-08-12 10:40:56 +03:00
  • 186c496fdf Merge branch 'gguf' of https://github.com//ggerganov/llama.cpp into gguf M. Yusuf Sarıgöz 2023-08-12 07:25:10 +03:00
  • 2f52008b20 gguf : rm references to old file magics M. Yusuf Sarıgöz 2023-08-12 07:24:46 +03:00
  • b19edd54d5 Adding support for llama2.c models (#2559) master-b19edd5 byte-6174 2023-08-11 19:17:25 -04:00
  • 53dc399472 server: fixed wrong variable name in timing json (#2579) master-53dc399 Equim 2023-08-12 06:35:14 +08:00
  • e76c59d524 Update gptneox-main.cpp klosax 2023-08-11 23:09:49 +02:00
  • 2a5ac7af44 Update gguf_tensor_map.py klosax 2023-08-11 23:08:48 +02:00
  • e732423280 gguf : get rid of n_mult, read n_ff from file M. Yusuf Sarıgöz 2023-08-11 23:50:38 +03:00
  • fc60a27642 ci: add linux binaries to release build ci_cublas_linux-fc60a27 Green Sky 2023-05-05 00:01:30 +02:00
  • f44bbd3d88 gguf : rm redundant method M. Yusuf Sarıgöz 2023-08-11 21:00:51 +03:00
  • 7009cf581c gguf : shorter name for member variable M. Yusuf Sarıgöz 2023-08-11 20:43:02 +03:00
  • 61919c1a8f gguf : rm references to old file formats M. Yusuf Sarıgöz 2023-08-11 20:36:11 +03:00
  • d09fd10713 gguf : write metadata in gguf_file_saver M. Yusuf Sarıgöz 2023-08-11 20:07:43 +03:00
  • 781b9ec3f5 gguf : write metadata in gguf_file_saver (WIP) M. Yusuf Sarıgöz 2023-08-11 18:01:26 +03:00
  • 28abfc90fa gguf : write metadata in gguf_file_saver (WIP) M. Yusuf Sarıgöz 2023-08-11 13:27:58 +03:00
  • e3a4960953 gguf : add gguf_get_kv_type M. Yusuf Sarıgöz 2023-08-11 13:03:23 +03:00
  • eb8ca6996f gguf : add gguf_get_kv_type M. Yusuf Sarıgöz 2023-08-11 12:24:08 +03:00
  • b2440f1943 gguf : start implementing gguf_file_saver (WIP) M. Yusuf Sarıgöz 2023-08-11 11:29:50 +03:00
  • a356b0e228 gguf : start implementing gguf_file_saver (WIP) M. Yusuf Sarıgöz 2023-08-11 10:50:02 +03:00
  • e7d346c37c gguf : start implementing gguf_file_saver (WIP) M. Yusuf Sarıgöz 2023-08-11 09:52:01 +03:00
  • 9ca4abed89 Handle ENABLE_VIRTUAL_TERMINAL_PROCESSING more gracefully on earlier versions of Windows. master-9ca4abe DannyDaemonic 2023-08-10 13:11:36 -07:00
  • f316b94c7c gguf : rm deprecated function M. Yusuf Sarıgöz 2023-08-10 20:20:22 +03:00
  • cfb8e35b73 gguf : inference with 7B model working (WIP) M. Yusuf Sarıgöz 2023-08-10 19:56:56 +03:00
  • 42cc04d11d gguf : calculate n_mult M. Yusuf Sarıgöz 2023-08-10 18:49:08 +03:00
  • 22de6c5c4c upd .gitignore M. Yusuf Sarıgöz 2023-08-10 18:09:49 +03:00
  • 4c0f64e302 rm binary committed by mistake M. Yusuf Sarıgöz 2023-08-10 18:07:41 +03:00
  • 4f865181aa gguf : start implementing libllama in GGUF (WIP) M. Yusuf Sarıgöz 2023-08-10 17:49:31 +03:00
  • e59fcb2bc1 Add --n-predict -2 for stopping generation on full context (#2565) master-e59fcb2 Christian Demsar 2023-08-10 10:28:27 -04:00
  • 1c4d8bf981 gguf : start implementing libllama in GGUF (WIP) M. Yusuf Sarıgöz 2023-08-10 16:52:08 +03:00
  • 1638757767 Fix grammar-based sampling issue in server (#2566) master-1638757 Martin Krasser 2023-08-10 12:16:38 +02:00
  • 916a9acdd0 ggml-alloc: Don't try to re-use buffers of external tensors (#2562) master-916a9ac Sam Spilsbury 2023-08-09 23:47:42 +03:00
  • ea04a4ca19 add log_callback to llama_context_params for custom logging. (#2234) master-ea04a4c grahameth 2023-08-09 22:46:40 +02:00
  • 25d43e0eb5 CUDA: tuned mul_mat_q kernels (#2546) master-25d43e0 Johannes Gäßler 2023-08-09 09:42:34 +02:00
  • 0246d0dd6f gptneox-main.cpp : map tensor names klosax 2023-08-09 00:54:21 +02:00
  • 7d5f4522dd convert-llama-h5-to-gguf.py : map tensor names klosax 2023-08-09 00:52:16 +02:00
  • f4d137d98c convert-gptneox-h5-to-gguf.py : map tensor names klosax 2023-08-09 00:50:11 +02:00
  • ece4fc185e map tensor names klosax 2023-08-09 00:48:33 +02:00
  • 28046d1e52 Merge and update server-cfg Henri Vasserman 2023-08-09 00:36:11 +03:00
  • f5bfea0580 Allow passing grammar to completion endpoint (#2532) master-f5bfea0 Martin Krasser 2023-08-08 15:29:19 +02:00
  • acfc5478ff CUDA: tighter VRAM scratch size for 65b/70b (#2551) master-acfc547 Johannes Gäßler 2023-08-08 14:38:16 +02:00
  • 7ed8d1fe7f llm.vim : multiline autocompletion, get rid of "^@" (#2543) chaihahaha 2023-08-08 20:07:02 +08:00
  • e7f94d6fdc vim : bring back simple llm.vim example Georgi Gerganov 2023-08-08 15:05:30 +03:00
  • 2d7baaf50f vim : streaming and more (#2495) AustinMroz 2023-08-08 06:44:48 -05:00
  • 65559a23c8 Update gptneox-main.cpp klosax 2023-08-07 22:28:43 +02:00
  • f3c3b4b167 Add --rope-scale parameter (#2544) master-f3c3b4b klosax 2023-08-07 19:07:19 +02:00
  • 8083ae347a gguf : minor stuff Georgi Gerganov 2023-08-07 19:02:18 +03:00
  • 1da82c551f Merge branch 'master' into gguf Georgi Gerganov 2023-08-07 18:53:03 +03:00
  • 4357e692ac gguf.py : use custom alignment if present klosax 2023-08-07 13:51:26 +02:00
  • 93356bdb7a ggml : mul mat tweaks (#2372) master-93356bd Georgi Gerganov 2023-08-07 14:25:58 +03:00
  • 60baff7c85 ggml : pad result of ggml_nbytes() master-60baff7 Georgi Gerganov 2023-08-07 14:24:42 +03:00
  • 9082b5dfbf ggml : change params pointer (style change) (#2539) master-9082b5d Georgi Gerganov 2023-08-07 13:55:18 +03:00
  • 99d29c0094 ggml : sync (custom ops) (#2537) master-99d29c0 Georgi Gerganov 2023-08-07 13:20:09 +03:00
  • 3d9a551816 Fixed mmap prefetch for GPU offloading (#2529) master-3d9a551 Johannes Gäßler 2023-08-07 10:09:40 +02:00
  • f6f9896ac3 metal : fix out-of-bounds access + inc concurrency nodes (#2416) Georgi Gerganov 2023-08-07 10:52:57 +03:00
  • 34a14b28ff [Makefile] Move ARM CFLAGS before compilation (#2536) master-34a14b2 GiviMAD 2023-08-06 23:21:46 -07:00
  • 7297128db8 [Zig] Rewrite build for Zig 0.11 (#2514) Henri Vasserman 2023-08-07 08:35:53 +03:00
  • 86c3219895 console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) master-86c3219 DannyDaemonic 2023-08-05 23:49:34 -07:00
  • 2e8265ae17 convert.py : add missing abstract methods for quantized data (#2491) Keiichi Tabata 2023-08-06 15:34:05 +09:00
  • f514d1b306 CUDA: faster k-quant mul_mat_q kernels (#2525) master-f514d1b Johannes Gäßler 2023-08-05 18:20:44 +02:00
  • 332311234a fix firefox autoscroll (#2519) master-3323112 Jonas Wunderlich 2023-08-04 20:16:11 +00:00
  • 182af739c4 server: regenerate completion.js.hpp (#2515) master-182af73 Cebtenzzre 2023-08-04 15:00:57 -04:00
  • 4329d1acb0 CUDA: use min compute capability of GPUs actually used (#2506) master-4329d1a Cebtenzzre 2023-08-04 11:35:22 -04:00
  • 02f9d96a86 CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) master-02f9d96 Cebtenzzre 2023-08-04 11:34:32 -04:00
  • 3498588e0f Add --simple-io option for subprocesses and break out console.h and cpp (#1558) master-3498588 DannyDaemonic 2023-08-04 08:20:12 -07:00
  • 5f631c2679 Fixing race condition in server and partial stream handling in frontend. (#2391) master-5f631c2 Stephen Nichols 2023-08-04 06:37:24 -05:00
  • 415e99fec2 Stream save llama context data to file instead of allocating entire buffer upfront (#2488) master-415e99f l3utterfly 2023-08-04 19:29:52 +08:00
  • ff966e7ca6 build : fix several cast and printf warnings (#2499) master-ff966e7 Borislav Stanimirov 2023-08-04 13:07:21 +03:00
  • db5618ad99 cmpnct_gpt2bpe.hpp : comments klosax 2023-08-04 04:57:51 +02:00
  • 278ada9572 gguf.py : bytesarray for gpt2bpe tokenizer klosax 2023-08-04 04:07:57 +02:00
  • fb0b243705 Makefile : remove gptneox-common klosax 2023-08-04 04:02:10 +02:00
  • 5d98989cf6 gpt2 bpe tokenizer (handles merges and unicode) klosax 2023-08-04 03:58:44 +02:00
  • e6f19ba240 gptneox-main.cpp : gpt2 bpe tokenizer klosax 2023-08-04 03:56:37 +02:00
  • 2922280a1a convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer klosax 2023-08-04 03:55:23 +02:00
  • 6691aa8797 Delete gptneox-common.h klosax 2023-08-04 03:52:01 +02:00
  • 23abbe8e00 Delete gptneox-common.cpp klosax 2023-08-04 03:51:43 +02:00
  • 8183159cf3 examples : generate JSON according to schema (#1887) Evan Jones 2023-08-02 22:05:44 -04:00
  • 468ea24fb4 CUDA: faster non k-quant mul_mat_q kernels (#2483) master-468ea24 Johannes Gäßler 2023-08-02 18:04:04 +02:00
  • 4f6b60c776 CUDA: Fix models with output size != 32000 (#2480) master-4f6b60c Johannes Gäßler 2023-08-02 16:48:10 +02:00
  • c5ba5efda2 convert-llama-h5-to-gguf.py : special tokens klosax 2023-08-02 11:26:07 +02:00
  • e1e9b28547 convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens klosax 2023-08-02 11:15:33 +02:00
  • 220d931864 readme : add Aquila-7B model series to supported models (#2487) ldwang 2023-08-02 16:21:11 +08:00
  • c3a65c4bbe gguf-util.h : update note M. Yusuf Sarıgöz 2023-08-02 11:16:23 +03:00
  • cf365fbc20 gguf : gguf counterpart of llama-util.h M. Yusuf Sarıgöz 2023-08-02 11:13:56 +03:00