Commit Graph

  • 95930da30e convert-hf : get bit-exact same output as ./quantize Francis Couture-Harpin 2024-05-09 11:27:34 -04:00
  • d46dbc76f8 readme : add scheduled server workflow status badge Georgi Gerganov 2024-05-09 16:40:42 +03:00
  • 0961d86604 readme : add app (#6371) l3utterfly 2024-05-09 22:32:40 +09:00
  • 43248e5594 llama3 custom regex split (#6965) b2831 jaime-m-p 2024-05-09 15:30:44 +02:00
  • a743d76a01 CUDA: generalize FP16 fattn vec kernel (#7061) b2830 Johannes Gäßler 2024-05-09 14:32:02 +02:00
  • f31ec120bc Add warning if token is invalid (#7173) Galunid 2024-05-09 14:13:05 +02:00
  • fd9f92b154 llama : update llama_timings.n_p_eval setting (#7160) b2828 Daniel Bevenius 2024-05-09 13:03:29 +02:00
  • fecb81e302 metal : fix ggml_metal_supports_op gg/metal-fattn-reqs Georgi Gerganov 2024-05-09 14:00:35 +03:00
  • 22842164bc gguf-py : add special token modification capability (#7166) Sigbjørn Skjæret 2024-05-09 12:56:00 +02:00
  • 4734524882 opencl : alignment size converted from bits to bytes (#7090) b2826 Albert Jin 2024-05-09 17:34:37 +08:00
  • 1174def5dc metal : fix flash attention kernel requirements Georgi Gerganov 2024-05-09 11:19:27 +03:00
  • 07cd41d096 TypoFix (#7162) Ahmet Zeer 2024-05-09 11:16:45 +03:00
  • 3801db12d8 convert-hf : add missing space after comma Francis Couture-Harpin 2024-05-09 00:08:00 -04:00
  • 59f5a27fc5 gguf-py : flake8 fixes Francis Couture-Harpin 2024-05-08 23:20:51 -04:00
  • 6f8d280073 convert-hf : support bfloat16 conversion Francis Couture-Harpin 2024-05-08 18:18:37 -04:00
  • 4426e2987b cmake : fix typo (#7151) b2824 Jared Van Bortel 2024-05-08 19:55:32 -04:00
  • f98eb31c51 convert-hf : save memory with lazy evaluation (#7075) compilade 2024-05-08 18:16:38 -04:00
  • bc4bba364f Introduction of CUDA Graphs to LLama.cpp (#6766) b2822 agray3 2024-05-08 21:55:49 +01:00
  • 494f70f939 cmake : fix typo ceb/fix-cmake-typo Jared Van Bortel 2024-05-08 16:24:02 -04:00
  • c12452c7ae JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143) b2821 Johannes Gäßler 2024-05-08 21:53:08 +02:00
  • 9da243b36a Revert "llava : add support for moondream vision language model (#6899)" b2820 Georgi Gerganov 2024-05-08 22:14:39 +03:00
  • bd1871fa2b server : add themes + favicon (#6848) JohnnyB 2024-05-08 20:12:06 +01:00
  • 26458af1d6 metal : use vm_allocate instead of posix_memalign on macOS (#7078) b2818 Gilad S 2024-05-08 22:08:10 +03:00
  • bffdaf4010 Merge branch 'master' into compilade/lazy-convert-hf compilade/lazy-convert-hf Francis Couture-Harpin 2024-05-08 10:56:03 -04:00
  • 83330d8cd6 main : add --conversation / -cnv flag (#7108) b2817 Dawid Potocki 2024-05-09 02:32:32 +12:00
  • 465263d0cf sgemm : AVX Q4_0 and Q8_0 (#6891) b2816 Eve 2024-05-08 14:29:23 +00:00
  • 911b3900dd server : add_special option for tokenize endpoint (#7059) b2815 Johan 2024-05-08 14:27:58 +02:00
  • ad211edef5 convert.py : --vocab-only generates false but valid params (#7027) 20kdc 2024-05-08 13:22:32 +01:00
  • 229ffff872 llama : add BPE pre-tokenization for Qwen2 (#7114) b2813 Ren Xuancheng 2024-05-08 20:06:43 +08:00
  • 1fd9c1741d clean up json_value & server_log (#7142) b2812 Xuan Son Nguyen 2024-05-08 13:24:14 +02:00
  • 4cd621c26d convert : add BPE pre-tokenization for DBRX (#7132) b2811 DAN™ 2024-05-08 06:43:23 -04:00
  • 7e0b6a7b3b py : also print the normalizers Georgi Gerganov 2024-05-08 12:47:07 +03:00
  • acdce3cdef compare-llama-bench.py: add missing basicConfig (#7138) b2809 Brian 2024-05-08 18:54:39 +10:00
  • 0fc560fe96 ci : enable git lfs for build.yml gg/lfs Georgi Gerganov 2024-05-08 10:53:02 +03:00
  • db5c2ad30e Revert "tmp : dummy change to trigger ci" Georgi Gerganov 2024-05-08 10:42:25 +03:00
  • 97e40df5d6 tmp : dummy change to trigger ci Georgi Gerganov 2024-05-08 10:42:11 +03:00
  • 837f426f19 ci : try lfs true Georgi Gerganov 2024-05-08 10:30:25 +03:00
  • 9d13776f34 ci : deps before checkout Georgi Gerganov 2024-05-08 10:24:53 +03:00
  • 2c7ff2c7ae ci : add git-lfs Georgi Gerganov 2024-05-08 10:15:36 +03:00
  • 0dc0e9aa42 models : convert vocab files to LFS Georgi Gerganov 2024-05-08 09:54:38 +03:00
  • 3855416027 ggml : introduce bfloat16 support (#6412) b2808 Justine Tunney 2024-05-08 02:30:09 -04:00
  • c0e6fbf8c3 metal : fix unused warning Georgi Gerganov 2024-05-08 09:14:50 +03:00
  • c780e75305 Further tidy on Android instructions README.md (#7077) b2806 Jeximo 2024-05-07 21:26:43 -03:00
  • 48b2f9c1fc Fixed save_imatrix to match old behaviour for MoE (#7099) b2805 jukofyork 2024-05-08 01:24:16 +01:00
  • af0a5b6163 server: fix incorrectly reported token probabilities (#7125) b2804 Johannes Gäßler 2024-05-07 23:07:58 +02:00
  • b6aa670203 Fix OLMo HF to GGUF conversion (#6910) b2803 nopperl 2024-05-07 19:39:43 +00:00
  • 260b7c6529 server : update readme with undocumented options (#7013) Kyle Mistele 2024-05-07 13:44:29 -05:00
  • 53d6c52e22 readme : update hot topics Georgi Gerganov 2024-05-07 21:43:13 +03:00
  • 3af34c1d1b main : update log text (EOS to EOG) (#7104) b2800 RhinoDevel 2024-05-07 19:51:31 +02:00
  • 04976db7a8 docs: fix typos (#7124) omahs 2024-05-07 17:20:33 +02:00
  • 947d3ad27d ci : add GG_BUILD_EXTRA_TESTS_0 env (#7098) Georgi Gerganov 2024-05-07 11:08:49 +03:00
  • 858f6b73f6 Add an option to build without CUDA VMM (#7067) b2797 William Tambellini 2024-05-06 11:12:14 -07:00
  • b3a995b416 flake.lock: Update (#7079) Georgi Gerganov 2024-05-06 18:36:06 +03:00
  • 94e667a9d8 gguf-py : add tqdm as a dependency Francis Couture-Harpin 2024-05-06 09:08:09 -04:00
  • 68c5ac628f Merge branch 'master' into compilade/lazy-convert-hf Francis Couture-Harpin 2024-05-06 08:07:08 -04:00
  • c32d39cefb Merge branch 'master' into compilade/convert-hf-refactor compilade/convert-hf-refactor Brian 2024-05-06 19:33:38 +10:00
  • bcdee0daa7 minor : fix trailing whitespace Georgi Gerganov 2024-05-06 09:31:30 +03:00
  • 62303e7f77 convert-hf : minor changes for consistency Francis Couture-Harpin 2024-05-05 16:49:16 -04:00
  • bc78bf4cdb convert-hf : faster model parts loading Francis Couture-Harpin 2024-05-05 12:35:15 -04:00
  • 628b299106 Adding support for the --numa argument for llama-bench. (#7080) b2794 kunnis 2024-05-05 07:17:47 -05:00
  • 8f8acc8683 Disable benchmark on forked repo (#7034) b2793 Sigbjørn Skjæret 2024-05-05 13:38:55 +02:00
  • ca36326020 readme : add note that LLaMA 3 is not supported with convert.py (#7065) Lyle Dean 2024-05-05 06:21:46 +01:00
  • 889bdd7686 command-r : add BPE pre-tokenization (#7063) b2791 DAN™ 2024-05-05 01:19:30 -04:00
  • 6fbd432211 py : logging and flake8 suppression refactoring (#7081) Brian 2024-05-05 15:07:48 +10:00
  • 98db4347e8 convert-hf : remove einops requirement for InternLM2 Francis Couture-Harpin 2024-05-04 16:52:06 -04:00
  • 0c3833286e convert-hf : flake8 doesn't like lowercase L as a variable name Francis Couture-Harpin 2024-05-04 10:48:18 -04:00
  • f09674fbbd convert-hf : save memory with lazy evaluation Francis Couture-Harpin 2024-05-03 22:00:05 -04:00
  • 215a0d38c8 convert-hf : fix Refact conversion Francis Couture-Harpin 2024-05-04 23:55:42 -04:00
  • 842500144e gguf-split: add --no-tensor-first-split (#7072) b2789 Xuan Son Nguyen 2024-05-04 18:56:22 +02:00
  • cf768b7e71 Tidy Android Instructions README.md (#7016) Jeximo 2024-05-04 13:10:15 -03:00
  • fcd84a0f5a Fix Linux /sys cpu path to guess number of cores (#7064) b2787 viric 2024-05-04 15:26:53 +02:00
  • f2099c50ab convert-hf : align the message logged for converted tensors Francis Couture-Harpin 2024-05-04 09:09:47 -04:00
  • 03fb8a002d If first token generated from the server is the stop word the server will crash (#7038) b2786 maor-ps 2024-05-04 12:06:40 +03:00
  • 92139b90af tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) b2785 Georgi Gerganov 2024-05-04 08:32:32 +03:00
  • 98f2d0e0d7 convert-hf : more consistent formatting of cmdline args Francis Couture-Harpin 2024-05-03 22:04:31 -04:00
  • 3e5e0dced5 Merge branch 'master' into compilade/convert-hf-refactor Francis Couture-Harpin 2024-05-03 16:20:54 -04:00
  • a2ac89d6ef convert.py : add python logging instead of print() (#6511) b2784 Brian 2024-05-04 05:36:41 +10:00
  • 433def286e llama : rename ctx to user_data in progress_callback (#7045) b2783 Daniel Bevenius 2024-05-03 15:24:30 +02:00
  • 6a54973d82 Merge branch 'master' into compilade/convert-hf-refactor Francis Couture-Harpin 2024-05-02 20:02:46 -04:00
  • 60325fa56f Remove .attention from skipped tensors to match more accurately (#7051) b2782 Bartowski 2024-05-02 19:49:09 -04:00
  • 13f4cf70db convert-hf : use a plain class for Model, and forbid direct instantiation Francis Couture-Harpin 2024-05-02 15:50:21 -04:00
  • ce067af118 convert-hf : use an ABC for Model again Francis Couture-Harpin 2024-05-02 15:00:36 -04:00
  • 6ecf3189e0 chore: fix typo in llama.cpp (#7032) b2781 alwqx 2024-05-02 23:56:41 +08:00
  • 644c2696d0 convert-hf : sort model part names Francis Couture-Harpin 2024-05-01 19:16:59 -04:00
  • 639b374b1a convert-hf : convert norms to f32 by default Francis Couture-Harpin 2024-05-01 19:02:34 -04:00
  • b0d943de17 Update LOG_IMPL and LOG_TEE_IMPL (#7029) b2780 Andrew Downing 2024-05-01 17:31:30 -04:00
  • 21068b6bdf convert-hf : display tensor shape Francis Couture-Harpin 2024-05-01 16:59:21 -04:00
  • 8d608a81b7 main : fix off by one error for context shift (#6921) b2779 l3utterfly 2024-05-02 04:27:41 +09:00
  • dcd8dfa1b5 convert : use a string for the SentencePiece tokenizer path Francis Couture-Harpin 2024-05-01 13:07:10 -04:00
  • 3870164f47 convert-hf : allow unusual model part names Francis Couture-Harpin 2024-05-01 12:30:20 -04:00
  • 3ea0d36000 Server: add tests for batch size, different seeds (#6950) Johannes Gäßler 2024-05-01 17:52:55 +02:00
  • 56f60f5d69 convert-hf : flake8 linter doesn't like semicolons Francis Couture-Harpin 2024-05-01 11:36:23 -04:00
  • 1613ef8d8e CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) b2777 Johannes Gäßler 2024-05-01 14:46:37 +02:00
  • c4ec9c0d3d ci : exempt confirmed bugs from being tagged as stale (#7014) b2776 slaren 2024-05-01 07:13:59 +02:00
  • cde9ea65e8 convert-hf : simplify MoE weights stacking Francis Couture-Harpin 2024-04-30 18:12:01 -04:00
  • a8f9b07631 perplexity: more statistics, added documentation (#6936) b2775 Johannes Gäßler 2024-04-30 23:36:27 +02:00
  • 698f0b3479 convert-hf : remove unused n_dims in extra_*_tensors Francis Couture-Harpin 2024-04-30 15:02:34 -04:00
  • c33775bcc7 convert : upgrade to sentencepiece v0.2.0 Francis Couture-Harpin 2024-04-30 15:01:23 -04:00
  • 0d720acb91 Merge branch 'master' into compilade/convert-hf-refactor Francis Couture-Harpin 2024-04-30 14:08:05 -04:00
  • 47e02eb7bc convert-hf : begin refactoring write_tensor Francis Couture-Harpin 2024-04-30 14:07:28 -04:00