Commit Graph

  • 22a648f8cc Merge branch 'master' into pr/7359 Georgi Gerganov 2024-07-04 16:41:27 +03:00
  • 9971c38ada llama : do not print hparams for vocab-only models Georgi Gerganov 2024-07-04 16:39:02 +03:00
  • b59ddf945e llama : fix save/load state Georgi Gerganov 2024-07-04 15:55:23 +03:00
  • 29ab5a0ed1 llama : use std::array for per-layer hparams Georgi Gerganov 2024-07-04 15:35:15 +03:00
  • f8c4c0738d tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231) b3294 Daniel Bevenius 2024-07-04 12:53:42 +02:00
  • 402d6feffa llama : suppress unref var in Windows MSVC (#8150) b3293 Daniel Bevenius 2024-07-04 12:50:57 +02:00
  • 977941d9fe imitate reshape bug of python code caitianchi 2024-07-04 17:25:02 +08:00
  • 20fc3804bf convert : fix gemma v1 tokenizer convert (#8248) b3292 Georgi Gerganov 2024-07-04 10:41:03 +03:00
  • f619024764 [SYCL] Remove unneeded semicolons (#8280) b3291 AidanBeltonS 2024-07-04 02:07:19 +01:00
  • d23287f122 Define and optimize RDNA1 (#8085) b3290 Daniele 2024-07-03 23:02:58 +00:00
  • 5f2d4e60e2 ppl : fix n_seq_max for perplexity (#8277) b3289 slaren 2024-07-03 19:33:31 +02:00
  • dcab343f2f use 1 seq for kl_divergence sl/fix-ppl-seq-max slaren 2024-07-03 16:22:58 +02:00
  • 5cf23d11c8 ppl : fix n_seq_max for perplexity slaren 2024-07-03 16:10:36 +02:00
  • 916248af1f fix phi 3 conversion (#8262) Xuan Son Nguyen 2024-07-03 16:01:54 +02:00
  • f8d6a23804 fix typo (#8267) b3287 Judd 2024-07-03 20:40:16 +08:00
  • fadde67135 Dequant improvements rebase (#8255) b3286 AidanBeltonS 2024-07-03 02:55:34 +01:00
  • a27152b602 fix: add missing short command line argument -mli for multiline-input (#8261) b3285 MistApproach 2024-07-02 22:56:46 +02:00
  • 3e2618bc7b Adding step to clean target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809. (#8257) b3284 Clint Herron 2024-07-02 13:19:56 -04:00
  • 703764a382 convert : use non-fast T5 tokenizer fairydreaming/t5-clean-3-gg Georgi Gerganov 2024-07-02 19:29:26 +03:00
  • 07a3fc0608 Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) b3283 Clint Herron 2024-07-02 12:18:10 -04:00
  • 968967376d Add JAIS model(s) (#8118) b3282 Faisal Zaghloul 2024-07-02 10:36:00 -04:00
  • 17bb0eaec3 llama : UGM tokenizer init with UNK tokens instead of PAD Georgi Gerganov 2024-07-02 10:40:14 +03:00
  • 9eb5d5617d convert : add t5 tokenizer tests Georgi Gerganov 2024-07-02 10:39:49 +03:00
  • 023b8807e1 convert-hf : print output file name when completed (#8181) Daniel Bevenius 2024-07-02 08:40:49 +02:00
  • 0e0590adab cuda : update supports_op for matrix multiplication (#8245) b3280 slaren 2024-07-02 08:39:38 +02:00
  • a9f3b10215 [SYCL] Fix win build conflict of math library (#8230) b3279 luoyu-intel 2024-07-02 04:50:07 +00:00
  • d08c20edde [SYCL] Fix the sub group size of Intel (#8106) b3278 luoyu-intel 2024-07-02 02:16:00 +00:00
  • 5fac350b9c Fix gemma2 tokenizer convert (#8244) Xuan Son Nguyen 2024-07-02 01:07:23 +02:00
  • e3e33c0cbc llama : minor spacing changes compilade 2024-07-01 15:23:02 -04:00
  • cb5fad4c6c CUDA: refactor and optimize IQ MMVQ (#8215) b3276 Johannes Gäßler 2024-07-01 20:39:06 +02:00
  • dae57a1ebc readme: add Paddler to the list of projects (#8239) Mateusz Charytoniuk 2024-07-01 19:13:22 +02:00
  • 49122a873f gemma2: add sliding window mask (#8227) b3274 Xuan Son Nguyen 2024-07-01 18:48:34 +02:00
  • 0ddeff1023 readme : update tool list (#8209) b3273 Roni 2024-07-01 14:48:16 +02:00
  • 3840b6f593 nix : enable curl (#8043) Michael Francis 2024-07-01 07:47:04 -04:00
  • 257f8e41e2 nix : remove OpenCL remnants (#8235) Georgi Gerganov 2024-07-01 14:46:18 +03:00
  • d4a1923d4e minor : remove parentheses gg/nix-remove-opencl Georgi Gerganov 2024-07-01 14:45:55 +03:00
  • 694c59cb42 Document BERT support. (#8205) iacore 2024-07-01 11:40:58 +00:00
  • 197fe6c1d7 [SYCL] Update SYCL-Rope op and Refactor (#8157) b3269 zhentaoyu 2024-07-01 19:39:06 +08:00
  • 32cd6f5748 nix : remove OpenCL remnants Georgi Gerganov 2024-07-01 13:49:44 +03:00
  • c8cdb48d10 llama : support all OpenELM models Francis Couture-Harpin 2024-06-30 23:13:48 -04:00
  • d0a7145ba9 flake.lock: Update (#8218) b3268 Georgi Gerganov 2024-07-01 02:09:34 +03:00
  • 51b2577dd4 Merge branch 'master' into openelm Francis Couture-Harpin 2024-06-30 16:22:07 -04:00
  • 10c3c419e9 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-06-30 15:31:25 -04:00
  • db2ffd519d llama : fix mpt and olmo pre-tokenizer Francis Couture-Harpin 2024-06-30 14:34:55 -04:00
  • 9ef0780062 Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203) b3267 Xuan Son Nguyen 2024-06-30 20:27:13 +02:00
  • 1c5eba6f8e llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197) b3266 Andrei 2024-06-29 20:44:08 -07:00
  • 51f0bd50a1 Remove custom pre attention scaling and use computed value instead. add-gemma2-soft-capping Andrei Betlen 2024-06-29 23:02:50 -04:00
  • 6dc9eb4040 llama : quantization-related fixes for T5 fairydreaming/t5-clean-3 Stanisław Szymczyk 2024-06-29 18:09:22 +02:00
  • a89427908d Add custom kq scaling from Gemma2Attention Andrei Betlen 2024-06-29 10:17:33 -04:00
  • 6f2464e3dd Merge branch 'add-gemma2-soft-capping' of github.com:ggerganov/llama.cpp into add-gemma2-soft-capping Andrei Betlen 2024-06-29 01:11:17 -04:00
  • bb7159927d Add default value for attention and final logit softcap value Andrei Betlen 2024-06-29 01:10:55 -04:00
  • 8edf73a729 Merge branch 'master' of github.com:ggerganov/llama.cpp into add-gemma2-soft-capping Andrei Betlen 2024-06-29 00:59:58 -04:00
  • 8fbd59308b ggml-quants : attempt to fix Arm 32-bit support Francis Couture-Harpin 2024-06-28 22:52:57 -04:00
  • ec50944bf6 ggml-quants : fix build failure on Windows Francis Couture-Harpin 2024-06-28 20:41:13 -04:00
  • bfd2f21fb4 bitnet : replace 1.58b with b1.58, as in the paper Francis Couture-Harpin 2024-06-28 20:38:12 -04:00
  • 72272b83a3 fix code typo in llama-cli (#8198) b3265 Xuan Son Nguyen 2024-06-29 00:14:20 +02:00
  • 3a2471811f Update src/llama.cpp Andrei 2024-06-28 16:07:47 -04:00
  • f4424c150f Disable flash attention for Gemma2 Andrei Betlen 2024-06-28 16:00:20 -04:00
  • d1137c20f1 Add custom add_ functions Andrei Betlen 2024-06-28 15:58:02 -04:00
  • d3d3c4eb35 fix Andrei Betlen 2024-06-28 15:46:45 -04:00
  • 4d3f17b4ac Add attention and final logit softcapping. Andrei Betlen 2024-06-28 15:42:19 -04:00
  • 8748d8ac6f json: attempt to skip slow tests when running under emulator (#8189) b3264 Olivier Chafik 2024-06-28 18:02:05 +01:00
  • 26a39bbd6b Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal (#8172) b3263 Xuan Son Nguyen 2024-06-28 15:11:44 +02:00
  • 712e4d9450 Generate full token count during warm up codeplay/tg-warmup Joe Todd 2024-06-28 13:29:00 +01:00
  • 38373cfbab Add SPM infill support (#8016) b3262 Sigbjørn Skjæret 2024-06-28 12:53:43 +02:00
  • b851b3fba0 cmake : allow user to override default options (#8178) b3261 slaren 2024-06-28 12:37:45 +02:00
  • 139cc621e9 json: restore default additionalProperties to false, fix some pattern escapes (#8180) b3260 Olivier Chafik 2024-06-28 09:26:45 +01:00
  • e57dc62057 llama: Add support for Gemma2ForCausalLM (#8156) b3259 pculliton 2024-06-28 00:00:43 -04:00
  • a27aa50ab7 Add missing items in makefile (#8177) b3258 Xuan Son Nguyen 2024-06-28 02:19:11 +02:00
  • cb0b06a8a6 json: update grammars/README w/ examples & note about additionalProperties (#8132) Olivier Chafik 2024-06-27 22:08:42 +01:00
  • 558f44bf83 CI: fix release build (Ubuntu+Mac) (#8170) b3256 loonerin 2024-06-27 15:01:23 -04:00
  • 8172ee9da9 cmake : fix deprecated option names not working (#8171) slaren 2024-06-27 20:04:39 +02:00
  • 16791b8f0b Add chatml fallback for cpp llama_chat_apply_template (#8160) b3254 Xuan Son Nguyen 2024-06-27 18:14:19 +02:00
  • ab3679112d flake.lock: Update (#8071) Georgi Gerganov 2024-06-27 18:37:29 +03:00
  • 97877eb10b Control vector loading fixes (#8137) b3252 jukofyork 2024-06-27 15:48:07 +01:00
  • 387952651a Delete examples/llama.android/llama/CMakeLists.txt (#8165) Raj Hammeer Singh Hada 2024-06-27 20:09:29 +05:30
  • 6030c61281 Add Qwen2MoE 57B-A14B model identifier (#8158) b3250 Sigbjørn Skjæret 2024-06-27 16:27:41 +02:00
  • 85a267daaa CUDA: fix MMQ stream-k for --split-mode row (#8167) b3249 Johannes Gäßler 2024-06-27 16:26:05 +02:00
  • f675b20a3b Added support for Viking pre-tokenizer (#8135) b3248 kustaaya 2024-06-27 11:58:54 +03:00
  • 7d7fff4654 llama : whitespace formatting Stanisław Szymczyk 2024-06-27 10:13:53 +02:00
  • 911e35bb8b llama : fix CodeLlama FIM token checks (#8144) Sigbjørn Skjæret 2024-06-27 09:46:41 +02:00
  • 7293243d4f Merge remote-tracking branch 'upstream/master' into t5-clean-3 Stanisław Szymczyk 2024-06-27 09:29:26 +02:00
  • 0996149911 convert-hf : allow converting the weird BitNet 1.3B Francis Couture-Harpin 2024-06-26 22:10:12 -04:00
  • 961e293833 convert-hf : simplify BitNet pre-quantization Francis Couture-Harpin 2024-06-26 16:24:40 -04:00
  • 89dc3b254c ggml-quants : use ceiling division when quantizing q1_3 Francis Couture-Harpin 2024-06-26 15:31:48 -04:00
  • 9465ec6e12 ggml-quants : ARM NEON vec_dot for q2_2 and q1_3 Francis Couture-Harpin 2024-06-25 01:32:14 -04:00
  • 638ad52f87 ggml-quants : cleanup Q1_3 code formatting Francis Couture-Harpin 2024-06-23 19:44:09 -04:00
  • ef1e345c85 ggml-quants : Q2_2 now faster than Q4_K on with AVX2 Francis Couture-Harpin 2024-06-19 22:12:43 -04:00
  • 48b73b8498 ggml-quants : substract 1 when back in epi8 Francis Couture-Harpin 2024-06-19 17:50:34 -04:00
  • 7ef4254a92 ggml-quants : faster 1.625 bpw AVX2 vec_dot Francis Couture-Harpin 2024-06-19 14:34:32 -04:00
  • bd807499f7 ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b Francis Couture-Harpin 2024-06-19 12:21:08 -04:00
  • ac146628e4 Fix llama-android.cpp for error - "common/common.h not found" (#8145) b3246 Raj Hammeer Singh Hada 2024-06-27 07:27:57 +05:30
  • 9b31a40c6d clip : suppress unused variable warnings (#8105) b3245 Daniel Bevenius 2024-06-27 01:50:09 +02:00
  • c70d117c37 scripts : fix filename sync Georgi Gerganov 2024-06-26 23:25:22 +03:00
  • ae5d0f4b89 ci : publish new docker images only when the files change (#8142) b3243 slaren 2024-06-26 21:59:28 +02:00
  • 31ec3993f6 ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140) b3242 slaren 2024-06-26 21:34:14 +02:00
  • c7ab7b612c make : fix missing -O3 (#8143) b3241 slaren 2024-06-26 20:20:22 +02:00
  • f2d48fffde sync : ggml b3240 Georgi Gerganov 2024-06-26 19:39:19 +03:00
  • 4713bf3093 authors : regen Georgi Gerganov 2024-06-26 19:36:44 +03:00
  • 0e814dfc42 devops : remove clblast + LLAMA_CUDA -> GGML_CUDA (#8139) Georgi Gerganov 2024-06-26 19:32:07 +03:00