Commit Graph

  • 007489e895 Fix phi3 chat template confusion with zephyr (#7449) b2984 Tristan Druyen 2024-05-23 16:15:15 +02:00
  • 7573b634a7 Update README.md Hongji Zhu 2024-05-23 22:09:41 +08:00
  • a491f45cbc change name in readme caitianchi 2024-05-23 21:44:37 +08:00
  • ec1cea7182 add instructions in readme caitianchi 2024-05-23 21:41:11 +08:00
  • 0480d5faa2 add android readme caitianchi 2024-05-23 21:24:03 +08:00
  • 8b94e799df readme : add Bunny in supported models [no ci] (#7469) Raj Hammeer Singh Hada 2024-05-23 18:00:13 +05:30
  • 3015851c5a llama : add getters for n_threads/n_threads_batch (#7464) b2982 Daniel Bevenius 2024-05-23 14:29:26 +02:00
  • 55ac3b7aea ci : use Pythia models instead of OpenLlama (#7470) b2981 Georgi Gerganov 2024-05-23 15:28:14 +03:00
  • dacfcebd60 readme : add GPT-NeoX + Pythia to the list of supported models (#7491) Victor Nogueira 2024-05-23 15:12:43 +03:00
  • 2b9190344e add run android for termux in readme caitianchi 2024-05-23 20:11:44 +08:00
  • c536fa6ef9 rename caitianchi 2024-05-23 20:00:45 +08:00
  • 7a49a6f6dc init caitianchi 2024-05-23 19:28:47 +08:00
  • 9b82476ee9 Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461) b2979 fairydreaming 2024-05-23 11:49:53 +02:00
  • a61a94e543 llama : rename n_ctx -> cache.size, less confusing (#0) b2978 Georgi Gerganov 2024-05-23 12:38:18 +03:00
  • 152da28ae5 labeler.yml: add embedding label detector [no ci] (#7482) Brian 2024-05-23 17:40:43 +10:00
  • d48c88cbd5 ggml : remove ggml_flash_attn and ggml_flash_ff (#7463) b2976 Georgi Gerganov 2024-05-23 10:00:44 +03:00
  • e84b71c2c6 ggml : drop support for QK_K=64 (#7473) Georgi Gerganov 2024-05-23 10:00:21 +03:00
  • 1b1e27cb49 Update vulkan rope implementation to support frequency factors (#7475) b2974 0cc4m 2024-05-23 08:59:59 +02:00
  • fbf777d2b9 main : minor (#7462) b2973 Georgi Gerganov 2024-05-23 09:43:24 +03:00
  • c5fe1d6cdc gguf-py : remove unused import compilade/gguf-py-fix-q-shape Francis Couture-Harpin 2024-05-23 00:09:49 -04:00
  • 2ff601fc32 gguf-py : fix and simplify quantized shape round-trip Francis Couture-Harpin 2024-05-22 23:40:41 -04:00
  • 518b75260b cuda uma test sl/cuda-uma slaren 2024-05-23 03:13:48 +02:00
  • cd93a28cb1 CUDA: fix FA out-of-bounds reads (#7479) b2972 Johannes Gäßler 2024-05-23 00:31:20 +02:00
  • 3b57b55c6f Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-05-22 15:34:24 -04:00
  • 8334b5becb gguf-py : do not use internal numpy types Francis Couture-Harpin 2024-05-22 14:29:50 -04:00
  • 1e374365d1 SimpleChat: a simple and dumb web front end for testing /chat/completions and /completions end points and try chat (#7350) HanishKVC 2024-05-22 23:23:21 +05:30
  • 197ff91462 build : remove zig (#7471) b2970 Georgi Gerganov 2024-05-22 20:05:38 +03:00
  • 6ff13987ad common : normalize naming style (#7462) b2969 Georgi Gerganov 2024-05-22 20:04:20 +03:00
  • 38c03478a3 CUDA: fix FA out-of-bounds writes (#7465) b2968 Johannes Gäßler 2024-05-22 17:58:25 +02:00
  • b18532a4ef phi3 : duplicate rope factors in each layer (#7447) b2967 slaren 2024-05-22 16:10:46 +02:00
  • fcda1128bc vulkan: add workaround for iterator boundary check to fix clang-cl debug build (#7426) b2966 k.h.lai 2024-05-22 20:53:21 +08:00
  • 03d8900ebe llama : add missing model type names (#7445) b2965 Justine Tunney 2024-05-22 07:08:18 -04:00
  • 9b3d833189 cuda : fix compile warning (#7454) b2964 Georgi Gerganov 2024-05-22 12:36:37 +03:00
  • 95fb0aefab CUDA: remove incorrect precision check (#7454) b2963 Johannes Gäßler 2024-05-22 10:24:29 +02:00
  • 3e5faa8503 cuda : fix rope + add tests (#7452) b2962 Georgi Gerganov 2024-05-22 11:01:35 +03:00
  • e9095e6098 async direct io per tensor test sl/dio-test slaren 2024-05-22 01:08:52 +02:00
  • 201cc11afa llama : add phi3 128K model support (#7225) b2961 liuwei-git 2024-05-22 04:28:32 +08:00
  • 6369bf0433 metal : handle F16 inf values, fix FA partial offload (#7434) Georgi Gerganov 2024-05-21 23:03:42 +03:00
  • e402de364b grammars: fix resampling logic regression (#7424) Olivier Chafik 2024-05-21 20:40:00 +01:00
  • 46db3506aa address review comments Pavel Fatin 2024-05-21 20:05:26 +02:00
  • fcf6538ba6 CUDA: fix unused warning in mmq.cu (#7442) b2958 Johannes Gäßler 2024-05-21 19:27:12 +02:00
  • c3f8d58356 tests : test-tokenizer-0.sh print more info (#7402) Georgi Gerganov 2024-05-21 19:53:48 +03:00
  • 11474e756d examples: cache hf model when --model not provided (#7353) b2956 Amir 2024-05-21 17:13:12 +03:00
  • d8ee902227 CUDA: deduplicate mmq code (#7397) b2955 Johannes Gäßler 2024-05-21 16:02:12 +02:00
  • d7e852c1bc Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) jaime-m-p 2024-05-21 14:39:48 +02:00
  • 1b17ed7ab6 Direct I/O and Transparent HugePages Pavel Fatin 2024-05-20 21:55:33 +02:00
  • 917dc8cfa6 Tokenizer SPM fixes for phi-3 and llama-spm (#7375) b2953 jaime-m-p 2024-05-20 20:15:57 +02:00
  • fabf30b4c4 llama : remove Persimmon (#7408) b2952 Georgi Gerganov 2024-05-20 19:35:28 +03:00
  • 20385cebcc perplexity: update README FP16 results [no ci] (#7413) Johannes Gäßler 2024-05-20 18:15:38 +02:00
  • a041ced0fd wip gg/kv-determinism Georgi Gerganov 2024-05-20 17:00:55 +03:00
  • db10f01310 rpc : track allocated buffers (#7411) b2950 Radoslav Gerganov 2024-05-20 16:36:55 +03:00
  • 3bc10cb485 server : fix temperature + disable some tests (#7409) b2949 Georgi Gerganov 2024-05-20 15:10:03 +03:00
  • 6bf9b66fa3 [SYCL] Update SYCL upscale operation (#7321) b2948 AidanBeltonS 2024-05-20 12:08:23 +01:00
  • 26cd4237bc Update README.md (#7410) Bingan 2024-05-20 17:55:34 +08:00
  • 213e90ed73 ggml-opencl, llama: using reserve() if count already known (#7272) b2946 Herman Semenov 2024-05-20 07:33:21 +00:00
  • 65c58207ec ggml : add loongarch lsx and lasx support (#6454) b2945 junchao-loongson 2024-05-20 15:19:21 +08:00
  • 1cc0155d04 server : tuning tests (#7388) Georgi Gerganov 2024-05-20 10:16:41 +03:00
  • e932094d58 server : return error on too large embedding input (#7389) b2943 Georgi Gerganov 2024-05-20 08:56:05 +03:00
  • 2789baf480 tests : fix --keep_split -> --keep-split (#7374) Georgi Gerganov 2024-05-20 08:55:09 +03:00
  • 33c8d50acc Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (#7258) b2941 Srihari-mcw 2024-05-19 19:18:39 -07:00
  • d359f30921 llama : remove MPI backend (#7395) b2940 slaren 2024-05-20 01:17:03 +02:00
  • 1ea2a0036e quantize : fix --keep-split check (#7374) b2939 Fred Douglas 2024-05-19 11:37:04 -05:00
  • f030ec1f7a Vulkan Embedding Fix (#7360) b2938 0cc4m 2024-05-19 17:19:53 +02:00
  • e4e6f67be6 ggml : fix another case of quants nans (#7387) b2937 slaren 2024-05-19 17:08:46 +02:00
  • 5ca49cbecd ggml: implement quantized KV cache for FA (#7372) b2936 Johannes Gäßler 2024-05-19 16:46:13 +02:00
  • 1b01f06db0 server: add test for token probs (#7347) Johannes Gäßler 2024-05-19 16:26:02 +02:00
  • 41858392e1 server: fix seed being reported back (#7382) b2934 Johannes Gäßler 2024-05-19 16:06:33 +02:00
  • 6aade19ee7 Add StableLM2 pre-tokenizer (#7349) b2933 Anas Ahouzi 2024-05-19 14:46:46 +02:00
  • ab33f7a338 cuda : clear error after buffer allocation failure (#7376) b2932 slaren 2024-05-19 14:19:37 +02:00
  • e23b974f4c labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363) Brian 2024-05-19 20:51:03 +10:00
  • 854d365aba cmake : update android comments (#7341) b2930 Georgi Gerganov 2024-05-19 11:01:01 +03:00
  • f5bf761747 Capture CUDA logging output (#7298) b2929 fraxy-v 2024-05-19 01:44:42 +03:00
  • 059031b8c4 ci : re-enable sanitizer runs (#7358) b2928 Georgi Gerganov 2024-05-18 18:55:54 +03:00
  • 511182eabb android : use "ci-android" branch for CI (#7341) b2927 Georgi Gerganov 2024-05-18 13:40:39 +03:00
  • 133d99c599 CUDA: deduplicate FlashAttention code (#7352) b2926 Johannes Gäßler 2024-05-18 12:36:25 +02:00
  • cb42c29427 server: correct --threads documentation [no ci] (#7362) Johannes Gäßler 2024-05-18 11:10:47 +02:00
  • 60b2e1b9c5 fixup! Initial OpenELM support (270M only so far) Icecream95 2024-05-18 20:19:10 +12:00
  • d233b507cd cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263) Engininja2 2024-05-18 02:05:17 -06:00
  • 0f98acfac6 llama : add support for larger Granite Code Models (20B, 34B) (#7324) b2923 Steffen Röcker 2024-05-18 10:04:55 +02:00
  • ca57e0f35e perplexity : ndot progress and show stats with < 100 tasks (#7348) b2922 strawberrymelonpanda 2024-05-18 00:57:08 -07:00
  • aaabe2e361 Fill out missing entries in llama_model_type_name Icecream95 2024-05-18 19:56:14 +12:00
  • 007f2ece0a cmake : provide binary dir ci-android Georgi Gerganov 2024-05-18 10:50:46 +03:00
  • 217d8d7b77 Initial OpenELM support (270M only so far) Icecream95 2024-05-18 19:41:42 +12:00
  • 99d1e7eb8a android : do not fetch, use add_subdirectory instead Georgi Gerganov 2024-05-18 09:32:24 +03:00
  • c1b295eea5 Update and fix Vulkan soft_max and argsort implementations (#7237) b2921 0cc4m 2024-05-18 08:10:58 +02:00
  • de73196344 github-actions-labeler: initial commit (#7330) Brian 2024-05-18 16:04:23 +10:00
  • b49a13dd2f convert : fix set_vocab_sentencepiece (#6866) Georgi Gerganov 2024-05-18 08:46:20 +03:00
  • 05834841dc ggml : fix quants nans when all the group weights are very close to zero (#7313) b2918 slaren 2024-05-18 02:39:54 +02:00
  • ef277de2ad cmake : fix typo in AMDGPU_TARGETS (#7356) b2917 Engininja2 2024-05-17 18:39:25 -06:00
  • b43272afa2 Unicode codepoint flags for custom regexs (#7245) b2916 jaime-m-p 2024-05-18 01:09:13 +02:00
  • 0fc1e820a9 CUDA: faster large batch FA without tensor cores (#7314) b2915 Johannes Gäßler 2024-05-17 18:54:52 +02:00
  • 82ca83db3c ROCm: use native CMake HIP support (#5966) b2914 Gavin Zhao 2024-05-17 11:03:03 -04:00
  • f4bd8b3d26 rpc : set SO_REUSEADDR for the server socket (#7320) b2913 Radoslav Gerganov 2024-05-17 17:25:44 +03:00
  • 2117b30380 ggml : disable SIMD exp and silu for 32-bit ARM Georgi Gerganov 2024-05-17 15:47:56 +03:00
  • 8725937362 android : use "ci-android" branch for CI Georgi Gerganov 2024-05-17 15:47:44 +03:00
  • 51e9d02599 Added a single test function script and fix debug-test.sh to be more robust (#7279) Brian 2024-05-17 22:40:14 +10:00
  • d273c1402b py : convert-hf-to-gguf-update improvements (#7340) Aarni Koskela 2024-05-17 15:11:45 +03:00
  • 27b040691c llama : use n_embd_head_v when reshaping kqv (#7327) b2910 fairydreaming 2024-05-17 13:24:38 +02:00
  • 6b2f496409 wip gg/test-embd Georgi Gerganov 2024-05-17 14:00:44 +03:00
  • 29c60d8cdd tokenization: add warning for double BOS (#7332) b2909 Johannes Gäßler 2024-05-17 09:59:57 +02:00