Commit Graph

  • d2cfc2225f Moved regex patterns to unicode.cpp and updated unicode.h Kazim Abrar Mahi 2024-03-23 01:13:08 +06:00
  • 6fbab2dbc8 merged the changes from deepseeker models to main branch Jaggzh 2024-02-12 04:04:34 -08:00
  • 83b72cb086 Merge pull request from GHSA-p5mv-gjc5-mwqv Georgi Gerganov 2024-04-26 10:41:53 +03:00
  • d4a9afc100 ci: server: fix python installation (#6918) b2740 Pierrick Hymbert 2024-04-26 09:27:49 +02:00
  • 7d641c26ac ci: fix concurrency for pull_request_target (#6917) Pierrick Hymbert 2024-04-26 09:26:59 +02:00
  • 5790c8dac1 bench: server add stop word for PHI-2 (#6916) Pierrick Hymbert 2024-04-26 09:26:16 +02:00
  • 46e12c4692 llava : add support for moondream vision language model (#6899) b2737 vik 2024-04-25 12:38:31 -07:00
  • dba497e0c1 cmake : restore LLAMA_LLAMAFILE_DEFAULT b2736 Georgi Gerganov 2024-04-25 21:31:17 +03:00
  • 9e3876061c llama : add static reminder for llama_state_get_size Georgi Gerganov 2024-04-25 20:33:36 +03:00
  • 4f4c0249bf metal : remove tmp log Georgi Gerganov 2024-04-25 20:29:25 +03:00
  • 1e590ac3c9 llama : update llama_state_get_size after v_trans field Georgi Gerganov 2024-04-25 20:06:23 +03:00
  • 0fc5c5eb74 llama : disallow incompatible states Georgi Gerganov 2024-04-25 19:53:57 +03:00
  • bab346ba69 llama : fix copy-paste errors, add TODO Georgi Gerganov 2024-04-25 19:45:36 +03:00
  • c225609f10 llama : llama_kv_cache_clear zeroes data + fix save-load seq Georgi Gerganov 2024-04-25 19:37:27 +03:00
  • ac1c6d91de ci : add CUDA save-load-state tests Georgi Gerganov 2024-04-25 19:03:59 +03:00
  • 09d0381c58 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-25 19:01:52 +03:00
  • fa0b4ad252 cmake : remove obsolete ANDROID check b2735 Georgi Gerganov 2024-04-25 18:59:51 +03:00
  • d6e1d44f16 llama : synchronize before get/set session data (#6911) b2734 slaren 2024-04-25 17:59:03 +02:00
  • 1fd5bc3d5e llama : support save/load state with FA enabled Georgi Gerganov 2024-04-25 18:18:13 +03:00
  • cb3547ac46 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-25 17:06:56 +03:00
  • 853d06ffe2 ci : tmp disable slow tests Georgi Gerganov 2024-04-25 17:06:27 +03:00
  • 3fe0596c18 readme : update model list (#6908) BarfingLemurs 2024-04-25 09:52:28 -04:00
  • 0ead1f1072 llama : check that all the tensor data is in the model file (#6885) b2731 slaren 2024-04-25 15:23:47 +02:00
  • ff2c64a9f4 tests : remove TMP_ATTN_BENCH Georgi Gerganov 2024-04-25 15:51:46 +03:00
  • 1f77f49787 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-25 15:50:36 +03:00
  • 51543729ff ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906) b2730 Georgi Gerganov 2024-04-25 15:48:25 +03:00
  • 4ab99d8d47 clip : rename lerp function to avoid conflict (#6894) b2729 Daniel Bevenius 2024-04-25 14:38:14 +02:00
  • 54770413c4 ggml : fix MIN / MAX macros (#6904) b2728 Georgi Gerganov 2024-04-25 15:12:28 +03:00
  • 8c259f6f3e ggml : fix MIN / MAX macros gg/fix-min-max Georgi Gerganov 2024-04-25 14:28:41 +03:00
  • aa750c1ede tests : minor bash stuff (#6902) b2727 Georgi Gerganov 2024-04-25 14:27:20 +03:00
  • 1966eb2615 quantize : add '--keep-split' to quantize model into shards (#6688) jiez 2024-04-25 18:29:35 +08:00
  • 784e11dea1 README: add graphic for matrix multiplication (#6881) Johannes Gäßler 2024-04-24 21:29:13 +02:00
  • ce281b904c llama : disable FA for AMD Georgi Gerganov 2024-04-24 16:48:10 +03:00
  • b4e4b8a935 llama : add llama_get_pooling_type function (#6862) b2724 Douglas Hanley 2024-04-24 08:10:07 -05:00
  • 8937ec5307 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-24 14:00:32 +03:00
  • 3fe847b574 server : do not apply Markdown formatting in code sections (#6850) mgroeber9110 2024-04-24 12:54:24 +02:00
  • 37246b1031 common : revert showing control tokens by default for server (#6860) Kyle Mistele 2024-04-24 05:15:29 -05:00
  • 28103f4832 Server: fix seed for multiple slots (#6835) Johannes Gäßler 2024-04-24 11:08:36 +02:00
  • c0d1b3e03e ggml : move 32-bit arm compat in ggml-impl.h (#6865) Georgi Gerganov 2024-04-24 12:00:07 +03:00
  • abd3314064 llama : add phi 3 chat template (#6857) Tristan Druyen 2024-04-24 10:52:37 +02:00
  • 3fec68be4e convert : add support of codeqwen due to tokenizer (#6707) Junyang Lin 2024-04-24 15:16:21 +08:00
  • c8297c6af5 llama : add phi3 support (#6852) b2717 liuwei-git 2024-04-24 15:00:37 +08:00
  • 5dcccb3a7d convert : fix tokenizer conversion gg/add-phi-3-support Georgi Gerganov 2024-04-23 22:11:09 +03:00
  • 1732737232 convert : add phi-3 support Georgi Gerganov 2024-04-23 20:38:51 +03:00
  • 751591d520 server : add help for --flash-attn arg Georgi Gerganov 2024-04-23 18:16:25 +03:00
  • d228bf8552 cont Georgi Gerganov 2024-04-23 17:32:11 +03:00
  • 56657e52e5 llama : fix n_batch requirements Georgi Gerganov 2024-04-23 17:30:37 +03:00
  • 19e8982f51 llama : prep ALiBi support for BERT models Georgi Gerganov 2024-04-23 17:24:28 +03:00
  • 78d363b0d4 llama : replace bool need_kq_pos with use_alibi Georgi Gerganov 2024-04-23 17:15:13 +03:00
  • 3864eea4cb ggml : add TODO's for F16/F32 mask/pos support in other backends Georgi Gerganov 2024-04-23 10:01:49 +03:00
  • c129369702 cuda : try to fix __hgt2_mask Georgi Gerganov 2024-04-22 21:42:43 +03:00
  • 4e96a812b3 [SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 flag activated (#6767) b2716 Anas Ahouzi 2024-04-23 02:53:18 +02:00
  • 192090bae4 llamafile : improve sgemm.cpp (#6796) b2715 Justine Tunney 2024-04-22 15:00:36 -04:00
  • c70bfd7bcb cuda : "constexpr dim3" -> "const dim3" Georgi Gerganov 2024-04-22 20:31:23 +03:00
  • 5408d55506 cuda : uint -> uint32_t Georgi Gerganov 2024-04-22 19:12:06 +03:00
  • e931888d50 ggml : fix calloc argument ordering. (#6820) b2714 Dave Airlie 2024-04-23 00:05:06 +10:00
  • 8960fe86ae llama : fix typo in <|im_end|> token text (#6745) Georgi Gerganov 2024-04-22 15:41:11 +03:00
  • f725ca90fb ggml : ggml_soft_max support F16/F32 mask/pos Georgi Gerganov 2024-04-22 13:46:23 +03:00
  • c0956b09ba ci: fix job are cancelling each other (#6781) b2712 Pierrick Hymbert 2024-04-22 13:22:54 +02:00
  • e9b4a1bf68 flake.lock: Update github-actions[bot] 2024-04-21 00:17:47 +00:00
  • c11d05fec0 llama : force disable flash attention for incompatible models Georgi Gerganov 2024-04-22 12:50:41 +03:00
  • cb76d747d1 ggml : fix num dimensions in ggml_flash_attn_ext Georgi Gerganov 2024-04-22 12:50:26 +03:00
  • a39217d428 common : print --flash-attn in help Georgi Gerganov 2024-04-22 12:50:10 +03:00
  • 124e4dced2 Update test-bench Aidan 2024-04-22 10:42:32 +01:00
  • 5cf5e7d490 build: generate hex dump of server assets during build (#6661) b2710 Olivier Chafik 2024-04-21 18:48:53 +01:00
  • 40f74e4d73 llama : add option to render special/control tokens (#6807) b2709 Georgi Gerganov 2024-04-21 18:36:45 +03:00
  • b9cc76d87e ggml : fix ggml_backend_cpu_supports_op() for CPY (#0) b2708 Georgi Gerganov 2024-04-21 16:47:57 +03:00
  • 7dbdba5690 llama : add llama-3 chat template (#6751) b2707 Wouter 2024-04-21 15:03:39 +02:00
  • c1386c936e gguf-py : add IQ1_M to GGML_QUANT_SIZES (#6761) pmysl 2024-04-21 14:49:30 +02:00
  • e8d35f47cb doc : add link to falcon (#6789) Jan Boon 2024-04-21 20:35:40 +08:00
  • 2cca09d509 readme : add Fedora instructions (#6783) Mohammadreza Hendiani 2024-04-21 16:02:05 +03:30
  • 89b0bf0d5d llava : use logger in llava-cli (#6797) Justine Tunney 2024-04-21 08:19:04 -04:00
  • b97bc3966e llama : support Llama 3 HF conversion (#6745) b2702 Pedro Cuenca 2024-04-21 13:50:41 +02:00
  • b8109bc013 doc : server tests require llama to be built with curl enabled (#6788) b2701 Jan Boon 2024-04-21 00:29:50 +08:00
  • 3750706962 llama : add llama_token_is_eog() gg/llama3-support Georgi Gerganov 2024-04-20 16:46:46 +03:00
  • aed82f6837 common : try to fix Android CI (#6780) b2700 Georgi Gerganov 2024-04-20 13:27:12 +03:00
  • f3105b9eec Accept suggestion Pedro Cuenca 2024-04-19 22:12:20 +02:00
  • 0e4802b2ec ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748) b2699 loonerin 2024-04-19 13:03:35 -04:00
  • 871fcb6e10 ggml : fix soft_max with bias on CPU Georgi Gerganov 2024-04-19 18:03:56 +03:00
  • 3badef1fe1 ggml : fix avx512 const correctness Georgi Gerganov 2024-04-19 17:45:08 +03:00
  • 52945429eb tests : remove benchmarks Georgi Gerganov 2024-04-19 17:38:28 +03:00
  • 29f6ad8d95 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-19 17:30:09 +03:00
  • bc346166f9 metal : minor Georgi Gerganov 2024-04-19 17:24:52 +03:00
  • 1a88565b44 metal : clean-up kernel code Georgi Gerganov 2024-04-19 15:52:49 +03:00
  • 97eaece7d6 metal : clean-up Georgi Gerganov 2024-04-19 15:30:27 +03:00
  • 703c6e6528 ggml : fix arm fp16 store on windows Georgi Gerganov 2024-04-19 14:20:41 +03:00
  • 637e9a86c2 server: static: upstream upgrade (#6765) b2698 Pierrick Hymbert 2024-04-19 13:19:01 +02:00
  • e32b281743 llama : adapt build_olmo to changes Georgi Gerganov 2024-04-19 14:04:56 +03:00
  • 1db66c1dac Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-19 14:03:55 +03:00
  • 74d57f9513 llama : simplify llama_build_kv_store Georgi Gerganov 2024-04-19 13:49:57 +03:00
  • 9958c81b79 Implement the OLMo architecture (#6741) b2697 nopperl 2024-04-19 09:35:54 +00:00
  • 8b1b1f4982 train : add general name (#6752) b2696 Austin 2024-04-19 03:16:45 -04:00
  • bca40e9814 fix wrong parameter in cmd in readme-sycl.md (#6755) Neo Zhang 2024-04-19 09:16:31 +08:00
  • 9ca869876e batched-bench : add fattn arg Georgi Gerganov 2024-04-18 21:41:32 +03:00
  • c16a7c2688 metal : use F32 attention accumulators Georgi Gerganov 2024-04-18 20:08:52 +03:00
  • 112c4c4e9b style Pedro Cuenca 2024-04-18 18:46:03 +02:00
  • d79ab101c3 Support Llama 3 conversion Pedro Cuenca 2024-04-18 18:38:05 +02:00
  • 0d56246f4b ggml : group all experts in a single ggml_mul_mat_id (#6505) b2694 slaren 2024-04-18 15:18:48 +02:00
  • 03c0946d73 convert : support models with multiple chat templates (#6588) Sigbjørn Skjæret 2024-04-18 13:49:01 +02:00
  • fa9e8c6689 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-18 14:39:23 +03:00