Commit Graph

  • a4b6341c7b wip : template for rows per warp Georgi Gerganov 2024-01-21 18:24:13 +02:00
  • 05490fad7f add safetensors support to convert-lora-to-ggml.py (#5062) kuronekosaiko 2024-01-22 00:28:14 +08:00
  • f31955f5d1 wip : 4 rows per simd group Georgi Gerganov 2024-01-21 18:01:28 +02:00
  • 8cde449b8b wip : 8 rows per simd group Georgi Gerganov 2024-01-21 12:23:22 +02:00
  • 6c5629d4d2 add #include <string> to unicode.h (#5051) bobqianic 2024-01-21 15:17:35 +00:00
  • 7dcbe39d36 Add ability to evaluate multiple choice tasks (#5047) Kawrakow 2024-01-21 14:42:44 +02:00
  • b97325800a metal : specialize for head size Georgi Gerganov 2024-01-21 12:01:55 +02:00
  • 52ae085750 metal : reduce branches Georgi Gerganov 2024-01-21 11:38:17 +02:00
  • 528da7515e metal : f16 precision Georgi Gerganov 2024-01-21 11:13:24 +02:00
  • 1173f49c3b metal : initial implementation Georgi Gerganov 2024-01-20 17:32:28 +02:00
  • 726c0fa9a2 Slightly faster imatrix (#5050) Kawrakow 2024-01-21 08:01:20 +02:00
  • 942c0107a7 flake.lock: Update (#5054) Georgi Gerganov 2024-01-21 05:17:27 +02:00
  • b43ebde3b0 convert : partially revert PR #4818 (#5041) Jared Van Bortel 2024-01-20 18:14:18 -05:00
  • 97c1549808 perplexity : fix MSVC build after #5020 (#5043) Jared Van Bortel 2024-01-20 10:08:08 -05:00
  • 6df465a91d llama : run all KQV ops on the CPU with no KV offload (#5049) slaren 2024-01-20 16:05:49 +01:00
  • a9681febd6 ggml : online attention (CPU) gg/flash-attn-online Georgi Gerganov 2024-01-20 12:26:49 +02:00
  • c3cdfffa88 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-01-20 10:12:07 +02:00
  • 77bc1bbd05 cmake : add support for ccache (#5002) Herman Semenov 2024-01-20 08:11:31 +00:00
  • 48e2b13372 Add a dart/flutter binding to README.md (#4882) adel boussaken 2024-01-20 09:05:43 +01:00
  • cca894f16a cuda : fix compile error in jetson platform (#4975) Kylin 2024-01-20 15:01:46 +08:00
  • fded2e6a11 apply suggestions FSSRepo 2024-01-19 20:18:18 -05:00
  • 09db1a7cf3 Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda FSSRepo 2024-01-19 17:38:47 -05:00
  • 32a392fe68 try a different fix ceb/fix-msvc-build Jared Van Bortel 2024-01-19 17:10:23 -05:00
  • e15c61635f perplexity : fix MSVC build after #5020 Jared Van Bortel 2024-01-19 16:38:29 -05:00
  • 4a3bc1522e py : linting with mypy and isort ceb/restore-convert Jared Van Bortel 2024-01-19 12:38:18 -05:00
  • ffdd051ab5 convert : update GGML script to use VocabFactory Jared Van Bortel 2024-01-19 12:27:58 -05:00
  • cb4605fe47 convert : partially revert PR #4818 Jared Van Bortel 2024-01-19 12:19:25 -05:00
  • 381ee19572 finetune : fix ggml_allocr lifetimes (tmp workaround) (#5033) Uzo Nweke 2024-01-19 13:20:50 -05:00
  • fa7ebcca99 ggml : fix GQA support in ggml_flash_attn_ext Georgi Gerganov 2024-01-19 20:06:26 +02:00
  • a5cacb22b2 imatrix : add README.md Georgi Gerganov 2024-01-19 15:24:47 +02:00
  • 9b75cb2b3c llama : support upcoming Qwen2 (#5037) Shijie 2024-01-19 19:53:13 +08:00
  • de9a147df1 py : fix flake8 lint Georgi Gerganov 2024-01-19 13:52:22 +02:00
  • 7051aacfac winogrande: evaluate log-probs in parallel (#5036) Kawrakow 2024-01-19 11:39:11 +02:00
  • 2b3b999cac llama : add CodeShell support (#5016) chiranko 2024-01-19 17:07:27 +08:00
  • 993fba8180 perplexity: avoid unnecessary allocations and logit copies (#5035) Kawrakow 2024-01-19 11:02:39 +02:00
  • 8b20858e5e perplexity : faster Winogrande via batching (#5024) Georgi Gerganov 2024-01-19 10:45:06 +02:00
  • 57e2a7a52a llama : fix falcon arch for tied output embeddings (#4978) John 2024-01-18 23:12:15 +01:00
  • 1453215165 kompute : fix ggml_add kernel ceb/nomic-vulkan-fix-add Georgi Gerganov 2024-01-19 00:09:16 +02:00
  • 610394fff8 fix supported ops for kompute backend Jared Van Bortel 2024-01-18 15:32:55 -05:00
  • 9b6ea4263a cmake : add ggml public headers (#5011) Georgi Gerganov 2024-01-18 23:36:07 +02:00
  • 7addf2b878 never try to evaluate an empty command buffer Jared Van Bortel 2024-01-18 16:11:00 -05:00
  • 821f0a271e server : defer tasks when "slot unavailable" (#5018) Xuan Son Nguyen 2024-01-18 21:33:05 +01:00
  • 96d7f56d29 llama : fix mlock with no-mmap with Metal (#5025) slaren 2024-01-18 21:12:15 +01:00
  • 2d5419d08a imatrix : fix assert for src0 non-cont check Georgi Gerganov 2024-01-18 21:45:51 +02:00
  • d391ae9b49 perplexity : fix winogrande N tasks option Georgi Gerganov 2024-01-18 20:49:00 +02:00
  • e9240cdfa0 scripts : add get-winogrande.sh Georgi Gerganov 2024-01-18 20:45:39 +02:00
  • b46757735d convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#5019) David Sommers 2024-01-18 12:20:59 -05:00
  • 3e945cc1e9 HellaSwag: speed up by parallelizing log-prob evaluation (#5020) Kawrakow 2024-01-18 19:18:21 +02:00
  • 16bc3c3be8 sync op_rope_f16 with recent op_rope_f32 changes Jared Van Bortel 2024-01-18 11:56:00 -05:00
  • a1c004ef2e ggml : add ggml_flash_attn_ext API Georgi Gerganov 2024-01-18 17:42:55 +02:00
  • 0f1a958a51 actually fix this assertion Jared Van Bortel 2024-01-18 11:48:27 -05:00
  • a97935e098 clean up old backend code Jared Van Bortel 2024-01-18 11:48:12 -05:00
  • 696faa8660 kompute : fix rope_f32 and scale ops (#5008) Georgi Gerganov 2024-01-18 18:49:39 +02:00
  • e53de2866a fix compilation FSSRepo 2024-01-18 11:27:07 -05:00
  • ccc78a200e hellaswag: speed up even more by parallelizing log-prob evaluation ik/faster_hellaswag Iwan Kawrakow 2024-01-18 18:25:29 +02:00
  • ad19812cda perplexity : faster HellaSwag via batching (#5017) Georgi Gerganov 2024-01-18 15:33:01 +02:00
  • 682986a08e Add Winogrande evaluation (#5015) Kawrakow 2024-01-18 13:46:27 +02:00
  • dcad445d0c scripts : add helper script to get hellaswag data in txt format Georgi Gerganov 2024-01-18 11:44:49 +02:00
  • 1e605f4102 metal : fix memory leak, dangling pointer and unused autorel (#5007) Paul Tsochantaris 2024-01-18 08:47:24 +00:00
  • f7bcfb0566 cuda: add flash attention + test FSSRepo 2024-01-17 16:38:28 -05:00
  • 02b9bafe29 kompute : ignore exceptions in ggml_vk_available_devices (#12) Jared Van Bortel 2024-01-17 13:47:03 -05:00
  • 6b6916b215 sync : ggml Georgi Gerganov 2024-01-17 20:54:50 +02:00
  • 38566680cd ggml : add IQ2 to test-backend-ops + refactoring (#4990) Georgi Gerganov 2024-01-17 18:54:56 +02:00
  • ba69bbc84c imatrix : offload to GPU support (#4957) Georgi Gerganov 2024-01-17 18:46:30 +02:00
  • 2917e6b528 Merge branch 'master' into gg/imatrix-gpu-4931 gg/imatrix-gpu-4931 Georgi Gerganov 2024-01-17 18:41:47 +02:00
  • 44a1a4a41a backend : add eval callback (#4935) Georgi Gerganov 2024-01-17 18:39:41 +02:00
  • c918fe8dca metal : create autorelease pool during library build (#4970) Georgi Gerganov 2024-01-17 18:38:39 +02:00
  • 0f83e727af py : fix whitespace Georgi Gerganov 2024-01-17 18:37:36 +02:00
  • de9b0bbbe4 add sanity check and fix kompute teardown order Jared Van Bortel 2024-01-17 10:09:27 -05:00
  • 4f4bf35f46 py : fix missing added_tokens_dict for SPM and BPE vocabs (#4971) Georgi Gerganov 2024-01-17 15:45:03 +02:00
  • 23742deb5b py : fix padded dummy tokens (I hope) gg/fix-spm-added-tokens-dict-4958 Georgi Gerganov 2024-01-17 15:44:22 +02:00
  • 4fb52843bb ci : rearrange output Georgi Gerganov 2024-01-17 15:27:34 +02:00
  • 10b25e0388 ci : add imatrix test Georgi Gerganov 2024-01-17 15:10:38 +02:00
  • a722d05a87 imatrix : fix ggml_mul_mat_id handling Georgi Gerganov 2024-01-17 14:43:35 +02:00
  • 2b3a665d39 llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (#4996) Kawrakow 2024-01-17 12:36:37 +02:00
  • 9fd1e83f6d Use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 ik/better_q2_k_s Iwan Kawrakow 2024-01-17 12:14:19 +02:00
  • 49bafe0986 tests : avoid creating RNGs for each tensor gg/iq2-refactor-and-tests Georgi Gerganov 2024-01-17 10:40:55 +02:00
  • 7563293665 metal : remove unnecessary nil check (#4986) Paul Tsochantaris 2024-01-17 08:07:24 +00:00
  • f46c0c1b0e llama : fix copy/paste error in llama_sampling_params comment (#4994) David Renshaw 2024-01-17 02:17:50 -05:00
  • 8eb8fd94e2 tests : avoid creating RNGs for each Q tensor Georgi Gerganov 2024-01-16 23:24:05 +02:00
  • b7ddc8bf12 cuda : fix out-of-bounds-access in mul_mat_vec_q Georgi Gerganov 2024-01-16 23:06:18 +02:00
  • 36feaeb401 ci : enable LLAMA_CUBLAS=1 for CUDA nodes Georgi Gerganov 2024-01-16 22:32:22 +02:00
  • e9a5d54b7d cuda : update supports_op for IQ2 Georgi Gerganov 2024-01-16 22:13:17 +02:00
  • bc0bb3009c ggml : add IQ2 to test-backend-ops + refactoring Georgi Gerganov 2024-01-14 13:15:30 +02:00
  • 5c99960901 py : remove unnecessary hasattr (#4903) Georgi Gerganov 2024-01-16 20:59:31 +02:00
  • bee938da74 nix: remove nixConfig from flake.nix (#4984) b1893 Philip Taron 2024-01-16 09:56:21 -08:00
  • cec8a48470 finetune : add training data file to log message (#4979) b1892 Daniel Bevenius 2024-01-16 18:54:24 +01:00
  • 334a835a1c ggml : importance matrix support for legacy quants (#4969) b1891 Kawrakow 2024-01-16 19:51:26 +02:00
  • 4feb4b33ee examples : add complete parallel function calling example (#4974) Maximilian Winter 2024-01-16 18:41:42 +01:00
  • 959ef0c0df perplexity : fix kv cache handling for hellaswag (#4981) b1889 Georgi Gerganov 2024-01-16 19:34:54 +02:00
  • c37b3474e6 flake.lock: update flake-parts, flake-parts/nixpkgs-lib, and nixpkgs (#4920) Georgi Gerganov 2024-01-16 19:13:54 +02:00
  • 158f8c9e21 metal : localized logic in ggml_metal_graph_compute (#4924) b1887 Paul Tsochantaris 2024-01-16 17:05:19 +00:00
  • 862f5e41ab android : introduce starter project example (#4926) b1886 Neuman Vong 2024-01-17 00:47:34 +11:00
  • 3a48d558a6 metal : replace loop of dispatch_async with dispatch_apply (#4934) b1885 Alex Azarov 2024-01-16 14:41:27 +01:00
  • 7c8d3abd1a metal : log recommendedMaxWorkingSetSize on iOS 16+ (#4936) b1884 Alex Azarov 2024-01-16 14:33:02 +01:00
  • d92351e23d py : fix BPE vocab conversion Georgi Gerganov 2024-01-16 14:47:07 +02:00
  • 122ed4840c examples : fix and improve docs for the grammar generator (#4909) Maximilian Winter 2024-01-16 13:10:48 +01:00
  • a1372737e0 py : pad with unknown tokens when data is missing Georgi Gerganov 2024-01-16 14:03:57 +02:00
  • 9b464b4e81 py : fix missing added_tokens_dict for SPM vocab Georgi Gerganov 2024-01-16 13:38:54 +02:00
  • a0b3ac8c48 ggml : introduce GGML_CALL function annotation (#4850) b1882 Justine Tunney 2024-01-16 03:16:33 -08:00