Commit Graph

  • 6efb8eb30e convert.py : fix vanilla LLaMA model conversion (#4818) b1804 Austin 2024-01-09 13:46:46 -05:00
  • 52ea3f7930 iq2_xs: better ARM_NEON dot product Iwan Kawrakow 2024-01-09 19:43:39 +01:00
  • 36e5a08b20 llava-cli : don't crash if --image flag is invalid (#4835) b1803 Justine Tunney 2024-01-09 09:59:14 -08:00
  • 4dccb38d9a metal : improve dequantize precision to match CPU (#4836) Georgi Gerganov 2024-01-09 19:37:08 +02:00
  • ff49d876c6 iq2_xs: working, but dog slow, ARM_NEON dot product Iwan Kawrakow 2024-01-09 18:36:45 +01:00
  • 55e2cae83f iq2_xs: Metal now works Iwan Kawrakow 2024-01-09 18:22:20 +01:00
  • 9a818f7c42 scripts : improve get-pg.sh (#4838) Georgi Gerganov 2024-01-09 19:20:45 +02:00
  • 0aacd55159 iq2_xs: WIP Metal Iwan Kawrakow 2024-01-09 17:46:27 +01:00
  • 18adb4e9bb readme : add 3rd party collama reference to UI list (#4840) b1800 iohub 2024-01-10 00:45:54 +08:00
  • 9b6e38d8c0 iq2_xs: CUDA and scalar CPU works Iwan Kawrakow 2024-01-09 18:19:02 +02:00
  • 9f21b82e4b iq2_xs: this should have been in the basics Iwan Kawrakow 2024-01-08 20:18:02 +02:00
  • 3569fa3fe3 iq2_xs: basics Iwan Kawrakow 2024-01-08 20:05:00 +02:00
  • d9653894df scripts : script to get Paul Graham essays in txt format (#4838) Georgi Gerganov 2024-01-09 16:23:05 +02:00
  • 128de3585b server : update readme about token probs (#4777) Behnam M 2024-01-09 05:02:05 -05:00
  • 24096933b0 server : try to fix infill when prompt is empty gg/server-infill-empty-prompt-4027 Georgi Gerganov 2024-01-09 11:27:29 +02:00
  • 8c58330318 server : add api-key flag to documentation (#4832) Zsapi 2024-01-09 10:12:43 +01:00
  • 18c2e1752c ggml : fix vld1q_s8_x4 32-bit compat (#4828) b1796 Georgi Gerganov 2024-01-09 10:42:06 +02:00
  • 7216af5c09 ggml : fix 32-bit ARM compat (cont) gg/fix-vld1q_s8_x4-4872 Georgi Gerganov 2024-01-09 10:33:16 +02:00
  • 8f900abfc0 CUDA: faster softmax via shared memory + fp16 math (#4742) b1795 Johannes Gäßler 2024-01-09 08:58:55 +01:00
  • 27afe29927 ggml : fix vld1q_s8_x4 32-bit compat Georgi Gerganov 2024-01-08 23:45:24 +02:00
  • 3959283eed Merge commit '31f27758faf4a4bd08101a57c7ec3a473f771f86' into ceb/nomic-vulkan Jared Van Bortel 2024-01-08 15:57:12 -05:00
  • 8b65f4c5e5 Merge commit 'bcc0eb4591bec5ec02fad3f2bdcb1b265052ea56' into ceb/nomic-vulkan Jared Van Bortel 2024-01-08 15:50:18 -05:00
  • 44b1a97a15 kompute : fix -Wunused-private-field warnings from clang Jared Van Bortel 2023-12-11 13:04:43 -05:00
  • 1fc2f265ff common : fix the short form of --grp-attn-w, not -gat (#4825) b1794 howlger 2024-01-08 20:05:53 +01:00
  • a9a8c5de3d readme : add link to SOTA models Georgi Gerganov 2024-01-08 20:25:17 +02:00
  • dd5ae06405 SOTA 2-bit quants (#4773) b1792 Kawrakow 2024-01-08 16:02:32 +01:00
  • 668b31fc7d swift : exclude ggml-metal.metal from the package (#4822) b1791 Georgi Gerganov 2024-01-08 16:40:51 +02:00
  • 42ea63c5a3 llama.swiftui : update readme Georgi Gerganov 2024-01-08 15:57:36 +02:00
  • 52531fdff8 main : add self-extend support (#4815) b1789 Georgi Gerganov 2024-01-08 11:18:32 +02:00
  • b0034d93ce examples : add passkey test (#3856) b1788 Georgi Gerganov 2024-01-08 11:14:04 +02:00
  • d57cb9c294 passkey : add readme passkey Georgi Gerganov 2024-01-08 11:13:44 +02:00
  • 164d7a0546 passkey : add "self-extend"-like context extension (#4810) Georgi Gerganov 2024-01-08 11:10:32 +02:00
  • a42feb1885 make : add passkey target Georgi Gerganov 2024-01-08 11:09:07 +02:00
  • b7e7982953 readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814) b1787 Lars Grammel 2024-01-07 21:24:11 +01:00
  • 226460cc0d llama-bench : add no-kv-offload parameter (#4812) b1786 slaren 2024-01-07 17:59:01 +01:00
  • d5a410e855 CUDA: fixed redundant value dequantization (#4809) b1785 Johannes Gäßler 2024-01-07 17:24:08 +01:00
  • f2c9800dfb passkey : simplify n_past logic Georgi Gerganov 2024-01-07 17:52:12 +02:00
  • bda3f2c892 passkey : select pass key pos from CLI Georgi Gerganov 2024-01-07 14:48:09 +02:00
  • fbb999f592 passkey : better prints Georgi Gerganov 2023-10-30 11:13:44 +02:00
  • 21196da114 examples : add passkey test Georgi Gerganov 2023-10-30 10:44:07 +02:00
  • 9dede37d81 llama : remove unused vars (#4796) b1784 Georgi Gerganov 2024-01-07 14:29:36 +02:00
  • 3c36213df8 llama : remove redundant GQA check (#4796) b1783 Georgi Gerganov 2024-01-07 11:21:53 +02:00
  • 72d8407b36 llama.swiftui : use llama.cpp as SPM package (#4804) b1782 Alex Azarov 2024-01-07 09:20:50 +01:00
  • d117d4dc5d llama : print tensor meta for debugging b1781 Georgi Gerganov 2024-01-07 09:50:31 +02:00
  • 3418c03ecc llama.swiftui : add visionOS target (#4805) Alex Azarov 2024-01-07 08:46:55 +01:00
  • 63ee677efd ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (#4787) b1779 Konstantin Zhuravlyov 2024-01-07 01:52:42 -05:00
  • 67984921a7 server : fix n_predict check (#4798) b1778 Georgi Gerganov 2024-01-07 08:45:26 +02:00
  • c75ca5d96f llama.swiftui : use correct pointer for llama_token_eos (#4797) b1777 Daniel Illescas Romero 2024-01-06 16:12:59 +01:00
  • 7cfde78190 llama : remove redundant GQA check gg/remove-gqa-check-4657 Georgi Gerganov 2024-01-06 16:04:20 +02:00
  • 96e80dabc6 examples : improve base-translate.sh script (#4783) Georgi Gerganov 2024-01-06 11:40:24 +02:00
  • eec22a1c63 cmake : check for openblas64 (#4134) b1775 a-n-n-a-l-e-e 2024-01-05 08:04:40 -08:00
  • be36bb946a flake.nix : fix typo (#4700) Ikko Eltociear Ashimine 2024-01-06 01:02:44 +09:00
  • 91d38876df metal : switch back to default.metallib (ggml/681) b1773 Georgi Gerganov 2024-01-05 16:30:52 +02:00
  • d061bf9405 ggml : fix q2_k bpw in comments (ggml/680) Georgi Gerganov 2024-01-05 15:36:04 +02:00
  • 1bf681f90e ggml : add error handling to graph_compute (whisper/1714) Finn Voorhees 2024-01-03 08:39:43 -05:00
  • c1d7cb28d3 ggml : do not sched_yield when calling BLAS (#4761) b1770 Georgi Gerganov 2024-01-05 15:18:21 +02:00
  • 3681f22443 examples : add few-shot translation example (#4783) Georgi Gerganov 2024-01-05 15:11:10 +02:00
  • b3a7c20b5c finetune : remove unused includes (#4756) b1768 Daniel Bevenius 2024-01-04 20:45:37 +01:00
  • 012cf349ae server : send token probs for "stream == false" (#4714) b1767 Georgi Gerganov 2024-01-04 19:56:33 +02:00
  • a91928014f Print backend name on test-backend-ops failure (#4751) b1766 Johannes Gäßler 2024-01-04 09:43:23 +01:00
  • 3c0b585561 llama.swiftui : support loading custom model from file picker (#4767) b1765 singularity 2024-01-04 16:22:38 +08:00
  • e5804313a1 server : fix options in README.md (#4765) Michael Coppola 2024-01-04 03:17:09 -05:00
  • dc891b7f7a ggml : include stdlib.h before intrin.h (#4736) b1763 Georgi Gerganov 2024-01-04 10:12:26 +02:00
  • 46cea79e1f llama.swiftui : fix build of ggml.metallib (#4754) singularity 2024-01-04 15:58:16 +08:00
  • cb1e2818e0 train : fix typo in overlapping-samples help msg (#4758) b1761 Daniel Bevenius 2024-01-03 18:53:40 +01:00
  • ece9a45e8f swift : update Package.swift to use ggml as dependency (#4691) b1760 Ashraful Islam 2024-01-03 11:30:02 -06:00
  • 7bed7eba35 cuda : simplify expression b1759 Georgi Gerganov 2024-01-03 14:18:46 +02:00
  • d55356d3ba cuda : mark I16 and I32 ops as unsupported Georgi Gerganov 2024-01-03 13:01:44 +02:00
  • 75e3fd8581 sync : ggml Georgi Gerganov 2024-01-03 11:37:44 +02:00
  • 289313716f metal : add kernel_get_rows_i32 Georgi Gerganov 2024-01-03 11:35:46 +02:00
  • ab62fc3e55 scripts : fix sync order + metal sed Georgi Gerganov 2024-01-03 11:25:54 +02:00
  • 5f66ebca9c ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639) Guillaume Wenzek 2023-12-29 18:07:03 +01:00
  • f2eb19bd8b server : throw an error when slot unavailable (#4741) Justin Parker 2024-01-03 03:43:19 -05:00
  • f3f62f0d83 metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725) b1752 Georgi Gerganov 2024-01-02 21:07:47 +02:00
  • 9f51f3e695 metal : opt mul_mm_id gg/metal-opt-mul-mat-id Georgi Gerganov 2024-01-02 20:50:18 +02:00
  • 4cc78d3873 ggml : force F32 precision for ggml_mul_mat cuda-cublas-opts Georgi Gerganov 2023-12-19 16:23:39 +02:00
  • 0ef3ca2ac6 server : add token counts to html footer (#4738) b1751 Phil H 2024-01-02 15:48:49 +00:00
  • 21e100d6dc Merge branch 'master' into gg/metal-opt-mul-mat-id Georgi Gerganov 2024-01-02 16:27:21 +02:00
  • 540938f890 llama : llama_model_desc print number of experts b1750 Georgi Gerganov 2024-01-02 16:26:45 +02:00
  • daf9b12472 metal : minor fix Georgi Gerganov 2024-01-02 16:25:41 +02:00
  • 74460d0065 Merge branch 'master' into gg/metal-opt-mul-mat-id Georgi Gerganov 2024-01-02 16:24:05 +02:00
  • c73e598d1c Merge branch 'master' into gg/metal-opt-mul-mat-id Georgi Gerganov 2024-01-02 16:22:47 +02:00
  • 0040d42eeb llama : replace all API facing int's with int32_t (#4577) b1749 Marcus Dunn 2024-01-02 06:15:16 -08:00
  • b5af7ad84f llama : refactor quantization to avoid <mutex> header gg/avoid-mutex Georgi Gerganov 2024-01-02 15:53:28 +02:00
  • 83e633c27e llama : differentiate the KV dims in the attention (#4657) b1748 postmasters 2024-01-02 03:51:28 -08:00
  • 120a1a5515 llama : auto download HF models if URL provided gg/hf-auto-dl Georgi Gerganov 2024-01-02 13:19:56 +02:00
  • 32866c5edd editorconfig : fix whitespace and indentation #4710 b1747 Georgi Gerganov 2024-01-02 13:28:15 +02:00
  • 5d7002d437 server : add --override-kv parameter (#4710) b1746 minarchist 2024-01-02 04:38:15 -06:00
  • 26f3071d71 py : re-enable mmap in convert hf (#4732) Nam D. Tran 2024-01-02 16:23:38 +07:00
  • 775ac8712a finetune: fix typo in README.md (#4733) Daniel Bevenius 2024-01-02 10:16:55 +01:00
  • 58ba655af0 metal : enable shader debugging (cmake option) (#4705) b1743 Georgi Gerganov 2024-01-02 10:57:44 +02:00
  • 76f9d41dd6 metal : optimizing ggml_mul_mat_id (wip) Georgi Gerganov 2023-12-31 18:14:02 +02:00
  • edd1ab7bc3 flake.lock: update b1742 Someone Serge 2023-12-31 17:42:22 +00:00
  • 198ed7ebfc flake.nix: suggest the binary caches Someone Serge 2023-12-30 18:25:25 +00:00
  • d836174731 workflows: nix-ci: add a qemu job for jetsons Someone Serge 2023-12-30 18:01:07 +00:00
  • 06f2a5d190 workflows: nix-flakestry: drop tag filters Someone Serge 2023-12-30 17:36:08 +00:00
  • c5239944ba workflows: weekly nix flake update Someone Serge 2023-12-30 16:38:36 +00:00
  • 1e9ae54cf2 workflows: nix-ci: add a job for eval Someone Serge 2023-12-30 17:19:11 +00:00
  • 7adedecbe3 workflows: nix-ci: init; build flake outputs Someone Serge 2023-12-26 19:17:26 +00:00
  • 356ea17e0f flake.nix: expose checks Someone Serge 2023-12-29 16:21:50 +00:00