Commit Graph

  • 3c8d6b160b Update ggml-cuda.cu Georgi Gerganov 2023-12-18 14:21:22 +02:00
  • 18c67bdd84 ggml : add ggml_mul_mat_set_prec Georgi Gerganov 2023-12-18 13:28:10 +02:00
  • a8d2a6f3ef Merge branch 'master' into HEAD Georgi Gerganov 2023-12-18 10:17:55 +02:00
  • 9339ffc96d update README okada 2023-12-18 16:46:51 +09:00
  • 907b92185c remove develop code okada 2023-12-18 16:32:16 +09:00
  • febc63598b update kqv code okada 2023-12-18 00:16:56 +09:00
  • ca8f698638 seems ok okada 2023-12-17 23:28:29 +09:00
  • f76fd39266 use inp_pos okada 2023-12-17 21:53:04 +09:00
  • 86d5348fd0 runnable okada 2023-12-17 18:29:08 +09:00
  • a22040a810 fix norm_rms_eps hparam okada 2023-12-17 18:15:25 +09:00
  • 4a3ef4f2a4 able to compile okada 2023-12-17 17:44:29 +09:00
  • 9d49236570 update norm okada 2023-12-17 15:44:59 +09:00
  • b2330f57e2 plamo convert okada 2023-12-17 15:23:59 +09:00
  • 4c585b4c6c add tensor loading okada 2023-12-16 16:24:54 +09:00
  • feb0966af1 add plamo mock okada 2023-12-16 15:55:58 +09:00
  • 2994f0c5a2 decode : fix logits_valid for legacy API (#4516) b1656 Jared Van Bortel 2023-12-17 19:39:02 -05:00
  • 1b05817112 decode : fix logits_valid for old API ceb/fix-logit-check Jared Van Bortel 2023-12-17 18:49:21 -05:00
  • b1306c4394 readme : update hot topics Georgi Gerganov 2023-12-17 20:16:23 +02:00
  • 800a489e4a llama.swiftui : add bench functionality (#4483) b1654 Georgi Gerganov 2023-12-17 19:38:41 +02:00
  • 865066621b llama.swiftui : improve bench gg/swiftui-bench Georgi Gerganov 2023-12-17 19:37:22 +02:00
  • 5c5bdba605 llama : remove "mostly" from model infos Georgi Gerganov 2023-12-17 19:36:53 +02:00
  • f7f468a97d gguf-py : fail fast on nonsensical special token IDs (#4489) Jared Van Bortel 2023-12-17 10:45:46 -05:00
  • f86b9d152c lookup : minor pr/4484 Georgi Gerganov 2023-12-17 17:25:28 +02:00
  • 919c40660f build : Check the ROCm installation location (#4485) b1652 Matheus Gabriel Alves Silva 2023-12-17 12:23:33 -03:00
  • 45668633fd finetune : keep allocs alive until all allocations are done (#4486) b1651 slaren 2023-12-17 16:05:56 +01:00
  • 0ffc92d2d2 server : disable llm logs if SERVER_VERBOSE is off (#3792) b1650 olexiyb 2023-12-17 17:02:16 +02:00
  • 8edd2b40fd server : fix grammar being ignored (#4494) b1649 AdithyanI 2023-12-17 15:57:56 +01:00
  • eb16dae7e7 server : fix possible ambiguity in content type charset (#4501) b1648 Alexey Parfenov 2023-12-17 14:56:09 +00:00
  • 62bd52b7bf server : allow requests larger than 8K (#4500) b1647 mzcu 2023-12-17 15:54:37 +01:00
  • 5b27975479 lookup : fix token positions in the draft batch Georgi Gerganov 2023-12-17 16:47:26 +02:00
  • 1b26d7151a Added colors to distinguish drafted tokens (--color). Updated README Leon Ericsson 2023-12-17 13:04:46 +01:00
  • 262fd466f3 llama.swiftui : remove model from project Georgi Gerganov 2023-12-17 13:49:44 +02:00
  • 5daa5f54fd Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) b1646 Bach Le 2023-12-17 18:57:33 +08:00
  • 4ed98b90bc llama.swiftui : avoid data copy via "downloadTask" Georgi Gerganov 2023-12-17 12:19:52 +02:00
  • 9629448716 llama.swiftui : UX improvements Georgi Gerganov 2023-12-17 11:46:13 +02:00
  • d36ca171b6 gitignore : xcode stuff Georgi Gerganov 2023-12-17 10:49:05 +02:00
  • 42e9525884 cuda : less diff in the rope_neox kernel Georgi Gerganov 2023-12-17 09:14:29 +02:00
  • d2f1e0dacc Merge branch 'cuda-cublas-opts' into gg/phi-2 gg/phi-2 Georgi Gerganov 2023-12-17 08:41:46 +02:00
  • f703ca8a3c ggml : fix NeoX rope to rotate just first n_dims Georgi Gerganov 2023-12-17 08:39:18 +02:00
  • b672c169ca ggml : fix NeoX rope to rotate just first n_dims Georgi Gerganov 2023-12-17 08:39:18 +02:00
  • e75889a9b8 Merge branch 'master' into cuda-cublas-opts Georgi Gerganov 2023-12-17 08:20:02 +02:00
  • da44d45265 comment #Preview & fix editorconfig check jhen 2023-12-17 11:37:55 +08:00
  • a520e87ed6 update project.pbxproj jhen 2023-12-17 11:31:44 +08:00
  • ce1df8124a add download buttons & expose llamaState.loadModel jhen 2023-12-17 11:09:51 +08:00
  • ff87313db8 force to use n_gpu_layers on simulator jhen 2023-12-17 11:08:17 +08:00
  • c6c4fc081c lora : add support for non-llama models (#3333) b1645 slaren 2023-12-16 18:58:46 +01:00
  • 0644c3be51 phi-2 : scale Q instead of KQ for better precision Georgi Gerganov 2023-12-16 18:01:08 +02:00
  • 0b6ffa580c convert : revert "added_tokens_decoder" change Georgi Gerganov 2023-12-16 16:05:35 +02:00
  • 45b8032b9c Merge branch 'prompt-lookup' of github.com:LeonEricsson/llama.cpp into prompt-lookup Leon Ericsson 2023-12-16 12:13:50 +01:00
  • 21431197a1 kv_cache management Leon Ericsson 2023-12-16 12:12:33 +01:00
  • a878be4cb1 convert : phi don't add BOS token Georgi Gerganov 2023-12-16 11:20:11 +02:00
  • 5469d82d5a llama : fix meta KV override bug Georgi Gerganov 2023-12-16 11:19:56 +02:00
  • 7500fa2f07 py : whitespaces Georgi Gerganov 2023-12-16 11:01:02 +02:00
  • aa5c881adb phi-2 : use layer norm eps Georgi Gerganov 2023-12-16 10:54:10 +02:00
  • a2a3d2c8d7 phi-2 : various fixes Georgi Gerganov 2023-12-16 10:46:18 +02:00
  • 8a5be3bd58 llama : sanity checks for access to logits (#4274) b1644 Jared Van Bortel 2023-12-15 22:16:15 -05:00
  • e20765534d fix breaking change Ebey Abraham 2023-12-16 00:41:06 +00:00
  • b0547d2196 gguf-py : fail fast on nonsensical special token IDs ceb/fix-badspecial-silentfail Jared Van Bortel 2023-12-15 18:06:42 -05:00
  • 8072706210 kompute : always destroy Manager via the destructor Jared Van Bortel 2023-12-15 16:23:24 -05:00
  • 2d2c76acc4 vulkan : fix free of stack addr in llama_buffer Jared Van Bortel 2023-11-29 18:17:57 -05:00
  • 12cc80cb89 phi2 implementation Ebey Abraham 2023-12-15 20:56:57 +00:00
  • f58f581ca8 refactor llama.cpp modifications Jared Van Bortel 2023-12-15 13:38:54 -05:00
  • 6a8680204c llama.swiftui : initial bench functionality Georgi Gerganov 2023-12-15 16:39:16 +02:00
  • 340484161f Merge branch 'ggerganov:master' into prompt-lookup LeonEricsson 2023-12-15 14:15:04 +01:00
  • 1665ad8bf1 BUG: generates gibberish/repeating tokens after a while Leon Ericsson 2023-12-15 14:14:17 +01:00
  • 88ae8952b6 server : add optional API Key Authentication example (#4441) b1643 ShadovvBeast 2023-12-15 13:49:01 +02:00
  • ee4725a686 ggml : group mul_mat_id rows by matrix (cpu only) (#4480) b1642 slaren 2023-12-15 12:45:50 +01:00
  • afd336f7a6 llama.swiftui : add bench button Georgi Gerganov 2023-12-15 12:38:30 +02:00
  • 6744dbe924 ggml : use ggml_row_size where possible (#4472) b1641 slaren 2023-12-14 20:05:21 +01:00
  • c8fd4ba846 ggml : restore 'static' specifiers Jared Van Bortel 2023-12-14 13:18:14 -05:00
  • cafcd4f895 ggml : remove n_dims from ggml_tensor (#4469) b1640 slaren 2023-12-14 16:52:08 +01:00
  • c50e400163 py : add protobuf dependency (#4466) wonjun Jang 2023-12-14 21:44:49 +09:00
  • 20a68a7030 ggml : add ggml_row_size() (fixes llama out of space) (#4461) b1638 LostRuins 2023-12-14 20:13:33 +08:00
  • 55e87c3749 ggml : fix OpenCL broadcast requirement for ggml_mul (close #4453) b1637 Georgi Gerganov 2023-12-14 10:35:29 +02:00
  • 873637afc7 convert : support loading vocab from fast tokenizer config (#3633) wonjun Jang 2023-12-14 17:09:34 +09:00
  • 0353a18401 readme : update supported model list (#4457) BarfingLemurs 2023-12-14 02:38:49 -05:00
  • f7cb0a65ef remove script with unclear purpose Jared Van Bortel 2023-12-13 17:55:41 -05:00
  • 9af7f58b7b move kompute to a submodule Jared Van Bortel 2023-12-13 17:54:35 -05:00
  • b906e126ca kompute : fix compile warnings Jared Van Bortel 2023-12-13 17:30:38 -05:00
  • 747e1eafcf Merge commit '81bc9214a389362010f7a57f4cbc30e5f83a2d28' into nomic-vulkan Jared Van Bortel 2023-12-13 17:25:15 -05:00
  • 27631dbb6e separate shaders from kompute itself Jared Van Bortel 2023-12-13 17:22:19 -05:00
  • 3e09e127eb rename ggml-vulkan -> ggml-kompute Jared Van Bortel 2023-12-13 17:10:32 -05:00
  • 56430c3209 relicense Vulkan backend as MIT Jared Van Bortel 2023-12-13 16:54:06 -05:00
  • 948ff137ec server : fix handling of characters that span multiple tokens when streaming (#4446) b1634 shibe2 2023-12-13 23:57:15 +04:00
  • 4d98d9a656 sync : ggml (SD ops, tests, kernels) (#4444) b1633 Georgi Gerganov 2023-12-13 21:54:54 +02:00
  • 70f806b821 build : detect host compiler and cuda compiler separately (#4414) b1632 Jared Van Bortel 2023-12-13 12:10:10 -05:00
  • c8554b80be Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ceb/fix-cuda-warning-flags ceb/fix-cuda-warning-flags Jared Van Bortel 2023-12-13 12:06:01 -05:00
  • d870a9fd2c get_flags.mk -> get-flags.mk Jared Van Bortel 2023-12-13 12:05:01 -05:00
  • 9fb13f9584 common : add --version option to show build info in CLI (#4433) b1631 Siwen Yu 2023-12-13 20:50:14 +08:00
  • 113f9942fc readme : update hot topics Georgi Gerganov 2023-12-13 14:05:38 +02:00
  • 799a1cb13b llama : add Mixtral support (#4406) b1629 slaren 2023-12-13 13:04:25 +01:00
  • e1241d9b46 metal : switch to execution barriers + fix one of the barriers mixtral Georgi Gerganov 2023-12-13 13:56:45 +02:00
  • 109e7aa8ac metal : limit kernels to not use more than the allowed threads Georgi Gerganov 2023-12-13 10:55:17 +02:00
  • ab558ac2b3 metal : fix soft_max kernels Georgi Gerganov 2023-12-13 10:54:17 +02:00
  • 82e4f64578 convert-hf : support for mixtral-instruct (#4428) Radek Pilar 2023-12-12 20:04:10 +01:00
  • 90c12e6b3c ggml : do not use BLAS with ggml_mul_mat_id Georgi Gerganov 2023-12-12 20:05:58 +02:00
  • cacac25195 cmake : fix improper joining in generator expression Jared Van Bortel 2023-12-12 11:30:57 -05:00
  • cdf3cc3c17 cmake : make CUDA warning stuff properly conditional Jared Van Bortel 2023-12-12 11:27:41 -05:00
  • e30a8ad1ee cmake : capitalize variables Jared Van Bortel 2023-12-12 11:23:04 -05:00
  • b5b2cdff1d cmake : fix incorrect variable reference Jared Van Bortel 2023-12-12 11:19:18 -05:00