Commit Graph

  • adf3de4f69 ggml : fix soft max out-of-bounds access (#4307) b1605 Georgi Gerganov 2023-12-03 15:56:22 +02:00
  • 33e171d1e9 server : fix OpenAI API stop field to be optional (#4299) b1604 Ed Lee 2023-12-03 01:10:43 -08:00
  • 6949b50df5 py : add grammar to oai like api (#4294) Rickard Edén 2023-12-03 10:03:25 +01:00
  • d7b800b8bc llama : pad KV cache size (#4280) b1602 Georgi Gerganov 2023-12-03 10:58:16 +02:00
  • 3cb1c348b3 metal : try to improve batched decoding gg/pad-kv-cache Georgi Gerganov 2023-12-01 21:47:42 +02:00
  • 3e68df8616 llama : pad KV cache size to 32 Georgi Gerganov 2023-12-01 10:55:27 +02:00
  • 5a7d3125e7 llama : avoid using "optional" keyword (#4283) b1601 Georgi Gerganov 2023-12-01 20:39:12 +02:00
  • d5a1cbde60 llama : support optional tensors (#4283) b1600 Georgi Gerganov 2023-12-01 20:35:03 +02:00
  • b220222a64 swift : fix token_to_piece implementation (#4278) b1599 Miwa / Ensan 2023-12-02 03:19:45 +09:00
  • 511f52c334 build : enable libstdc++ assertions for debug builds (#4275) b1598 Jared Van Bortel 2023-12-01 13:18:35 -05:00
  • 03562f3a86 llama : support attention bias on LLaMA architecture (#4283) b1597 CausalLM 2023-12-02 02:17:06 +08:00
  • 37c746d687 llama : add Qwen support (#4281) b1596 Shijie 2023-12-02 02:16:31 +08:00
  • 880f57973b llama : fix integer overflow during quantization (#4284) b1595 Georgi Gerganov 2023-12-01 18:42:11 +02:00
  • 8d6d9f033b py : add requirements file for convert-hf-to-gguf.py (#4277) Daniel Bevenius 2023-12-01 10:41:56 +01:00
  • ef47ec18da ggml : add ggml_soft_max_ext (#4256) b1593 Georgi Gerganov 2023-12-01 10:51:24 +02:00
  • eb594c0f7d alloc : fix build with debug gg/soft-max-ext Georgi Gerganov 2023-12-01 10:45:54 +02:00
  • d9c8fa3bce metal : simplify soft max kernel Georgi Gerganov 2023-12-01 10:31:21 +02:00
  • 5b74310e6e build : enable libstdc++ assertions for debug builds ceb/libstdcpp-assertions Jared Van Bortel 2023-11-30 18:09:23 -05:00
  • 1d144112c0 server : add --log-disable to disable logging to file (#4260) b1592 Ziad Ben Hadj-Alouane 2023-11-30 17:25:49 -05:00
  • f43f09366d server : add single-client multi-prompt support (#4232) b1591 Ziad Ben Hadj-Alouane 2023-11-30 17:25:04 -05:00
  • d2809a3ba2 make : fix Apple clang determination bug (#4272) b1590 WillCorticesAI 2023-11-30 17:23:44 -05:00
  • 15f5d96037 build : fix build info generation and cleanup Makefile (#3920) b1589 Jared Van Bortel 2023-11-30 17:23:08 -05:00
  • 33c9892af5 llava : ShareGPT4V compatibility (vision encoder only loading) (#4172) John 2023-11-30 23:11:14 +01:00
  • 8efa0f6ebe main : pass LOG_TEE callback to llama.cpp log (#4033) b1587 Andrew Godfrey 2023-11-30 13:56:19 -08:00
  • 524907aa76 readme : fix (#4135) vodkaslime 2023-12-01 05:49:21 +08:00
  • 3bd2c7ce1b docker : add finetune option (#4211) Juraj Bednar 2023-11-30 22:46:01 +01:00
  • bde629bb53 batched.swift : update README.md (#4214) Miwa / Ensan 2023-12-01 06:45:17 +09:00
  • f7f9e06212 cmake : fix the metal file foder path (#4217) b1583 Li Tan 2023-11-30 13:44:11 -08:00
  • 74daabae69 readme : fix typo (#4253) Dawid Wysocki 2023-11-30 22:43:32 +01:00
  • b18c66ca6e llama : fix alignment of general.name in print meta (#4254) b1581 Daniel Bevenius 2023-11-30 22:43:08 +01:00
  • f4d973cecb convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#4258) slaren 2023-11-30 22:42:23 +01:00
  • 954e22858c llama : fix typical sampling (#4261) b1579 tarcey 2023-11-30 22:40:23 +01:00
  • e2bd725f4b py : fix oai proxy (#3972) rhjdvsgsgks 2023-11-30 20:50:40 +00:00
  • c4db59230d metal : warp-based reduce for rms_norm Georgi Gerganov 2023-11-30 22:21:30 +02:00
  • 55717c98c4 metal : warp-based reduction for soft max kernel Georgi Gerganov 2023-11-30 21:52:32 +02:00
  • 68e02c0d58 cuda : fix warp reduction initialization of shared mem Georgi Gerganov 2023-11-30 21:39:48 +02:00
  • 6b86bcffac cuda : increase max block size to 1024 Georgi Gerganov 2023-11-30 20:40:47 +02:00
  • 62532c05aa cuda : do warp-based block reduce Georgi Gerganov 2023-11-30 20:36:08 +02:00
  • c7c8dabcf7 ggml : update soft max cpu Georgi Gerganov 2023-11-30 20:05:41 +02:00
  • ebd062bc19 cuda : use 512 threads for soft_max instead of 32 Georgi Gerganov 2023-11-30 17:19:29 +02:00
  • 580fe2064c metal : simplify soft_max encoding Georgi Gerganov 2023-11-29 17:30:19 +02:00
  • 390a445906 batched-bench : print threads Georgi Gerganov 2023-11-29 17:26:12 +02:00
  • 6a66f69f9f ggml : implement soft_max_ext (CPU) Georgi Gerganov 2023-11-29 17:07:07 +02:00
  • 88519fbf97 cuda : implement soft_max_ext Georgi Gerganov 2023-11-29 15:34:20 +02:00
  • e89597c062 metal : implement soft_max_ext Georgi Gerganov 2023-11-29 12:44:47 +02:00
  • 1f5cd83275 examples : add readme files Georgi Gerganov 2023-11-29 11:00:17 +02:00
  • 4fea3420ee readme : add FreeChat (#4248) Peter Sugihara 2023-11-28 23:16:34 -08:00
  • 64e64aa255 ggml : restore abort() in GGML_ASSERT (#4242) b1575 Jared Van Bortel 2023-11-28 04:51:11 -05:00
  • 8406b0924b ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full offload checks in llama.cpp (#4240) b1574 Georgi Gerganov 2023-11-28 10:32:03 +02:00
  • bb39b87964 ggml : restore abort() in GGML_ASSERT assert-restore-abort Jared Van Bortel 2023-11-27 19:27:09 -05:00
  • 87f4102a70 llama : revert n_threads_batch logic gg/fix-cpu-blas Georgi Gerganov 2023-11-27 21:21:23 +02:00
  • b38a16dfcf cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970) b1573 bandoti 2023-11-27 15:25:42 -04:00
  • e9b7a5cbd0 llama : use n_threads_batch only when n_tokens >= 32 Georgi Gerganov 2023-11-27 20:48:44 +02:00
  • f815fe43d3 ggml : use blas even if src0 is not F32 Georgi Gerganov 2023-11-27 20:48:27 +02:00
  • 6272b6764a use stride=128 if built for tensor cores ceb/perf-faster-multigpu Jared Van Bortel 2023-11-27 13:09:14 -05:00
  • dd71a35cc8 make MUL_MAT_SRC1_COL_STRIDE conditional on runtime mmq Jared Van Bortel 2023-11-27 13:05:55 -05:00
  • 0dab8cd7cc readme : add Amica to UI list (#4230) Kasumi 2023-11-28 01:39:42 +08:00
  • bb03290c17 examples : iOS example with swift ui (#4159) b1571 Bailey Chittle 2023-11-27 09:56:52 -05:00
  • c830a0537b Merge branch 'master' into cuda-cublas-opts Georgi Gerganov 2023-11-27 11:49:14 +02:00
  • f3b269813f ggml : fix -Warray-bounds warning with gcc (#4231) b1570 Jared Van Bortel 2023-11-26 22:58:43 -05:00
  • 12fb1c58ec cuda : tweak mm stride to double perf on P40 + GTX 970 Jared Van Bortel 2023-11-26 22:20:18 -05:00
  • 3e73d31d9c lookahead : support -n -1 infinite generation b1569 Georgi Gerganov 2023-11-26 21:51:46 +02:00
  • 9656026b53 readme : update hot topics Georgi Gerganov 2023-11-26 20:42:51 +02:00
  • 922754a8d6 lookahead : add example for lookahead decoding (#4207) b1567 Georgi Gerganov 2023-11-26 20:33:07 +02:00
  • 8d8b76d469 lookahead : add comments lookahead Georgi Gerganov 2023-11-26 11:26:55 +02:00
  • 1a07a33939 lookahead : fix a bug in the seq_id of the lookahead tokens Georgi Gerganov 2023-11-26 11:26:43 +02:00
  • 22da05536f metal : fix yarn (#4220) b1566 Xiao-Yong Jin 2023-11-26 02:30:02 -06:00
  • 7d50de2de1 lookahead : add to Makefile slaren 2023-11-26 08:33:11 +01:00
  • 1ddb52ec38 scripts : Use mmap in torch load (#4202) Galunid 2023-11-25 22:45:02 +01:00
  • f837c3a992 llama : grammar reserve space in decode_utf8 (#4210) b1564 Marcus Dunn 2023-11-25 08:58:23 -08:00
  • 3014b5415d Update docs for yarn_ext_factor <0.0 as unspecified instead of NaN (#4189) b1563 crasm 2023-11-25 10:47:07 -05:00
  • 7bd1cd7ef4 lookahead : use deterministic init Georgi Gerganov 2023-11-25 17:12:16 +02:00
  • 6eb5166e5a lookahead : filter repeating n-grams Georgi Gerganov 2023-11-25 17:02:56 +02:00
  • 61d039727a lookahead : initial working implementation Georgi Gerganov 2023-11-25 16:25:38 +02:00
  • 1b2e0bc3e6 lookahead : use loop instead recursion to generate n-grams Georgi Gerganov 2023-11-25 13:58:41 +02:00
  • eb03b9ad69 lookahead : generate and store n-grams Georgi Gerganov 2023-11-25 13:54:07 +02:00
  • 04814e718e readme : update hot topics Georgi Gerganov 2023-11-25 12:02:13 +02:00
  • af19d35734 server : OAI API compatibility (#4198) b1561 Georgi Gerganov 2023-11-25 11:29:06 +02:00
  • 7c517e1722 lookahead : init Georgi Gerganov 2023-11-24 16:47:21 +02:00
  • e9c13ff781 llama : set metal log callback correctly (#4204) b1560 slaren 2023-11-24 18:10:01 +01:00
  • 8a052c131e ggml-cuda : support stablelm rope (#4156) b1559 slaren 2023-11-24 18:04:31 +01:00
  • 21b70babf7 straightforward /v1/models endpoint server-oai-compat Tobi Lütke 2023-11-24 11:22:39 -05:00
  • 189d68446e convert : fix tensors using grad in some models (#4173) Galunid 2023-11-24 15:02:49 +01:00
  • b61631426b server : change random string generator Georgi Gerganov 2023-11-24 11:39:03 +02:00
  • b3e88bf494 server : minor code style Georgi Gerganov 2023-11-24 11:33:49 +02:00
  • 2568a4bf54 main.swift : fix eos checking (#4197) b1557 eastriver 2023-11-24 18:25:10 +09:00
  • c544faed74 server : enable special tokens during tokenization by default Georgi Gerganov 2023-11-24 11:10:23 +02:00
  • b94b10914c server : indentation Georgi Gerganov 2023-11-24 11:00:15 +02:00
  • 80724eb0e1 Merge branch 'master' into server-oai-compat Georgi Gerganov 2023-11-24 10:54:08 +02:00
  • f25308be5c server : some style changes Georgi Gerganov 2023-11-24 10:49:08 +02:00
  • b35f3d0def readme : use PATH for Windows ROCm (#4195) Aaryaman Vasishta 2023-11-24 16:52:39 +09:00
  • 9ae88baf38 Merge remote-tracking branch 'upstream/master' into nomic-vulkan-redo Jared Van Bortel 2023-11-23 13:05:04 -05:00
  • a4bb9c5ced vulkan : sync with "migrate to dynamic graphs" Jared Van Bortel 2023-11-23 12:20:07 -05:00
  • 23f6d51f68 Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d' into nomic-vulkan Jared Van Bortel 2023-11-23 12:12:38 -05:00
  • 208cd52f7d vulkan : implement YaRN RoPE scaling (#2268) Jared Van Bortel 2023-11-15 17:58:19 -05:00
  • 1829f1d7be Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d~' into nomic-vulkan Jared Van Bortel 2023-11-23 11:45:46 -05:00
  • 02c3309f6d merge fixup (e16b9fa4ba) Jared Van Bortel 2023-11-14 15:54:26 -05:00
  • 9c4dfd06e8 mention skipped change Jared Van Bortel 2023-11-15 15:51:55 -05:00
  • fe26e6adff Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vulkan Jared Van Bortel 2023-11-14 13:55:30 -05:00
  • 6474fc879a vulkan : handle ggml_scale for n%8 != 0 Jared Van Bortel 2023-11-14 12:10:52 -05:00