Commit Graph

  • a0b3ac8c48 ggml : introduce GGML_CALL function annotation (#4850) b1882 Justine Tunney 2024-01-16 03:16:33 -08:00
  • d75c232e1d finetune : use LLAMA_FILE_MAGIC_GGLA (#4961) b1881 Daniel Bevenius 2024-01-16 12:14:19 +01:00
  • e0324285a5 speculative : threading options (#4959) b1880 stduhpf 2024-01-16 12:04:32 +01:00
  • bb9abb5cd8 imatrix: guard Q4_0/Q5_0 against ffn_down craziness ik/imatrix_legacy_quants Iwan Kawrakow 2024-01-16 09:56:05 +02:00
  • 6f9ec42a27 imatrix: adding support for legacy quants Iwan Kawrakow 2024-01-16 08:37:56 +02:00
  • 3e5ca7931c pass cpu-architecture arguments only to host code (C;C++) (#4943) b1879 ngc92 2024-01-15 20:40:48 +02:00
  • 0b2fca9a9f imatrix : offload to GPU support Georgi Gerganov 2024-01-15 16:18:11 +02:00
  • e0493800ce simple : fix Georgi Gerganov 2024-01-15 16:43:46 +02:00
  • e1b1db9f09 simple : do not perform tensor data copy if not needed Georgi Gerganov 2024-01-15 16:42:16 +02:00
  • 83f3d7a83c backend : clean-up the implementation Georgi Gerganov 2024-01-15 15:52:41 +02:00
  • 01b6f68a00 backend : group nodes in a single compute when user don't need them Georgi Gerganov 2024-01-14 17:30:22 +02:00
  • 65648b341f backend : add eval callback Georgi Gerganov 2024-01-14 16:48:16 +02:00
  • 4483396751 llama : apply classifier-free guidance to logits directly (#4951) b1878 David Friehs 2024-01-15 14:06:52 +01:00
  • d9aa4ffa6e awq-py : fix typo in awq-py/README.md (#4947) Victor Z. Peng 2024-01-15 04:41:46 -08:00
  • ddb008d845 cuda : fix dequantize kernel names (#4938) b1876 Georgi Gerganov 2024-01-15 13:27:00 +02:00
  • 2faaef3979 llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950) b1875 Kawrakow 2024-01-15 10:09:38 +02:00
  • 4a3156de2f CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938) b1874 Kawrakow 2024-01-15 07:48:06 +02:00
  • a836c8f534 llama : fix missing quotes (#4937) b1873 David Pflug 2024-01-14 10:46:00 -05:00
  • 467a882fd2 Add ability to use importance matrix for all k-quants (#4930) b1872 Kawrakow 2024-01-14 16:21:12 +02:00
  • bb0c139247 llama : check LLAMA_TRACE env for extra logging (#4929) b1871 Georgi Gerganov 2024-01-14 13:26:53 +02:00
  • 9408cfdad6 scripts : sync-ggml-am.sh option to skip commits Georgi Gerganov 2024-01-14 11:08:09 +02:00
  • 03c5267490 llama : use LLAMA_LOG_ macros for logging b1869 Georgi Gerganov 2024-01-14 11:03:19 +02:00
  • a128c38de8 Fix ffn_down quantization mix for MoE models (#4927) b1868 Kawrakow 2024-01-14 10:53:39 +02:00
  • 5f5fe1bd60 metal : correctly set SIMD support flags on iOS (#4923) b1867 Alex Azarov 2024-01-14 09:44:39 +01:00
  • ac32902a87 llama : support WinXP build with MinGW 8.1.0 (#3419) b1866 Karthik Kumar Viswanathan 2024-01-14 00:41:44 -08:00
  • 147b17ac94 2-bit quantizations (#4897) b1865 Kawrakow 2024-01-14 09:45:56 +02:00
  • 807179ec58 Make Q3_K_S be the same as old Q3_K_L for Mixtral-8x7B (#4906) b1864 Kawrakow 2024-01-14 09:44:30 +02:00
  • 76484fbfd3 sync : ggml b1863 Georgi Gerganov 2024-01-14 00:14:46 +02:00
  • c71d608ce7 ggml: cache sin/cos for RoPE (#4908) b1862 Johannes Gäßler 2024-01-13 21:41:37 +01:00
  • 4be5ef556d metal : remove old API (#4919) b1861 Georgi Gerganov 2024-01-13 20:45:45 +02:00
  • 0ea069b87b server : fix prompt caching with system prompt (#4914) b1860 Georgi Gerganov 2024-01-13 19:31:26 +02:00
  • f172de03f1 llama : fix detokenization of non-special added-tokens (#4916) b1859 Georgi Gerganov 2024-01-13 18:47:38 +02:00
  • 2d57de5255 metal : disable log for loaded kernels (#4794) b1858 Georgi Gerganov 2024-01-13 18:46:37 +02:00
  • df845cc982 llama : minimize size used for state save/load (#4820) b1857 David Friehs 2024-01-13 17:29:43 +01:00
  • 6b48ed0893 workflows: unbreak nix-build-aarch64, and split it out (#4915) b1856 Someone 2024-01-13 16:29:16 +00:00
  • 722d33f34e main : add parameter --no-display-prompt (#4541) b1855 Yann Follet 2024-01-14 00:09:08 +08:00
  • c30b1ef39a gguf : fix potential infinite for-loop (#4600) b1854 texmex76 2024-01-13 17:06:20 +01:00
  • b38b5e93ae metal : refactor kernel loading code (#4794) b1853 Georgi Gerganov 2024-01-13 18:03:45 +02:00
  • 7dc78764e2 compare-llama-bench: tweak output format (#4910) Johannes Gäßler 2024-01-13 15:52:53 +01:00
  • 356327feb3 server : fix deadlock that occurs in multi-prompt scenarios (#4905) b1851 Ziad Ben Hadj-Alouane 2024-01-13 09:20:46 -05:00
  • ee8243adaa server : fix crash with multimodal models without BOS token (#4904) b1850 makomk 2024-01-13 14:16:11 +00:00
  • 9998ecd191 llama : add phixtral support (wip) gg/add-phixtral Georgi Gerganov 2024-01-13 14:19:13 +02:00
  • 15ebe59210 convert : update phi-2 to latest HF repo (#4903) b1849 Georgi Gerganov 2024-01-13 13:44:37 +02:00
  • 1fb563ebdc py : try to fix flake stuff gg/update-phi2-convert Georgi Gerganov 2024-01-13 13:34:08 +02:00
  • fe252237a3 convert : update phi-2 to latest HF repo Georgi Gerganov 2024-01-12 22:48:47 +02:00
  • de473f5f8e sync : ggml b1848 Georgi Gerganov 2024-01-12 22:02:43 +02:00
  • f238461236 ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758) Georgi Gerganov 2024-01-12 14:02:30 +02:00
  • fa5c1fb44a backend_sched : fix assignments slaren 2024-01-12 20:38:34 +01:00
  • 52ee4540c0 examples : add pydantic models to GBNF grammar generator (#4883) Maximilian Winter 2024-01-12 20:46:45 +01:00
  • 3fe81781e3 CUDA: faster q8_0 -> f16 dequantization (#4895) b1844 Johannes Gäßler 2024-01-12 20:38:54 +01:00
  • e7e4df031b llama : ggml-backend integration (#4766) b1843 slaren 2024-01-12 20:07:38 +01:00
  • 584d674be6 llama : remove redundant assert for StableLM (#4901) b1842 Georgi Gerganov 2024-01-12 20:54:12 +02:00
  • 930f907d3e export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894) b1841 Daniel Bevenius 2024-01-12 18:54:53 +01:00
  • e790eef21c llama.swiftui : update models layout (#4826) b1840 Zay 2024-01-12 05:48:00 -07:00
  • 5537d9d36b gitignore : imatrix Georgi Gerganov 2024-01-12 14:33:21 +02:00
  • 1b280c9fff CUDA: fix softmax compile for old CUDA versions (#4862) b1838 Johannes Gäßler 2024-01-12 12:30:41 +01:00
  • 3cabe80630 llama : fix typo "imp_embd" -> "inp_embd" b1837 Georgi Gerganov 2024-01-12 13:10:19 +02:00
  • 4315a94366 common : streamline the formatting of help (#4890) b1836 howlger 2024-01-12 12:05:32 +01:00
  • 2d00741e12 py : fix lint (#4889) Georgi Gerganov 2024-01-12 13:03:38 +02:00
  • f445c0e68c llama : fix llm_build_k_shift to use correct n_rot (#4889) b1834 Georgi Gerganov 2024-01-12 13:01:56 +02:00
  • 326b418b59 Importance Matrix calculation (#4861) b1833 Kawrakow 2024-01-12 06:59:57 +01:00
  • 1d118386fe server : fix infill when prompt is empty (#4833) b1832 Georgi Gerganov 2024-01-11 23:23:49 +02:00
  • 7edefbd79c main : better name for variable n_print (#4874) b1831 Georgi Gerganov 2024-01-11 22:46:26 +02:00
  • 3ca63b4538 main : disable token count by default (#4874) b1830 Georgi Gerganov 2024-01-11 22:43:05 +02:00
  • b037787548 swift : track ggml release branch (#4867) b1829 Georgi Gerganov 2024-01-11 21:58:28 +02:00
  • 469e75d0a3 llama : restore intended k-quants mixes for MoE models (#4872) b1828 Kawrakow 2024-01-11 20:43:15 +01:00
  • 49662cbed3 ggml : SOTA 2-bit quants (add IQ2_XS) (#4856) b1827 Kawrakow 2024-01-11 20:39:39 +01:00
  • 3ba5b8ca8e swift : pin ggml commit + remove ggml.h from spm-headers (#4878) b1826 Georgi Gerganov 2024-01-11 21:31:31 +02:00
  • 4330bd83fe server : implement credentialed CORS (#4514) b1825 Laura 2024-01-11 19:02:48 +01:00
  • 27379455c3 server : support for multiple api keys (#4864) b1824 Michael Coppola 2024-01-11 12:51:17 -05:00
  • eab6795006 server : add LOG_INFO when model is successfully loaded (#4881) b1823 Behnam M 2024-01-11 12:41:39 -05:00
  • d8d90aa343 ci: nix-flake-update: new token with pr permissions (#4879) b1822 Someone 2024-01-11 17:22:34 +00:00
  • 9bfcb16fd3 Add llama enum for IQ2_XS ik/iq2_2.31bpw Iwan Kawrakow 2024-01-11 18:24:12 +02:00
  • 43f76bf1c3 main : print total token count and tokens consumed so far (#4874) b1821 pudepiedj 2024-01-11 16:14:52 +00:00
  • 2f043328e3 server : fix typo in model name (#4876) b1820 Isaac McFadyen 2024-01-11 09:33:26 -05:00
  • 2a7c94db5f metal : put encoder debug group behind a define (#4873) b1819 Paul Tsochantaris 2024-01-11 14:31:52 +00:00
  • 64802ec00d sync : ggml b1818 Georgi Gerganov 2024-01-11 09:39:08 +02:00
  • 3267c2abc7 metal : fix deprecation warning (ggml/690) Georgi Gerganov 2024-01-11 09:34:59 +02:00
  • f85a973aa1 ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693) Timothy Cronin 2024-01-11 02:27:48 -05:00
  • 5362e43962 metal : wrap each operation in debug group (ggml/690) Jack Mousseau 2024-01-10 06:19:19 -08:00
  • e739de7909 ggml : change GGML_MAX_NAME at compile time (ggml/682) leejet 2024-01-10 21:13:42 +08:00
  • c910e3c28a Fix execlp call (ggml/689) Halalaluyafail3 2024-01-09 11:16:37 -05:00
  • f34432ca1e fix : cuda order of synchronization when setting a buffer (ggml/679) Erik Scholz 2024-01-05 16:00:00 +01:00
  • 7a9f75c38b server : update readme to document the new /health endpoint (#4866) Behnam M 2024-01-11 02:12:05 -05:00
  • 5c1980d8d4 server : fix build + rename enums (#4870) b1810 Georgi Gerganov 2024-01-11 09:10:34 +02:00
  • 50579f27e9 attempt to get test-backend-ops working Jared Van Bortel 2024-01-10 16:14:03 -05:00
  • cd108e641d server : add a /health endpoint (#4860) Behnam M 2024-01-10 14:56:05 -05:00
  • 8a99f69895 fix assertion failure Jared Van Bortel 2024-01-10 13:44:34 -05:00
  • d5670d6e46 kompute : initial attempt at ggml-backend v2 support Jared Van Bortel 2024-01-09 16:24:10 -05:00
  • 1eb8804c18 PR #4766 Jared Van Bortel 2024-01-10 11:29:04 -05:00
  • 3773e1afe7 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ceb/nomic-vulkan Jared Van Bortel 2024-01-09 16:37:08 -05:00
  • ae6d6824b7 Merge commit 'd232aca5a73b290e218a2e48b91023d5e994203f' into ceb/nomic-vulkan Jared Van Bortel 2024-01-09 16:34:46 -05:00
  • 904c563dbc sync xxd commands with GPT4All llama.cpp.cmake Jared Van Bortel 2024-01-10 12:12:59 -05:00
  • 57d016ba2d llama : add additional suffixes for model params (#4834) b1808 Brian 2024-01-11 01:09:53 +11:00
  • 329ff61569 llama : recognize 1B phi models (#4847) b1807 Austin 2024-01-10 08:39:09 -05:00
  • d34633d8db clip : support more quantization types (#4846) b1806 John 2024-01-10 14:37:09 +01:00
  • a1610b05b2 iq2_xs: had forgotten to delete iq2-data.h Iwan Kawrakow 2024-01-10 13:47:42 +02:00
  • 8299b03a99 iq2_xs: faster AVX2 dot product Iwan Kawrakow 2024-01-10 11:33:23 +02:00
  • 3198e94f00 iq2_xs: AVX2 dot product - 19.5 t/s Iwan Kawrakow 2024-01-10 08:49:38 +02:00
  • 4f56458d34 Python script to compare commits with llama-bench (#4844) Johannes Gäßler 2024-01-10 01:04:33 +01:00