Commit Graph

  • a5c088d8c6 flake.nix: rocm not yet supported on aarch64, so hide the output Someone Serge 2023-12-26 23:34:40 +00:00
  • 1e3900ebac flake.nix: expose full scope in legacyPackages Someone Serge 2023-12-29 16:15:37 +00:00
  • 5865b18eeb metal : fix mat-vec Q4_K kernel for QK_K == 64 Georgi Gerganov 2023-12-31 13:52:34 +02:00
  • a8b9bb4566 cmake : respect LLAMA_QKK_64 option Georgi Gerganov 2023-12-31 13:34:07 +02:00
  • 049a32fffa metal : normalize mat-vec kernel signatures Georgi Gerganov 2023-12-31 12:31:26 +02:00
  • ad7cf37fe8 metal : fix mat-vec Q8_0 kernel for BS > 1 Georgi Gerganov 2023-12-31 12:26:21 +02:00
  • 6435a3de31 cmake : rename option to LLAMA_METAL_SHADER_DEBUG Georgi Gerganov 2023-12-31 12:18:48 +02:00
  • 4c054d98d4 metal : use uint64_t for strides Georgi Gerganov 2023-12-31 12:07:58 +02:00
  • b14b5a9eb3 metal : fix compile warnings Georgi Gerganov 2023-12-31 12:04:05 +02:00
  • e39106c055 ggml : add ggml_vdotq_s32 alias (#4715) b1732 Georgi Gerganov 2023-12-31 11:43:31 +02:00
  • 9fbda719de clip : refactor + bug fixes (#4696) b1731 Georgi Gerganov 2023-12-30 23:24:42 +02:00
  • 1580805fc6 metal : fix API debug warnings Georgi Gerganov 2023-12-30 21:10:32 +02:00
  • a184e1050c cmake : add -fno-inline for Metal build (#4545) Georgi Gerganov 2023-12-30 21:10:13 +02:00
  • 515cfec44f metal : fix Metal API debug warnings Georgi Gerganov 2023-12-30 20:34:53 +02:00
  • 75c14f2608 ggml : disable fast-math for Metal (cmake build only) Georgi Gerganov 2023-12-30 19:33:01 +02:00
  • f64e4f04e7 ggml : testing GPU FP precision via quantized CPY gg/gpu-prec-tests Georgi Gerganov 2023-12-30 13:22:57 +02:00
  • 39d8bc71ed CUDA: fixed tensor cores not being used on RDNA3 (#4697) b1730 Johannes Gäßler 2023-12-30 13:52:01 +01:00
  • 24a447e20a ggml : add ggml_cpu_has_avx_vnni() (#4589) b1729 automaticcat 2023-12-30 15:07:48 +07:00
  • a20f3c7465 CUDA: fix tensor core logic for Pascal and HIP (#4682) b1728 Johannes Gäßler 2023-12-29 23:12:53 +01:00
  • 0235b9b571 clip : use ggml_backend_buffer_is_host (#4205) b1727 Georgi Gerganov 2023-12-29 18:53:34 +02:00
  • ce18d727a4 clip : enable gpu backend (#4205) b1726 Steward Garcia 2023-12-29 11:52:15 -05:00
  • 91bb39cec7 cuda: fix vmm oom issue on NVIDIA AGX Orin (#4687) b1725 hydai 2023-12-30 00:31:19 +08:00
  • 04ac0607e9 python : add check-requirements.sh and GitHub workflow (#4585) b1724 crasm 2023-12-29 09:50:29 -05:00
  • 68eccbdc5b flake.nix : rewrite (#4605) b1723 Philip Taron 2023-12-29 06:42:26 -08:00
  • 97bbca6e85 cmake : fix ld warning duplicate libraries libllama.a (#4671) b1722 Cuong Trinh Manh 2023-12-29 21:39:15 +07:00
  • 4af4801566 llava-cli : refactor to use sampling library (#4669) b1721 Justine Tunney 2023-12-29 06:38:38 -08:00
  • db49ff8ed7 server : replace sleep with condition variables (#4673) b1720 Justine Tunney 2023-12-29 06:24:12 -08:00
  • 60f55e888c server : fix OpenAI server sampling w.r.t. penalty. (#4675) b1719 SakuraUmi 2023-12-29 22:22:44 +08:00
  • b93edd22f5 server : allow to generate multimodal embeddings (#4681) b1718 Karthik Sethuraman 2023-12-29 06:22:10 -08:00
  • 82d6eab224 main-cmake-pkg : fix build issue (#4665) b1717 andrijdavid 2023-12-29 15:18:20 +01:00
  • afd997ab60 llama.swiftui : fix infinite loop, ouput timings, buff UI (#4674) b1716 Peter Sugihara 2023-12-29 05:58:56 -08:00
  • c8255f8a6b scripts : print list of sync commits b1715 Georgi Gerganov 2023-12-29 15:12:35 +02:00
  • 441f51dca0 ci : build with CLBlast + ggml-opencl use GGML_API (whisper/1576) Tamotsu Takahashi 2023-12-29 19:23:27 +09:00
  • 38b3de4658 sync : ggml b1713 Georgi Gerganov 2023-12-29 14:56:41 +02:00
  • afc8c19291 ggml : fix some mul mat cases + add tests for src1 F16 (ggml/669) bssrdf 2023-12-29 03:32:31 -05:00
  • ca38b8d334 scripts : do not sync commits from this repo Georgi Gerganov 2023-12-29 14:41:36 +02:00
  • 65e5f6dadb Fix OpenAI server sampling w.r.t. temp and seed (#4668) b1710 Justine Tunney 2023-12-28 11:20:00 -08:00
  • ea5497df5d gpt2 : Add gpt2 architecture integration (#4555) b1709 manikbhandari 2023-12-28 09:03:57 -05:00
  • f6793491b5 llama : add AWQ for llama, llama2, mpt, and mistral models (#4593) b1708 Nam D. Tran 2023-12-27 22:39:45 +07:00
  • 879b690a9e finetune : fix output formatting in print_params (#4653) b1707 Daniel Bevenius 2023-12-27 15:16:55 +01:00
  • b47879b0dd scripts : add sync-ggml-am.sh Georgi Gerganov 2023-12-27 11:15:31 +02:00
  • 951010fa53 ggml : fix dot product for ARM (#4630) b1705 Georgi Gerganov 2023-12-27 11:02:13 +02:00
  • f56d6077d0 Add byte token type when tokenizer.model is not exists (#4641) wonjun Jang 2023-12-27 17:37:25 +09:00
  • dc68f0054c cuda : fix vmm pool with multi GPU (#4620) b1703 slaren 2023-12-26 21:23:59 +01:00
  • f32f30bc57 test gg/test-arm Georgi Gerganov 2023-12-26 17:37:33 +02:00
  • de8e496437 Update comment for AdamW implementation reference. (#4604) b1702 WillCorticesAI 2023-12-26 05:42:08 -05:00
  • 77465dad48 Fix new CUDA10 compilation errors (#4635) b1701 FantasyGmm 2023-12-26 18:38:36 +08:00
  • a206137f92 Adding Emeltal reference to UI list (#4629) b1700 Paul Tsochantaris 2023-12-25 16:09:53 +00:00
  • b9f47952ff simplify bug issue template (#4623) b1699 slaren 2023-12-24 21:01:12 +01:00
  • 753be377b6 llama : add PLaMo model (#3557) b1698 Shintarou Okada 2023-12-24 22:35:49 +09:00
  • 5bf3953d7e cuda : improve cuda pool efficiency using virtual memory (#4606) b1697 slaren 2023-12-24 14:34:22 +01:00
  • 708e179e85 fallback to CPU buffer if host buffer alloc fails (#4610) b1696 slaren 2023-12-23 16:10:51 +01:00
  • 925e5584a0 ci(docker): fix tags in "Build and push docker image (tagged)" (#4603) b1695 Samuel Maynard 2023-12-23 11:35:55 +02:00
  • 6123979952 server : allow to specify custom prompt for penalty calculation (#3727) b1694 Alexey Parfenov 2023-12-23 09:31:49 +00:00
  • b9ec82d262 grammar : check the full vocab only if necessary (opt) (#4306) b1693 kalomaze 2023-12-23 03:27:07 -06:00
  • e0a4002273 CUDA: fixed row rounding for 0 tensor splits (#4594) b1692 Johannes Gäßler 2023-12-23 09:16:33 +01:00
  • 7082d24cec lookup : add prompt lookup decoding example (#4484) b1691 LeonEricsson 2023-12-22 17:05:56 +01:00
  • ba66175132 sync : ggml (fix im2col) (#4591) b1690 Georgi Gerganov 2023-12-22 17:53:43 +02:00
  • a55876955b cuda : fix jetson compile error (#4560) b1689 FantasyGmm 2023-12-22 23:11:12 +08:00
  • 6724ef1657 Fix CudaMemcpy direction (#4599) b1688 Henrik Forstén 2023-12-22 15:34:05 +02:00
  • 48b7ff193e llama : fix platforms without mmap (#4578) b1687 slaren 2023-12-22 12:12:53 +01:00
  • 48b24b170e ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203) b1686 Herman Semenov 2023-12-22 09:26:49 +00:00
  • 28cb35a0ec make : add LLAMA_HIP_UMA option (#4587) b1685 Michael Kesper 2023-12-22 09:03:25 +01:00
  • f31b984898 ci : tag docker image with build number (#4584) b1684 rhuddleston 2023-12-21 23:56:34 -07:00
  • 2bb98279c5 readme : add zig bindings (#4581) Deins 2023-12-22 08:49:54 +02:00
  • 0137ef88ea ggml : extend enum ggml_log_level with GGML_LOG_LEVEL_DEBUG (#4579) b1682 bobqianic 2023-12-22 06:47:01 +00:00
  • c7e9701f86 llama : add ability to cancel model loading (#4462) b1681 crasm 2023-12-22 01:19:36 -05:00
  • afefa319f1 ggml : change ggml_scale to take a float instead of tensor (#4573) b1680 Georgi Gerganov 2023-12-21 23:20:49 +02:00
  • 769a7bc85e gguf-py : fix broken link Georgi Gerganov 2023-12-21 23:20:36 +02:00
  • 32259b2dad gguf : simplify example dependencies b1678 Georgi Gerganov 2023-12-21 23:07:58 +02:00
  • 4a5f9d629e ci : add jlumbroso/free-disk-space to docker workflow (#4150) b1677 Samuel Maynard 2023-12-21 22:36:26 +02:00
  • ab1b75166f Merge branch 'master' into gg/ggml_scale gg/ggml_scale Georgi Gerganov 2023-12-21 22:35:11 +02:00
  • d232aca5a7 llama : initial ggml-backend integration (#4520) b1676 slaren 2023-12-21 21:07:46 +01:00
  • 31f27758fa llama : allow getting n_batch from llama_context in c api (#4540) b1675 Marcus Dunn 2023-12-21 11:57:48 -08:00
  • 56fa50819f metal : fix ggml_metal_log vargs (#4373) Finn Voorhees 2023-12-21 14:55:02 -05:00
  • 0f630fbc92 cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449) b1673 Erik Garrison 2023-12-21 13:45:32 -06:00
  • b784f881c3 tests : fix test-grad0 Georgi Gerganov 2023-12-21 21:33:57 +02:00
  • 562cf222b5 ggml-cuda: Fix HIP build by adding define for __trap (#4569) b1672 arlo-phoenix 2023-12-21 20:13:25 +01:00
  • 36c3f41f66 ggml : fix CPU implementation Georgi Gerganov 2023-12-21 21:02:23 +02:00
  • 199f6bdc46 ggml : change ggml_scale to take a float instead of tensor Georgi Gerganov 2023-12-21 20:50:24 +02:00
  • 8fe03ffdda common : remove incorrect --model-draft default (#4568) b1671 Jared Van Bortel 2023-12-21 12:55:34 -05:00
  • 9154494808 CUDA: mul_mat_id always on GPU for batches >= 32 (#4553) b1670 Johannes Gäßler 2023-12-21 18:42:59 +01:00
  • c083718c89 readme : update coding guidelines Georgi Gerganov 2023-12-21 19:27:14 +02:00
  • 7c87353e61 common : remove incorrect --model-draft default ceb/fix-draft-model-default Jared Van Bortel 2023-12-21 12:17:12 -05:00
  • 880e352277 py : open merges file as 'utf-8' (#4566) howlger 2023-12-21 18:07:34 +01:00
  • 66f35a2f48 cuda : better error message for ggml_get_rows (#4561) b1667 bobqianic 2023-12-21 17:06:44 +00:00
  • 1398823922 cuda : replace asserts in wrong architecture checks with __trap (#4556) b1666 slaren 2023-12-21 18:02:30 +01:00
  • d3223afdad llama : disable per-tensor info prints on model load (#4562) b1665 Johannes Gäßler 2023-12-21 17:34:17 +01:00
  • 1d7a1912ce Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554) b1664 LoganDark 2023-12-21 01:59:27 -08:00
  • 799fc22689 CUDA: Faster Mixtral prompt processing (#4538) b1663 Johannes Gäßler 2023-12-20 15:41:22 +01:00
  • 328b83de23 ggml : fixed check for _MSC_VER (#4535) b1662 Eric Sommerlade 2023-12-19 16:17:01 +00:00
  • a40f6110f0 ggml : force F32 precision for ggml_mul_mat gg/cublas-f32 Georgi Gerganov 2023-12-19 16:23:39 +02:00
  • a7aee47b98 ggml-cuda: Fix HIP build (#4528) b1661 arlo-phoenix 2023-12-18 22:33:45 +01:00
  • 0e18b2e7d0 llama.swiftui : add tinyllama 1.1B F16 b1660 Georgi Gerganov 2023-12-18 20:17:43 +02:00
  • 6ff39b129d llama.swiftui : add more models b1659 Georgi Gerganov 2023-12-18 20:05:12 +02:00
  • b9e74f9bca llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) b1658 Ebey Abraham 2023-12-18 17:27:47 +00:00
  • 3c734f4941 plamo : testing gg/plamo-test Georgi Gerganov 2023-12-18 17:06:05 +02:00
  • 3c04bf6da8 llama : fix try_override for bool_value which always return true (#4519) b1657 hankcs 2023-12-18 05:14:58 -08:00
  • a462159c43 cuda : ggml_cuda_op_mul_mat_cublas support F32 precision gg/phi-2-2 Georgi Gerganov 2023-12-18 14:24:29 +02:00
  • 30338c5643 Update ggml-cuda.cu Georgi Gerganov 2023-12-18 14:21:38 +02:00