Commit Graph

  • 0d2ec43833 llama : support IBM Granite architecture (#9412) b3774 Gabe Goodhart 2024-09-17 00:44:58 -06:00
  • 37f3a3810e llama : add llama_n_head() (#9512) Michael Podvitskiy 2024-09-17 08:23:30 +02:00
  • 23e0d70bac ggml : move common CPU backend impl to new header (#9509) b3772 slaren 2024-09-16 16:22:07 +02:00
  • acb2c32c33 llama : rename n_embed to n_embd in rwkv6_time_mix (#9504) b3771 Daniel Bevenius 2024-09-16 13:07:13 +02:00
  • a6a3a5c531 ggml : link MATH_LIBRARY not by its full path (#9339) b3770 Michael Podvitskiy 2024-09-16 13:06:50 +02:00
  • d54c21df7e convert : identify missing model files (#9397) b3769 compilade 2024-09-16 03:30:22 -04:00
  • 19514d632e cmake : do not hide GGML options + rename option (#9465) Georgi Gerganov 2024-09-16 10:27:50 +03:00
  • 5c3d0f1824 ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422) b3767 Eve 2024-09-16 06:48:24 +00:00
  • 0aadac10c7 llama : support OLMoE (#9462) b3766 Shane A 2024-09-15 23:47:37 -07:00
  • 95ca85168b llama : support MiniCPM3 (#9322) b3765 CarryFun 2024-09-16 14:45:20 +08:00
  • 441b72b91f main : option to disable context shift (#9484) b3764 Vinesh Janarthanan 2024-09-16 01:20:01 -05:00
  • cc1c017191 naming : normalize the name of callback-related identifiers gg/cb-naming Georgi Gerganov 2024-09-16 09:11:42 +03:00
  • c4965a64f7 metal : handle zero-sized allocs (#9466) b3763 Georgi Gerganov 2024-09-16 09:05:56 +03:00
  • 90a2fff0e7 flake.lock: Update (#9488) Georgi Gerganov 2024-09-16 05:14:23 +03:00
  • 6262d13e0b common : reimplement logging (#9418) b3761 Georgi Gerganov 2024-09-15 20:46:12 +03:00
  • e6deac31f7 gguf-split : add basic checks (#9499) b3760 slaren 2024-09-15 19:02:27 +02:00
  • 6988da94a2 cmake : correct order of sycl flags (#9497) b3759 Michael Podvitskiy 2024-09-15 18:55:52 +02:00
  • 73ef3f769c Update llama-server-intel.Dockerfile sycl-cmake-append Meng, Hengyu 2024-09-15 23:21:46 +08:00
  • 3956cf92a9 Update llama-cli-intel.Dockerfile Meng, Hengyu 2024-09-15 23:21:21 +08:00
  • af95b1424f [SYCL] fix cmake broken Meng, Hengyu 2024-09-15 22:57:56 +08:00
  • 3c7989fd29 py : add "LLaMAForCausalLM" conversion support (#9485) b3758 Csaba Kecskemeti 2024-09-15 00:48:25 -07:00
  • d6b37c881f readme : update tools list (#9475) b3757 OSecret 2024-09-15 10:36:53 +03:00
  • 7596487beb cmake : try to fix sycl+intel build (#9487) b3756 Michael Podvitskiy 2024-09-15 09:06:38 +02:00
  • 63ac36b271 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-09-14 16:08:52 -04:00
  • 4bb4b22a58 llama : begin renaming llama_past back to llama_kv_cache Francis Couture-Harpin 2024-09-14 15:00:07 -04:00
  • 822b6322de ggml : ggml_type_name return "NONE" for invalid values (#9458) b3755 Yuri Khrustalev 2024-09-14 05:54:37 -04:00
  • dcdcee3a74 server: add data: [DONE] to /chat/completions stream response (#9459) b3754 VoidIsVoid 2024-09-14 17:36:44 +08:00
  • 1f4111e540 cmake : use list(APPEND ...) instead of set() + dedup linker (#9463) b3753 Georgi Gerganov 2024-09-14 10:55:05 +03:00
  • befaf1197f llama : make cell_id const in inp_s_mask block (#9470) b3752 Daniel Bevenius 2024-09-14 09:50:12 +02:00
  • 8241151f16 set context default to avoid memory issue, update guide arthw 2024-09-14 09:01:05 +08:00
  • fb8f142554 one more CMAKE_CXX_FLAGS fix (#9471) gg/cmake-dedup-link Michael Podvitskiy 2024-09-13 15:13:07 +02:00
  • feff4aa846 server : add loading html page while model is loading (#9468) b3751 Xuan Son Nguyen 2024-09-13 14:23:11 +02:00
  • 228df2bc11 cmake : fix sycl build (#9469) Michael Podvitskiy 2024-09-13 14:11:21 +02:00
  • b653b1e922 cmake : try to fix sycl 2 Georgi Gerganov 2024-09-13 14:05:00 +03:00
  • ae9475de40 cmake : try fix sycl Georgi Gerganov 2024-09-13 12:41:33 +03:00
  • 0abc6a2c25 llama : llama_perf + option to disable timings during decode (#9355) b3750 Georgi Gerganov 2024-09-13 09:53:38 +03:00
  • 19ecca1946 cmake : use list(APPEND ...) instead of set() + dedup linker Georgi Gerganov 2024-09-13 09:44:55 +03:00
  • bd35cb0ae3 feat: remove a sampler from a chain (#9445) b3749 Gilad S. 2024-09-13 04:54:49 +03:00
  • 78203641fe server : Add option to return token pieces in /tokenize endpoint (#9108) b3748 Mathijs Henquet 2024-09-12 22:30:11 +02:00
  • e6b7801bd1 cann: Add host buffer type for Ascend NPU (#9406) b3747 Dou Xinpeng 2024-09-12 19:46:43 +08:00
  • e665744317 llava : fix the script error in MobileVLM README (#9054) b3746 fengerhu1 2024-09-12 19:34:22 +08:00
  • d4c3c10fad lora : raise error if lm_head is ignored (#9103) Xuan Son Nguyen 2024-09-12 13:33:57 +02:00
  • 2a825116b6 cmake : fix for builds without GGML_CDEF_PUBLIC (#9338) b3744 Michael Podvitskiy 2024-09-12 13:30:01 +02:00
  • 4dc4f5f14a ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329) b3743 Huang Qi 2024-09-12 19:28:43 +08:00
  • c837981bba py : add Phi-1.5/Phi-2 tokenizer (#9361) daminho 2024-09-12 20:28:20 +09:00
  • 3c26a1644d ci : bump actions/checkout to v4 (#9377) Trivikram Kamat 2024-09-12 04:27:45 -07:00
  • ff76e18516 cmake : fixed the order of linking libraries for llama-quantize (#9450) b3740 Michael Podvitskiy 2024-09-12 13:27:14 +02:00
  • 39f852f440 py : add special tokens in hf_converter for RWKV v6 (#9428) Molly Sophia 2024-09-12 19:25:16 +08:00
  • 2b00fa7997 riscv : modify Makefile and add a RISCV_VECT to print log info (#9442) b3738 Ahmad Tameem 2024-09-12 16:24:31 +05:00
  • d6a04f872d ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408) b3737 Georgi Gerganov 2024-09-12 14:23:49 +03:00
  • c9c8575a1a enhance run script to be easy to change the parameters (#9448) b3736 Neo Zhang Jianyu 2024-09-12 17:44:17 +08:00
  • df4b7945ae cann: Fix error when running a non-exist op (#9424) b3735 Xinpeng Dou 2024-09-12 09:02:35 +08:00
  • 449ccfb6f5 Add Jais to list of supported models (#9439) Faisal Zaghloul 2024-09-11 20:29:53 -04:00
  • d7c042d1ae ggml : make n_threads_cur atomic_int gg/ggml-atomic-int Georgi Gerganov 2024-09-11 21:12:11 +03:00
  • 1b28061400 llama : skip token bounds check when evaluating embeddings (#9437) b3733 slaren 2024-09-11 17:52:13 +02:00
  • 8db003a19d py : support converting local models (#7547) Pavel Zloi 2024-09-11 15:29:51 +03:00
  • 0996c5597f llava : correct args for minicpmv-cli (#9429) b3731 Xuan Son Nguyen 2024-09-11 12:59:13 +02:00
  • f9968f661d ggml : update comments [no ci] gg/ggml-rework-cgraph Georgi Gerganov 2024-09-11 13:16:39 +03:00
  • 119e0bc9ae ggml : remove ggml_cplan + rework ggml_cgraph Georgi Gerganov 2024-09-11 13:05:10 +03:00
  • ee154457dd ggml : fix compiler warnings Georgi Gerganov 2024-09-11 13:03:18 +03:00
  • 5bb2c5dbd2 files : remove accidentally added lora_test submodule (#9430) Xuan Son Nguyen 2024-09-11 12:02:09 +02:00
  • 67155ab7f5 feat: Implements retrying logic for downloading models using --model-url flag (#9255) b3729 Farbod Bijary 2024-09-11 12:52:37 +03:30
  • 5af118efda CUDA: fix --split-mode row race condition (#9413) b3728 Johannes Gäßler 2024-09-11 10:22:40 +02:00
  • 92a96865cd ggml : add ggml-impl.h to backends Georgi Gerganov 2024-09-11 10:07:21 +03:00
  • d2b496bff4 batched-bench : remove unused code (#9305) b3727 Georgi Gerganov 2024-09-11 10:03:54 +03:00
  • b34e023480 musa: remove Clang builtins mapping (#9421) b3726 R0CKSTAR 2024-09-11 09:46:55 +08:00
  • 51b6038636 sycl : update support conditions (#9394) b3725 Alberto Cabrera Pérez 2024-09-11 01:53:42 +01:00
  • cb9c933eb2 flake.lock: Update (#9360) Georgi Gerganov 2024-09-11 01:46:59 +03:00
  • 6cd4e03444 arg : bring back missing ifdef (#9411) b3723 Xuan Son Nguyen 2024-09-10 22:41:29 +02:00
  • 8d300bd35f enable --special arg for llama-server (#9419) b3722 matteo 2024-09-10 22:40:59 +02:00
  • 2d79a7077c quantize : use unused imatrix chunk_size with LLAMA_TRACE Francis Couture-Harpin 2024-09-10 12:09:17 -04:00
  • 49006c67b4 llama : move random seed generation to the samplers (#9398) b3721 slaren 2024-09-10 18:04:25 +02:00
  • 8c13e16bb0 imatrix : allow loading mis-ordered tensors Francis Couture-Harpin 2024-09-10 11:31:49 -04:00
  • c8a3f291fe ggml : hide ggml_object, ggml_cgraph, ggml_hash_set Georgi Gerganov 2024-09-10 16:38:06 +03:00
  • 00ba2ff781 metal : fix compile warning with GGML_METAL_NDEBUG (#0) b3720 Georgi Gerganov 2024-09-10 10:17:03 +03:00
  • 83008b7cfe llama : update llm_build_copy_mask_state comment [no ci] (#9385) Daniel Bevenius 2024-09-10 09:03:21 +02:00
  • 0b4ac75772 RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387) b3718 Molly Sophia 2024-09-10 15:02:30 +08:00
  • fb3f249815 make : do not run llama-gen-docs when building (#9399) b3717 slaren 2024-09-10 08:23:33 +02:00
  • 2217247051 imatrix : remove unused n_entries Francis Couture-Harpin 2024-09-09 22:35:47 -04:00
  • efa9186dc8 imatrix : avoid using designated initializers in C++ Francis Couture-Harpin 2024-09-09 22:33:10 -04:00
  • 894ed8d7b6 py : include imatrix converter requirements in toplevel requirements Francis Couture-Harpin 2024-09-09 22:20:18 -04:00
  • 9e6b0e9419 perplexity : revert changes Francis Couture-Harpin 2024-09-09 22:00:37 -04:00
  • 503630e88a py : add requirements for legacy imatrix convert script Francis Couture-Harpin 2024-09-09 21:56:04 -04:00
  • bfe76d4a17 common : move arg parser code to arg.cpp (#9388) b3716 Xuan Son Nguyen 2024-09-09 23:36:09 +02:00
  • 293bebe077 rpc : fix segfault with nkvo (#9389) b3715 Radoslav Gerganov 2024-09-09 18:40:10 +03:00
  • 5fac4d5764 ggml : vector length agnostic SVE support (#9290) b3714 Prashant Vithule 2024-09-09 21:07:18 +05:30
  • 5fb5e24811 llama : minor sampling refactor (2) (#9386) b3713 slaren 2024-09-09 17:10:46 +02:00
  • 38ca6f644b readme : update hot topics Georgi Gerganov 2024-09-09 15:51:37 +03:00
  • 8e6e2fbe14 CUDA: fix variable name conflict for Windows build (#9382) b3711 Johannes Gäßler 2024-09-09 14:22:53 +02:00
  • 5ed087573e readme : add LLMUnity to UI projects (#9381) Antonis Makropoulos 2024-09-09 14:21:38 +03:00
  • cfbf33a705 ggml : style changes + fix 512-bit nb loop check SVE-vector-length-agnostic-VLA-gg Georgi Gerganov 2024-09-09 12:50:35 +03:00
  • 54f376d0b9 rpc : update README [no ci] (#9320) Radoslav Gerganov 2024-09-09 11:04:39 +03:00
  • b2e89a3274 Arm AArch64: Documentation updates (#9321) Dan Johansson 2024-09-09 09:02:45 +02:00
  • daa9623ab0 Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118) b3707 Markus Tavenrath 2024-09-08 21:43:48 +02:00
  • e079bffb66 cuda : fix FA Q src index (1 -> 0) (#9374) b3706 Georgi Gerganov 2024-09-08 22:01:02 +03:00
  • 3f7ccfd649 common : bring back missing args, add env var duplication check (#9375) b3705 Xuan Son Nguyen 2024-09-08 18:08:55 +02:00
  • d19101c9a0 imatrix : use FMA and sort tensor names Francis Couture-Harpin 2024-09-08 11:03:59 -04:00
  • a249843d89 common : restore --n-gpu-layers (#9371) b3704 slaren 2024-09-08 16:44:42 +02:00
  • 3ad0603c65 Merge branch 'master' into compilade/imatrix-batched-chunks Francis Couture-Harpin 2024-09-08 10:05:08 -04:00
  • c8ab6a3ba3 imatrix : fix conversion problems Francis Couture-Harpin 2024-09-08 10:04:01 -04:00