3767 Commits

Author SHA1 Message Date
Eve 5c3d0f1824 ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)
* squashed

readd my iq4_nl sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049

have ggml_vec_dot_q4_0 do two blocks per loop for avx

try out f16c ggml_vec_dot_iq4_nl, but it's not really faster. as per https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time with no issue

* shuffle

* remove f16c iq4_nl as i cant make it faster than before
b3767
2024-09-16 09:48:24 +03:00
Shane A 0aadac10c7 llama : support OLMoE (#9462) b3766 2024-09-16 09:47:37 +03:00
CarryFun 95ca85168b llama : support MiniCPM3 (#9322)
Co-authored-by: 范睿凯 <fanruikai@modelbest.cn>
b3765
2024-09-16 09:45:20 +03:00
Vinesh Janarthanan 441b72b91f main : option to disable context shift (#9484)
* added cli arg to disable context shift

* reverted precommit

* updated README.md for main

* white space

* allow disabling context shift in the server

* Update common/arg.cpp

no-context-shift only works for main example

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* added server example to --no-context-shift args

* removed server changes

* white space

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3764
2024-09-16 09:20:01 +03:00
Georgi Gerganov c4965a64f7 metal : handle zero-sized allocs (#9466) b3763 2024-09-16 09:05:56 +03:00
Georgi Gerganov 90a2fff0e7 flake.lock: Update (#9488) 2024-09-15 19:14:23 -07:00
Georgi Gerganov 6262d13e0b common : reimplement logging (#9418)
https://github.com/ggerganov/llama.cpp/pull/9418
b3761
2024-09-15 20:46:12 +03:00
slaren e6deac31f7 gguf-split : add basic checks (#9499)
* gguf-split : do not overwrite existing files when merging

* gguf-split : error when too many arguments are passed
b3760
2024-09-15 19:02:27 +02:00
Michael Podvitskiy 6988da94a2 cmake : correct order of sycl flags (#9497) b3759 2024-09-15 19:55:52 +03:00
Csaba Kecskemeti 3c7989fd29 py : add "LLaMAForCausalLM" conversion support (#9485)
Co-authored-by: Csaba Kecskemeti <csabakecskemeti@Csabas-Mac-Pro.local>
b3758
2024-09-15 10:48:25 +03:00
OSecret d6b37c881f readme : update tools list (#9475)
* Added link to proprietary wrapper for Unity3d into README.md

Wrapper has prebuild library and was tested on iOS, Android, WebGL, PC, Mac platforms, has online demos like [this](https://d23myu0xfn2ttc.cloudfront.net/rich/index.html) and [that](https://d23myu0xfn2ttc.cloudfront.net/).

* Update README.md

Fixes upon review
b3757
2024-09-15 10:36:53 +03:00
Michael Podvitskiy 7596487beb cmake : try to fix sycl+intel build (#9487) b3756 2024-09-15 10:06:38 +03:00
Yuri Khrustalev 822b6322de ggml : ggml_type_name return "NONE" for invalid values (#9458)
When running on Windows, the quantization utility attempts to print the types that are not set which leads to a crash.
b3755
2024-09-14 12:54:37 +03:00
VoidIsVoid dcdcee3a74 server: add data: [DONE] to /chat/completions stream response (#9459) b3754 2024-09-14 11:36:44 +02:00
Georgi Gerganov 1f4111e540 cmake : use list(APPEND ...) instead of set() + dedup linker (#9463)
* cmake : use list(APPEND ...) instead of set() + dedup linker

ggml-ci

* cmake : try fix sycl

* cmake : try to fix sycl 2

* cmake : fix sycl build (#9469)

* try fix sycl build

* use CMAKE_CXX_FLAGS as a string variable

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* one more CMAKE_CXX_FLAGS fix (#9471)

---------

Co-authored-by: Michael Podvitskiy <podvitskiymichael@gmail.com>
b3753
2024-09-14 10:55:05 +03:00
Daniel Bevenius befaf1197f llama : make cell_id const in inp_s_mask block (#9470)
This commit makes the cell_id variable const in the inp_s_mask block.

The motivation for this change is consistency with the code in the
inp_s_copy block.
b3752
2024-09-14 10:50:12 +03:00
Xuan Son Nguyen feff4aa846 server : add loading html page while model is loading (#9468)
* Adding loading page for '/' server requests

* set content when model is loading

* removed loading html file

* updated cmakelist

* updated makefile

* cleaned up whitespace

* cleanup for PR removed error

* updated server test to handle 503 HTML

* updated server test to handle 503 HTML

* ca†ch 503 before parsing json

* revert test

* account for both api and web browser requests

* precommit corrections

* eol fix

* revert changes to pre-commit

* removed print statement

* made loading message more descriptive

* also support .html files

---------

Co-authored-by: VJHack <flymyplane21@gmail.com>
Co-authored-by: Vinesh Janarthanan <36610342+VJHack@users.noreply.github.com>
b3751
2024-09-13 14:23:11 +02:00
Georgi Gerganov 0abc6a2c25 llama : llama_perf + option to disable timings during decode (#9355)
* llama : llama_perf + option to disable timings during decode

ggml-ci

* common : add llama_arg

* Update src/llama.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* perf : separate functions in the API

ggml-ci

* perf : safer pointer handling + naming update

ggml-ci

* minor : better local var name

* perf : abort on invalid sampler pointer

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
b3750
2024-09-13 09:53:38 +03:00
Gilad S. bd35cb0ae3 feat: remove a sampler from a chain (#9445)
* feat: remove a sampler from a chain

* fix: return removed sampler

* fix: safer casting
b3749
2024-09-13 03:54:49 +02:00
Mathijs Henquet 78203641fe server : Add option to return token pieces in /tokenize endpoint (#9108)
* server : added with_pieces functionality to /tokenize endpoint

* server : Add tokenize with pieces tests to server.feature

* Handle case if tokenizer splits along utf8 continuation bytes

* Add example of token splitting

* Remove trailing ws

* Fix trailing ws

* Maybe fix ci

* maybe this fix windows ci?

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
b3748
2024-09-12 22:30:11 +02:00
Dou Xinpeng e6b7801bd1 cann: Add host buffer type for Ascend NPU (#9406)
* feat: Add host buffer type for Ascend NPU(CANN backend)

* fix some checking errors

* Add a few comments
b3747
2024-09-12 19:46:43 +08:00
fengerhu1 e665744317 llava : fix the script error in MobileVLM README (#9054)
Signed-off-by: Erhu Feng <2748250768@qq.com>
b3746
2024-09-12 14:34:22 +03:00
Xuan Son Nguyen d4c3c10fad lora : raise error if lm_head is ignored (#9103)
* lora : raise error if lm_head is ignored

* fix style

* clarify comment
2024-09-12 14:33:57 +03:00
Michael Podvitskiy 2a825116b6 cmake : fix for builds without GGML_CDEF_PUBLIC (#9338)
* `GGML_TARGET_DEFINES-NOTFOUND` fix for builds without `GGML_CDEF_PUBLIC`

* Update CMakeLists.txt, spaces fix
b3744
2024-09-12 14:30:01 +03:00
Huang Qi 4dc4f5f14a ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329) b3743 2024-09-12 14:28:43 +03:00
daminho c837981bba py : add Phi-1.5/Phi-2 tokenizer (#9361)
* add phi2 tokenizer

* add phi name to convert_hf_to_gguf_update.py

* make tokenizer_pre consistent; llama.cpp work
2024-09-12 14:28:20 +03:00
Trivikram Kamat 3c26a1644d ci : bump actions/checkout to v4 (#9377) 2024-09-12 14:27:45 +03:00
Michael Podvitskiy ff76e18516 cmake : fixed the order of linking libraries for llama-quantize (#9450) b3740 2024-09-12 14:27:14 +03:00
Molly Sophia 39f852f440 py : add special tokens in hf_converter for RWKV v6 (#9428)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-09-12 14:25:16 +03:00
Ahmad Tameem 2b00fa7997 riscv : modify Makefile and add a RISCV_VECT to print log info (#9442)
- Added ggml_cpu_has_riscv_v() in GGML to print system info in log
- Modified Makefile to only use flag when cross compiling for RISC-V
b3738
2024-09-12 14:24:31 +03:00
Georgi Gerganov d6a04f872d ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)
* ggml : hide ggml_object, ggml_cgraph, ggml_hash_set

ggml-ci

* ggml : add ggml-impl.h to backends

* ggml : fix compiler warnings

ggml-ci

* ggml : add assert upon adding nodes
b3737
2024-09-12 14:23:49 +03:00
Neo Zhang Jianyu c9c8575a1a enhance run script to be easy to change the parameters (#9448)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
b3736
2024-09-12 17:44:17 +08:00
Xinpeng Dou df4b7945ae cann: Fix error when running a non-exist op (#9424) b3735 2024-09-12 09:02:35 +08:00
Faisal Zaghloul 449ccfb6f5 Add Jais to list of supported models (#9439)
Co-authored-by: fmz <quic_fzaghlou@quic.com>
2024-09-12 02:29:53 +02:00
slaren 1b28061400 llama : skip token bounds check when evaluating embeddings (#9437) b3733 2024-09-11 17:52:13 +02:00
Pavel Zloi 8db003a19d py : support converting local models (#7547)
* Support of converting local models added to convert-hf-to-gguf-update.py

* Description fixed

* shutil added to imports
2024-09-11 15:29:51 +03:00
Xuan Son Nguyen 0996c5597f llava : correct args for minicpmv-cli (#9429) b3731 2024-09-11 12:59:13 +02:00
Xuan Son Nguyen 5bb2c5dbd2 files : remove accidentally added lora_test submodule (#9430) 2024-09-11 13:02:09 +03:00
Farbod Bijary 67155ab7f5 feat: Implements retrying logic for downloading models using --model-url flag (#9255)
* feat: Implements retrying logic for downloading models using --model-url flag

* Update common/common.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Update common/common.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* apply comments

* implements a retry function to avoid duplication

* fix editorconfig

* change function name

---------

Co-authored-by: farbod <farbod.bjary82@gmail.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
b3729
2024-09-11 11:22:37 +02:00
Johannes Gäßler 5af118efda CUDA: fix --split-mode row race condition (#9413) b3728 2024-09-11 10:22:40 +02:00
Georgi Gerganov d2b496bff4 batched-bench : remove unused code (#9305) b3727 2024-09-11 10:03:54 +03:00
R0CKSTAR b34e023480 musa: remove Clang builtins mapping (#9421)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b3726
2024-09-11 03:46:55 +02:00
Alberto Cabrera Pérez 51b6038636 sycl : update support conditions (#9394)
* sycl : update support condition to im2col

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* Added TODO to remind supporting FP32 im2col

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
b3725
2024-09-11 08:53:42 +08:00
Georgi Gerganov cb9c933eb2 flake.lock: Update (#9360)
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
  → 'github:hercules-ci/flake-parts/567b938d64d4b4112ee253b9274472dc3a346eb6?narHash=sha256-%2Bebgonl3NbiKD2UD0x4BszCZQ6sTfL4xioaM49o5B3Y%3D' (2024-09-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/a5d394176e64ab29c852d03346c1fc9b0b7d33eb.tar.gz?narHash=sha256-uFf2QeW7eAHlYXuDktm9c25OxOyCoUOQmh5SZ9amE5Q%3D' (2024-08-01)
  → 'https://github.com/NixOS/nixpkgs/archive/356624c12086a18f2ea2825fed34523d60ccc4e3.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)
  → 'github:NixOS/nixpkgs/574d1eac1c200690e27b8eb4e24887f8df7ac27c?narHash=sha256-v3rIhsJBOMLR8e/RNWxr828tB%2BWywYIoajrZKFM%2B0Gg%3D' (2024-09-06)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-09-10 15:46:59 -07:00
Xuan Son Nguyen 6cd4e03444 arg : bring back missing ifdef (#9411)
* arg : bring back missing ifdef

* replace with llama_supports_gpu_offload
b3723
2024-09-10 22:41:29 +02:00
matteo 8d300bd35f enable --special arg for llama-server (#9419)
Co-authored-by: matteo serva <matteo.serva@gmail.com>
b3722
2024-09-10 22:40:59 +02:00
slaren 49006c67b4 llama : move random seed generation to the samplers (#9398)
* llama_sampler_penalties : clamp penalty_last_n to zero
b3721
2024-09-10 18:04:25 +02:00
Georgi Gerganov 00ba2ff781 metal : fix compile warning with GGML_METAL_NDEBUG (#0) b3720 2024-09-10 10:17:43 +03:00
Daniel Bevenius 83008b7cfe llama : update llm_build_copy_mask_state comment [no ci] (#9385)
This commit updates the comment, which seems to contain a typo or be an
outdated comment, in the copy_mask_state function changing the variable
n_rs to n_kv.

I believe this change is correct and what the comment wants to
convey is to copy the states that are not going to be used in the
upcoming processing, which are the tokens states from n_seqs up to
the number of possible token states n_kv.
2024-09-10 10:03:21 +03:00
Molly Sophia 0b4ac75772 RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
b3718
2024-09-10 10:02:30 +03:00