slaren
6e02327e8b
metal : fix uninitialized abort_callback ( #8968 )
b3565
2024-08-10 15:42:10 +02:00
Xuan Son Nguyen
7eb23840ed
llama : default n_swa for phi-3 ( #8931 )
...
* default n_swa for phi-3
* fix
* double check swa
b3564
2024-08-10 13:04:40 +02:00
fairydreaming
7c3f55c100
Add support for encoder-only T5 models ( #8900 )
...
* gguf-py : add T5ENCODER model architecture
* common : call llama_decode() during warmup only if the model has decoder
* convert-hf : add T5EncoderModel
* llama : add llama_model_has_decoder() API function
* llama : split build_t5() into build_t5_encoder() and build_t5_decoder()
* llama : add support for LLM_ARCH_T5ENCODER
* llama-embedding : add support for LLAMA_POOLING_TYPE_NONE
* llama-embedding : add support for encoder-only models
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
b3563
2024-08-10 11:43:26 +02:00
Matteo Mortari
911b437f22
gguf-py : fix double call to add_architecture() ( #8952 )
...
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
2024-08-10 08:58:49 +03:00
Georgi Gerganov
b72942fac9
Merge commit from fork
b3561
2024-08-09 23:03:21 +03:00
fairydreaming
6afd1a99dc
llama : add support for lora adapters in T5 model ( #8938 )
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
b3560
2024-08-09 18:53:09 +02:00
Georgi Gerganov
272e3bd95e
make : fix llava obj file race ( #8946 )
...
ggml-ci
b3559
2024-08-09 18:24:30 +03:00
Georgi Gerganov
45a55b91aa
llama : better replace_all (cont) ( #8926 )
...
* llama : better replace_all (cont)
ggml-ci
* code : deduplicate replace_all
ggml-ci
2024-08-09 18:23:52 +03:00
tc-mb
3071c0a5f2
llava : support MiniCPM-V-2.5 ( #7599 )
...
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* update cmakelist
* update cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when the image needs to be resized smaller
* receive review comments and modify
* receive review comments and modify
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove in line 33 directory in the /cmakelists.txt (not in example, in the main dir)
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* remove load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* clip : style changes
* del common.h in clip
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix makefile error
* fix ubuntu-make error
* try fix clip
* try fix 1
---------
Co-authored-by: Hongji Zhu <fireyoucan@gmail.com>
Co-authored-by: harvestingmoon <leewenyeong@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3557
2024-08-09 13:33:53 +03:00
Georgi Gerganov
4305b57c80
sync : ggml
b3556
2024-08-09 10:03:48 +03:00
Matt Stephenson
70c0ea3560
whisper : use vulkan as gpu backend when available (whisper/2302)
...
* ggml: use vulkan as gpu backend when available
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
* whisper: enable using vk as default buffer type
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
---------
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
2024-08-09 10:03:44 +03:00
Daniel Bevenius
5b2c04f492
embedding : add --pooling option to README.md [no ci] ( #8934 )
...
This commit adds the `--pooling` option to the README.md file in the
`examples/embedding` directory.
The motivation for adding this option is that currently if the model
used does not specify a pooling type the embedding example will fail
with the following error message:
```console
main: error: pooling type NONE not supported
```
This commit also updates the name of the executable in the examples
section.
2024-08-09 09:33:30 +03:00
Daniel Bevenius
6f6496bb09
llama : fix typo in llama_tensor_get_type comment [no ci] ( #8937 )
2024-08-09 09:32:23 +03:00
Mathieu Geli
daef3ab233
server : add one level list nesting for embeddings ( #8936 )
2024-08-09 09:32:02 +03:00
compilade
345a686d82
llama : reduce useless copies when saving session ( #8916 )
...
* llama : avoid useless copies in dummy session writer
* llama : avoid double tensor copy when saving session to buffer
b3551
2024-08-08 23:54:00 -04:00
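The session-saving change above mentions a "dummy" writer and avoiding a double copy when saving to a buffer. A minimal sketch of that general pattern follows, assuming a two-pass design: a first pass that only counts bytes, then a second pass that writes straight into a buffer of exactly that size. The names `DummyWriter`, `BufferWriter`, and `save_state` are hypothetical and are not taken from llama.cpp.

```python
class DummyWriter:
    """Hypothetical size-only writer: counts bytes and never copies them."""

    def __init__(self) -> None:
        self.size = 0

    def write(self, data: bytes) -> None:
        # Only the length is needed, so the payload is not touched.
        self.size += len(data)


class BufferWriter:
    """Hypothetical writer that copies each chunk once, directly into place."""

    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity)
        self.pos = 0

    def write(self, data: bytes) -> None:
        end = self.pos + len(data)
        if end > len(self.buf):
            raise ValueError("buffer too small")
        # memoryview avoids creating an intermediate bytes object.
        self.buf[self.pos:end] = memoryview(data)
        self.pos = end


def save_state(writer, chunks) -> None:
    """Serialize every chunk through whichever writer is supplied."""
    for chunk in chunks:
        writer.write(chunk)


chunks = [b"abc", b"defgh"]

sizer = DummyWriter()
save_state(sizer, chunks)          # pass 1: measure only

out = BufferWriter(sizer.size)
save_state(out, chunks)            # pass 2: single copy per chunk
```

Because both writers share the same `write` interface, the serialization logic is written once and the measuring pass costs no data movement.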
compilade
3a14e00366
gguf-py : simplify support for quant types ( #8838 )
...
* gguf-py : use classes for quants
* convert_hf : simplify internal quantization type selection
* gguf-py : fix flake8 lint
* gguf-py : fix BF16 numpy view type
* gguf-py : remove LlamaFileTypeMap
Too specific to 'llama.cpp', and would be a maintenance burden
to keep up to date.
* gguf-py : add generic quantize and dequantize functions
The quant classes no longer need to be known,
only the target or the source type,
for 'quantize' and 'dequantize', respectively.
2024-08-08 13:33:09 -04:00
Georgi Gerganov
afd27f01fe
scripts : sync cann files ( #0 )
2024-08-08 14:56:52 +03:00
Georgi Gerganov
366d486c16
scripts : fix sync filenames ( #0 )
2024-08-08 14:40:12 +03:00
Georgi Gerganov
e44a561ab0
sync : ggml
b3547
2024-08-08 13:19:47 +03:00
Borislav Stanimirov
f93d49ab1e
ggml : ignore more msvc warnings (ggml/906)
2024-08-08 13:19:31 +03:00
Georgi Gerganov
5b33ea1ee7
metal : fix struct name (ggml/912)
...
ggml-ci
2024-08-08 13:19:31 +03:00
Conrad Kramer
85fca8deb6
metal : add abort callback (ggml/905)
2024-08-08 13:19:30 +03:00
Pablo Duboue
ebd541a570
make : clean llamafile objects ( #8923 )
...
`ggml/src/llamafile/sgemm.o` was not deleted on `make clean`
b3543
2024-08-08 11:44:51 +03:00
slaren
15fa07a5c5
make : use C compiler to build metal embed object ( #8899 )
...
* make : use C compiler to build metal embed object
* use rm + rmdir to avoid -r flag in rm
b3542
2024-08-07 18:24:05 +02:00
slaren
be55695eff
ggml-backend : fix async copy from CPU ( #8897 )
...
* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
b3541
2024-08-07 13:29:02 +02:00
Ouadie EL FAROUKI
0478174d59
[SYCL] Updated SYCL device filtering ( #8901 )
...
* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme
b3540
2024-08-07 11:25:36 +01:00
Johannes Gäßler
a8dbc6f753
CUDA/HIP: fix tests/test-backend-ops ( #8896 )
b3539
2024-08-07 09:07:52 +02:00
Zhenwei Jin
506122d854
llama-bench : add support for getting cpu info on Windows ( #8824 )
...
* Add support for getting cpu info on Windows for llama_bench
* refactor
---------
Co-authored-by: slaren <slarengh@gmail.com>
b3538
2024-08-07 03:01:06 +02:00
Daniel Bevenius
725e3d9437
quantize : update usage comment in quantize.cpp ( #8889 )
...
This commit updates the usage comment in quantize.cpp to reflect the
new name of the executable, which is llama-quantize.
b3537
2024-08-07 01:43:00 +02:00
Nexes the Old
31958546c3
typo correction ( #8891 )
b3536
2024-08-07 01:41:54 +02:00
Xuan Son Nguyen
1e6f6554aa
server : add lora hotswap endpoint (WIP) ( #8857 )
...
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
b3535
2024-08-06 17:33:39 +02:00
Johannes Gäßler
641f5dd2a6
CUDA: fix padding logic for FP16/FP32 ( #8884 )
b3534
2024-08-06 17:13:55 +02:00
Daniel Bevenius
5f4dcb1e60
simple : update name of executable to llama-simple ( #8885 )
...
This commit updates the name of the executable in README.md from
`simple` to `llama-simple`.
2024-08-06 16:44:35 +02:00
Jaeden Amero
db20f50cf4
cmake : Link vulkan-shaders-gen with pthreads ( #8835 )
...
When using CMake to build with Vulkan support, compiling
vulkan-shaders-gen fails due to missing a CMakeLists.txt specification
to link vulkan-shaders-gen with the threading library, resulting in the
following error.
[5/172] Linking CXX executable bin/vulkan-shaders-gen
FAILED: bin/vulkan-shaders-gen
: && /usr/bin/c++ ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o -o bin/vulkan-shaders-gen && :
ld: error: undefined symbol: pthread_create
>>> referenced by vulkan-shaders-gen.cpp
>>> ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o:(std::__1::__libcpp_thread_create[abi:se180100](pthread**,
>>> void* (*)(void*), void*))
c++: error: linker command failed with exit code 1 (use -v to see invocation)
[6/172] Generating build details from Git
-- Found Git: /usr/local/bin/git (found version "2.45.2")
ninja: build stopped: subcommand failed.
Add the CMakeLists.txt specification to link vulkan-shaders-gen with the
threading library and fix the above error.
Fixes #8834
b3532
2024-08-06 15:21:47 +02:00
MaggotHATE
efda90c93a
[Vulkan] Fix compilation of vulkan-shaders-gen on w64devkit after e31a4f6 ( #8880 )
...
* Fix compilation issue in `vulkan-shaders-gen`
https://github.com/ggerganov/llama.cpp/commit/e31a4f679779220312c165b0f5994c680a610e38 broke compilation on w64devkit. Including `algorithm` seems to fix that.
* Guard it under `#ifdef _WIN32`
b3531
2024-08-06 13:32:03 +02:00
Georgi Gerganov
0bf16de07b
contributing : add note about write access
2024-08-06 11:48:01 +03:00
Molly Sophia
2d5dd7bb3f
ggml : add epsilon as a parameter for group_norm ( #8818 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
b3529
2024-08-06 10:26:46 +03:00
Douglas Hanley
cdd1889de6
convert : add support for XLMRoberta embedding models ( #8658 )
...
* add conversion for bge-m3; small fix in unigram tokenizer
* clean up and simplify XLMRoberta conversion
b3528
2024-08-06 10:20:54 +03:00
Mengqing Cao
c21a896405
[CANN]: Fix ggml_backend_cann_buffer_get_tensor ( #8871 )
...
* cann: fix ggml_backend_cann_buffer_get_tensor
1. fix data ptr offset
2. enable the acquisition of incomplete tensors
* fix backend cann set_tensor
b3527
2024-08-06 12:42:42 +08:00
Neo Zhang
d4ff847153
[SYCL] correct cmd name ( #8877 )
2024-08-06 09:09:12 +08:00
Liu Jia
0a4ce78681
common : Changed tuple to struct (TODO fix) ( #8823 )
...
* common : Changed tuple to struct (TODO fix)
Use struct `llama_init_result` to replace the previous
std::tuple<struct llama_model *, struct llama_context *>
* delete llama_init_default_params()
* delete the extra whitespace
b3525
2024-08-05 18:14:10 +02:00
wangshuai09
bc0f887e15
cann: fix buffer_num and slow runtime speed error ( #8865 )
b3524
2024-08-05 21:10:37 +08:00
Eric Curtin
b42978e7e4
readme : add ramalama to the available UIs ( #8811 )
...
ramalama is a repo-agnostic, boring CLI tool that supports pulling from
ollama, huggingface and OCI registries.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-08-05 15:45:01 +03:00
Justine Tunney
b9dfc25ca3
ggml : fix overflows in elu function ( #8866 )
...
It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.
b3522
2024-08-05 15:43:40 +03:00
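The ELU fix above replaces `expf(x)-1` with `expm1f(x)`. As a rough illustration of why a dedicated `expm1` routine is preferred, the sketch below uses Python's `math.expm1`. Note the assumptions: Python floats are double precision, whereas the commit concerns single-precision C functions, and the function names `elu` and `elu_naive` are chosen here for illustration only.

```python
import math


def elu(x: float) -> float:
    """ELU with alpha = 1, using expm1 for the negative branch."""
    return x if x > 0.0 else math.expm1(x)


def elu_naive(x: float) -> float:
    """Same formula written as exp(x) - 1, kept only for comparison."""
    return x if x > 0.0 else math.exp(x) - 1.0


x = -1e-10
print("expm1 :", elu(x))
print("naive :", elu_naive(x))
```

For inputs close to zero, `exp(x)` rounds to a value very near 1, so subtracting 1 discards most of the significant digits; `expm1` computes the difference directly and keeps them.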
Brian
1ef14b3007
py: Add more authorship metadata from model card ( #8810 )
...
* py: add more authorship metadata from model card
* fixup! py: add more authorship metadata from model card
2024-08-05 21:15:28 +10:00
fairydreaming
d3f0c7166a
Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support ( #8858 )
...
* gguf-py, llama : add constants and methods related to Llama-3.1 <|eom_id|> token
* llama : find Llama-3.1 <|eom_id|> token id during vocab loading
* llama-vocab : add Llama-3.1 <|eom_id|> token to the set of tokens stopping the generation
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
b3520
2024-08-05 09:38:01 +02:00
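The `<|eom_id|>` change above adds that token to the set of tokens that stop generation. A minimal sketch of the idea is shown below: a generation loop that checks each sampled token against a stop set. The function `generate`, the `sample_next` callback, and the numeric token ids are assumptions for illustration; a real implementation would read the ids from the model vocabulary.

```python
from typing import Callable, List, Set

# Illustrative ids only (assumed); real ids come from the loaded vocab.
EOT_ID = 128009  # end of turn
EOM_ID = 128008  # end of message, used with tool calls


def generate(sample_next: Callable[[], int],
             stop_tokens: Set[int],
             max_tokens: int) -> List[int]:
    """Collect tokens until a stop token appears or the budget runs out."""
    out: List[int] = []
    for _ in range(max_tokens):
        tok = sample_next()
        if tok in stop_tokens:
            break  # the stop token itself is not emitted
        out.append(tok)
    return out


stream = iter([1, 2, EOM_ID, 3])
tokens = generate(lambda: next(stream), {EOT_ID, EOM_ID}, max_tokens=10)
```

If `EOM_ID` were missing from the stop set, the loop would run past the end of the tool-call message, which is the behavior the commit title says it addresses.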
stduhpf
e31a4f6797
cmake: fix paths for vulkan shaders compilation on Windows ( #8573 )
...
* Vulkan-shaders: attempt fix compilation on windows
* fix mismatched parenthesis
b3519
2024-08-05 08:18:27 +02:00
BarfingLemurs
400ae6f65f
readme : update model list ( #8851 )
b3518
2024-08-05 08:54:10 +03:00
Georgi Gerganov
f1ea5146d7
llama : better replace_all ( #8852 )
b3517
2024-08-05 08:53:39 +03:00
0cc4m
064cdc265f
vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 ( #8855 )
...
* Fix Vulkan mul mat vec invalid results when ncols < warp size
* Only run backend ops mul mat vec block size test if block size not already covered
b3516
2024-08-05 08:52:55 +03:00