bandoti
c46503014d
cmake: remove shader-gen step-targets from ggml-vulkan (#14226)
* Remove step-targets from vulkan-shaders-gen
* Unset DESTDIR when building vulkan-shaders-gen
b5689
2025-06-17 22:33:25 +02:00
xctan
860a9e4eef
ggml-cpu : remove the weak alias trick (#14221)
b5688
2025-06-17 12:58:32 +03:00
R0CKSTAR
fe9d60e74a
musa: fix build warning (unused variable) (#14231)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b5687
2025-06-17 17:48:08 +08:00
Sigbjørn Skjæret
e434e69183
common : suggest --jinja when autodetection fails (#14222)
b5686
2025-06-16 21:58:42 +02:00
Georgi Gerganov
89fea80d29
server : fix incorrect usage of llama_get_embeddings() (#14225)
* server : fix incorrect usage of llama_get_embeddings()
ggml-ci
* cont : fix the fix
ggml-ci
b5685
2025-06-16 22:33:27 +03:00
Diego Devesa
6adc3c3ebc
llama : add thread safety test (#14035)
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0, GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b5684
2025-06-16 08:11:43 -07:00
bandoti
0dbcabde8c
cmake: clean up external project logic for vulkan-shaders-gen (#14179)
* Remove install step for vulkan-shaders-gen
* Add install step to normalize msvc with make
* Regenerate modified shaders at build-time
b5683
2025-06-16 10:32:13 -03:00
Đinh Trọng Huy
ad590be98c
model : add NeoBERT (#14164)
* convert neobert model to gguf
* add inference graph
* fix flake8 lint
* followed reviewer suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* follow reviewers' suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* override NeoBERT feed-forward length
---------
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b5682
2025-06-16 14:53:41 +02:00
uvos
7d6d91babf
HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202)
b5681
2025-06-16 13:47:38 +02:00
Georgi Gerganov
d3e64b9f49
llama : rework embeddings logic (#14208)
* llama : rework embeddings logic
ggml-ci
* cont : fix rerank
ggml-ci
* cont : engrish [no ci]
* cont : fix rerank
ggml-ci
* server : support both embeddings and completions with single model
ggml-ci
* cont : avoid embeddings_org
ggml-ci
2025-06-16 14:14:00 +03:00
Charles Xu
3ba0d843c6
ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)
b5679
2025-06-16 11:47:57 +02:00
Bartowski
0bf49eb668
convert : remove arcee change in convert_hf_to_gguf_update.py (#14207)
2025-06-16 10:16:06 +02:00
Đinh Trọng Huy
4ad243677b
gguf-py : allow key override when adding value to GGUFWriter (#14194)
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-06-16 09:20:59 +02:00
Jeff Bolz
c89c2d1ab9
vulkan: mutex around vkQueueSubmit (#14127)
This fixes the remaining crash in test-thread-safety on my system.
b5676
2025-06-16 08:21:08 +02:00
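The Vulkan specification requires host access to the queue passed to `vkQueueSubmit` to be externally synchronized, so when several threads share one queue, the application must serialize submissions itself. A minimal C++ sketch of that pattern follows; `FakeQueue` and `QueueGuarded` are hypothetical stand-ins for a real `VkQueue` and are not the ggml-vulkan code.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a queue whose submit operation is not
// thread-safe (real code would call vkQueueSubmit on a VkQueue).
struct FakeQueue {
    std::vector<int> submitted;  // records submissions in arrival order
};

struct QueueGuarded {
    FakeQueue  queue;
    std::mutex submit_mutex;  // serializes every access to `queue`

    void submit(int work_id) {
        // Only one thread at a time may touch the queue.
        std::lock_guard<std::mutex> lock(submit_mutex);
        queue.submitted.push_back(work_id);
    }
};

// Submits `per_thread` items from each of `n_threads` threads and returns
// how many submissions were recorded. Without the mutex, concurrent
// push_back calls would be a data race.
std::size_t run_concurrent_submits(int n_threads, int per_thread) {
    assert(n_threads > 0 && per_thread >= 0);
    QueueGuarded q;
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&q, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                q.submit(t * per_thread + i);
            }
        });
    }
    for (auto & th : threads) {
        th.join();  // join() makes the final size read below safe
    }
    return q.queue.submitted.size();
}
```

A single coarse lock is the simplest choice here; finer-grained schemes (one mutex per queue) only matter when several queues are in use.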
xctan
3555b3004b
ggml-cpu : rework weak alias on apple targets (#14146)
* ggml-cpu : rework weak alias on apple targets
* fix powerpc detection
* fix ppc detection
* fix powerpc detection on darwin
b5675
2025-06-16 13:54:15 +08:00
Bartowski
d7da8dc83a
model : Add support for Arcee AI's upcoming AFM model (#14185)
* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Remove accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
b5674
2025-06-16 01:04:06 +02:00
Eric Curtin
cd355eda7d
server : When listening on a unix domain socket don't print http:// and port (#14180)
Instead show something like this:
main: server is listening on file.sock - starting the main loop
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b5673
2025-06-15 23:36:22 +02:00
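A minimal sketch of the message selection described above, assuming (hypothetically) that a listen address ending in `.sock` denotes a unix domain socket. The helper names are illustrative, and the wording of the TCP branch is an assumption; only the unix-socket line is taken from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `s` ends with `suffix`.
static bool ends_with(const std::string & s, const std::string & suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Builds the startup message. A unix domain socket address is a file
// path, so neither an "http://" scheme nor a port is shown for it.
std::string listening_message(const std::string & host, int port) {
    assert(!host.empty());
    if (ends_with(host, ".sock")) {
        return "main: server is listening on " + host + " - starting the main loop";
    }
    // Assumed format for the TCP case (illustrative only).
    return "main: server is listening on http://" + host + ":" +
           std::to_string(port) + " - starting the main loop";
}
```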
Ed Addario
30e5b01de2
quantize : change int to unsigned int for KV overrides (#14197)
b5672
2025-06-15 18:53:45 +02:00
uvos
e54b394082
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196)
b5671
2025-06-15 17:30:13 +02:00
uvos
2c2caa4443
HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183)
b5670
2025-06-15 15:45:27 +02:00
Georgi Gerganov
5fce5f948d
kv-cache : fix use-after-move of defrag info (#14189)
ggml-ci
b5669
2025-06-15 10:52:11 +03:00
Mikko Juola
9ae4143bc6
model : add dots.llm1 architecture support (#14044) (#14118)
Adds:
* Dots1Model to convert_hf_to_gguf.py
* Computation graph code to llama-model.cpp
* Chat template to llama-chat.cpp to detect this model's template.
---
The architecture is called "dots.llm1" (I decided to shorten it to dots1 or
DOTS1 in the code generally).
As of this commit, the only models that follow this architecture are
"dots.llm1.inst" and "dots.llm1.base", available here:
* https://huggingface.co/rednote-hilab/dots.llm1.inst
* https://huggingface.co/rednote-hilab/dots.llm1.base
The model architecture is a combination of Qwen and Deepseek parts, as
seen here:
https://github.com/huggingface/transformers/blob/ffe12627b4e84489d2ab91dd0ec00614855edc79/src/transformers/models/dots1/modular_dots1.py
b5668
2025-06-15 09:52:06 +02:00
Georgi Gerganov
c311ac664d
cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)
ggml-ci
b5667
2025-06-15 10:08:58 +03:00
Georgi Gerganov
b9912ac570
batch : auto-gen positions + verify multi-sequence input (#14177)
* batch : verify multi-sequence input batches
ggml-ci
* cont : auto-gen positions + verify multi-seq input
ggml-ci
* cont : first print debug info, then perform validation
ggml-ci
* cont : fix position auto-gen + add comments
ggml-ci
b5666
2025-06-15 09:18:37 +03:00
Pepijn de Vos
00ba772610
docs : remove WIP since PR has been merged (#13912)
2025-06-15 08:06:37 +02:00
Piotr
3cb203c89f
llama-chat : Do not throw when tool parsing fails (#14012)
Currently, when a model generates output that looks like a tool call
but is invalid, an exception is thrown and not handled, causing the CLI
or llama-server to bail. Instead, handle the chat parser exception and
simply return the generated text in such cases.
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
b5664
2025-06-14 17:25:15 +01:00
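The fallback described above amounts to wrapping a strict parser in a `try`/`catch` and returning the raw generated text on failure. A minimal sketch follows; the `<tool>NAME</tool>` syntax and both parser functions are hypothetical and do not reflect llama.cpp's real chat parser.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Result of parsing model output: either a tool call or plain text.
struct ParsedMessage {
    bool        is_tool_call = false;
    std::string tool_name;  // set only when is_tool_call is true
    std::string content;    // raw text when is_tool_call is false
};

// Hypothetical strict parser: accepts only "<tool>NAME</tool>" and throws
// on output that starts like a tool call but is malformed.
ParsedMessage parse_tool_call_strict(const std::string & text) {
    const std::string open = "<tool>";
    const std::string close = "</tool>";
    if (text.rfind(open, 0) != 0) {
        return {false, "", text};  // does not look like a tool call
    }
    const std::size_t end = text.find(close, open.size());
    if (end == std::string::npos) {
        throw std::runtime_error("malformed tool call");
    }
    return {true, text.substr(open.size(), end - open.size()), ""};
}

// Tolerant wrapper: on a parse error, return the generated text as plain
// content instead of letting the exception escape and abort the caller.
ParsedMessage parse_tool_call_tolerant(const std::string & text) {
    try {
        return parse_tool_call_strict(text);
    } catch (const std::exception &) {
        return {false, "", text};
    }
}
```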
Aman Gupta
2e42be42bd
compare-llama-bench: add option to plot (#14169)
* compare llama-bench: add option to plot
* Address review comments: convert case + add type hints
* Add matplotlib to requirements
* fix tests
* Improve comment and fix assert condition for test
* Add back default test_name, add --plot_log_scale
* use log_scale regardless of x_values
2025-06-14 10:34:20 +02:00
Georgi Gerganov
fb85a288d7
vocab : fix build (#14175)
ggml-ci
b5662
2025-06-13 20:03:05 +03:00
Svetlozar Georgiev
40643edb86
sycl: fix docker image (#14144)
2025-06-13 18:32:56 +02:00
Guy Goldenberg
3cfbbdb44e
Merge commit from fork
* vocab : prevent integer overflow during load
* Add static cast and GGML_ABORT
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-13 19:20:25 +03:00
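The commit message mentions adding a static cast and a `GGML_ABORT` to prevent integer overflow during vocab load. A minimal sketch of such a range check before narrowing, assuming the value is a 64-bit count read from the model file; the helper `try_narrow_to_i32` is hypothetical, and it reports failure instead of aborting so that the error path can be exercised.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: narrow a 64-bit count read from a model file to
// int32_t only if it fits. Returns false instead of silently wrapping.
bool try_narrow_to_i32(uint64_t value, int32_t & out) {
    constexpr uint64_t max_i32 =
        static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
    if (value > max_i32) {
        return false;  // a loader would abort here (cf. GGML_ABORT in the commit)
    }
    out = static_cast<int32_t>(value);  // safe: value is within range
    return true;
}
```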
Georgi Gerganov
80709b70a2
batch : add LLAMA_BATCH_DEBUG environment variable (#14172)
* batch : add LLAMA_BATCH_DEBUG environment variable
ggml-ci
* cont : improve seq_id display
b5659
2025-06-13 18:35:00 +03:00
ddpasa
26ff3685bf
docs : Update multimodal.md (#14122)
* Update multimodal.md
* Update multimodal.md
2025-06-13 15:17:53 +02:00
Georgi Gerganov
60c666347b
batch : rework llama_batch_allocr (#14153)
* batch : rework llama_batch_allocr
ggml-ci
* cont : move validation inside class
ggml-ci
* cont : move output counting to class
ggml-ci
* cont : minor
ggml-ci
* batch : add TODOs
ggml-ci
b5657
2025-06-13 13:47:55 +03:00
Georgi Gerganov
b7cc7745e3
readme : remove survey link (#14168)
2025-06-13 11:55:44 +03:00
Christian Kastner
cc8d081879
cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)
* cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT
* cmake: Pass on LLAMA_BUILD_* to GGML_BUILD_*
b5655
2025-06-13 10:38:52 +02:00
Đinh Trọng Huy
d714dadb57
pooling : make cls_b and cls_out_b optional (#14165)
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
b5654
2025-06-13 11:34:08 +03:00
Georgi Gerganov
ffad043973
server : fix SWA condition for full context reprocess (#14163)
ggml-ci
b5653
2025-06-13 11:18:25 +03:00
Anton Mitkov
0889eba570
sycl: Adding additional cpy dbg print output (#14034)
b5652
2025-06-13 08:51:39 +01:00
Ewan Crawford
c61285e739
SYCL: Bump oneMath commit (#14152)
Update oneMath commit to merged PR https://github.com/uxlfoundation/oneMath/pull/669
which adds SYCL-Graph support for recording CUDA BLAS commands.
With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph
enabled. Prior to this change, an error would be thrown.
```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: operator()
Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154
Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```
b5651
2025-06-13 08:45:37 +01:00
Christian Kastner
09cf2c7c65
cmake : Improve build-info.cpp generation (#14156)
* cmake: Simplify build-info.cpp generation
The rebuild of build-info.cpp still gets triggered when .git/index
changes.
* cmake: generate build-info.cpp in build dir
b5650
2025-06-13 09:51:34 +03:00
Georgi Gerganov
c33fe8b8c4
vocab : prevent heap overflow when vocab is too small (#14145)
ggml-ci
b5649
2025-06-13 08:03:54 +03:00
Anton Mitkov
ed52f3668e
sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)
b5648
2025-06-12 15:15:11 +02:00
Georgi Gerganov
a681b4ba83
readme : remove project status link (#14149)
2025-06-12 14:43:09 +03:00
Georgi Gerganov
7d516443dd
server : re-enable SWA speculative decoding (#14131)
ggml-ci
b5646
2025-06-12 11:51:38 +03:00
Georgi Gerganov
f6e1a7aa87
context : simplify output counting logic during decode (#14142)
* batch : remove logits_all flag
ggml-ci
* context : simplify output counting logic during decode
ggml-ci
* cont : fix comments
b5645
2025-06-12 11:50:01 +03:00
Georgi Gerganov
c3ee46fab4
batch : remove logits_all flag (#14141)
ggml-ci
b5644
2025-06-12 11:49:26 +03:00
Georgi Gerganov
e2c0b6e46a
cmake : handle whitespaces in path during metal build (#14126)
* cmake : handle whitespaces in path during metal build
ggml-ci
* cont : proper fix
ggml-ci
---------
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2025-06-12 10:14:24 +03:00
Georgi Gerganov
9596506965
kv-cache : fix split_equal handling in unified implementation (#14130)
ggml-ci
b5642
2025-06-12 10:02:15 +03:00
compilade
a20b2b05bc
context : round n_tokens to next multiple of n_seqs when reserving (#14140)
This fixes RWKV inference, which otherwise failed
when the worst-case ubatch.n_seq_tokens rounded down to 0.
b5641
2025-06-12 02:56:04 -04:00
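The rounding in this commit is ordinary round-up-to-multiple arithmetic. With `n_tokens = 3` and `n_seqs = 4`, integer division gives `3 / 4 = 0` tokens per sequence, while rounding `n_tokens` up to 4 first gives `4 / 4 = 1`. A minimal sketch, with hypothetical helper names rather than the actual llama-context code:

```cpp
#include <cassert>
#include <cstdint>

// Round n up to the next multiple of m (m must be positive).
uint32_t round_up_to_multiple(uint32_t n, uint32_t m) {
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

// Tokens per sequence in a worst-case reservation ubatch. Plain integer
// division n_tokens / n_seqs would yield 0 whenever n_tokens < n_seqs;
// padding n_tokens first keeps the result at least 1 for n_tokens >= 1.
uint32_t n_seq_tokens_for_reserve(uint32_t n_tokens, uint32_t n_seqs) {
    const uint32_t padded = round_up_to_multiple(n_tokens, n_seqs);
    return padded / n_seqs;
}
```

Values that are already multiples of `n_seqs` are left unchanged, so the padding only affects the degenerate case.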
bandoti
2e89f76b7a
common: fix issue with regex_escape routine on windows (#14133)
b5640
2025-06-11 17:19:44 -03:00