llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-29 15:50:22 +00:00

Files

JJJYmmm fc0fe40049 models : support qwen3.5 series (#19468)

* support qwen3.5 series

* remove deepstack for now, and some code clean

* code clean

* add FULL_ATTENTION_INTERVAL metadata

* code clean

* reorder v heads for linear attention to avoid expensive interleaved repeat

2026-02-10 18:00:26 +02:00

batched-bench

tool/ex/tests: consistently free ctx, then model (#18168)

2025-12-22 11:00:37 +01:00

cli

common : use two decimal places for float arg help messages (#19048)

2026-01-25 07:31:42 +01:00

completion

completion : simplify batch (embd) processing (#19286)

2026-02-04 05:43:28 +01:00

cvector-generator

docs : Minor cleanups (#19252)

2026-02-02 08:38:55 +02:00

export-lora

docs : Minor cleanups (#19252)

2026-02-02 08:38:55 +02:00

fit-params

llama-fit-params: keep explicit --ctx-size 0 (#19070)

2026-01-24 22:13:08 +01:00

gguf-split

cli: new CLI experience (#17824)

2025-12-10 15:28:59 +01:00

imatrix

common : refactor common_sampler + grammar logic changes (#17937)

2025-12-14 10:11:13 +02:00

llama-bench

Setting mmap and direct_io to false as default in llama-bench.cpp (#18841)

2026-01-16 09:46:51 +01:00

mtmd

models : support qwen3.5 series (#19468)

2026-02-10 18:00:26 +02:00

perplexity

docs : Minor cleanups (#19252)

2026-02-02 08:38:55 +02:00

quantize

llama-quantize : cleanup --help output (#19317)

2026-02-08 09:22:38 +02:00

rpc

rpc : update from common.cpp (#19400)

2026-02-08 09:06:45 +01:00

server

Server: log when converting requests to chat completions format (#19457)

2026-02-09 16:22:57 +01:00

tokenize

cmake : Do not install tools on iOS targets (#15903)

2025-09-16 09:54:44 +07:00

tts

tts : fix typos in README.md [no ci] (#19463)

2026-02-10 07:30:41 +01:00

CMakeLists.txt

cmake: only build cli when server is enabled (#18670)

2026-01-09 16:43:26 +01:00