llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-28 07:10:21 +00:00

Aarnav Pai d73cd07674 graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (#24357 )

* llama-graph : apply embedding scale when deepstack is not used

* nits: remove non-existant hunyuan-vl from the tests

* apply suggestion from @gabe-l-hart

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

2026-06-09 19:46:27 +02:00

models

models : fix plamo2 attention_key/value_length regression (#24317 )

2026-06-09 10:26:44 +03:00

CMakeLists.txt

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346 )

2026-05-29 10:15:17 +02:00

llama-adapter.cpp

hparams : refactor hparams.n_layer (#24060 )

2026-06-05 11:09:36 +03:00

llama-adapter.h

llama : re-enable manual LoRA adapter free (#19983 )

2026-03-18 12:03:26 +02:00

llama-arch.cpp

mtp: support for gemma-4 E2B and E4B assistants (#24282 )

2026-06-08 13:48:52 -07:00

llama-arch.h

mtp: support for gemma-4 E2B and E4B assistants (#24282 )

2026-06-08 13:48:52 -07:00

llama-batch.cpp

kv-cache : fix M-RoPE checkpoints (#20132 )

2026-03-06 08:46:51 +02:00

llama-batch.h

fix: correct misspellings in code comments (#21217 )

2026-03-31 13:50:51 +02:00

llama-chat.cpp

chat : add Granite 4.1 chat template (#23518 )

2026-05-28 13:13:33 +02:00

llama-chat.h

chat : add Granite 4.1 chat template (#23518 )

2026-05-28 13:13:33 +02:00

llama-context.cpp

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-context.h

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-cparams.cpp

cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188 )

2025-06-15 10:08:58 +03:00

llama-cparams.h

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-ext.h

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-grammar.cpp

common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604 )

2026-03-21 18:43:35 +01:00

llama-grammar.h

common/grammar : replace problematic backtracking regex [\s\S]* (#18342 )

2026-01-03 16:02:43 -06:00

llama-graph.cpp

graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (#24357 )

2026-06-09 19:46:27 +02:00

llama-graph.h

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-hparams.cpp

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-hparams.h

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-impl.cpp

llama : correct platform-independent loading of BOOL metadata (#21428 )

2026-04-06 01:40:38 +02:00

llama-impl.h

llama: use f16 mask for FA to save VRAM (#23764 )

2026-05-29 15:44:43 +08:00

llama-io.cpp

server : avoid checkpoint data host copies (#22558 )

2026-05-02 18:03:25 +03:00

llama-io.h

llama : add option to save memory in device buffers (#22679 )

2026-05-05 06:35:07 +03:00

llama-kv-cache-dsa.cpp

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-kv-cache-dsa.h

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346 )

2026-05-29 10:15:17 +02:00

llama-kv-cache-iswa.cpp

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-kv-cache-iswa.h

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-kv-cache.cpp

kv-cache : avoid kv cells copies (#24277 )

2026-06-07 21:42:54 +03:00

llama-kv-cache.h

kv-cache : avoid kv cells copies (#24277 )

2026-06-07 21:42:54 +03:00

llama-kv-cells.h

kv-cache : avoid kv cells copies (#24277 )

2026-06-07 21:42:54 +03:00

llama-memory-hybrid-iswa.cpp

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-memory-hybrid-iswa.h

llama + spec: MTP Support (#22673 )

2026-05-16 20:06:23 +08:00

llama-memory-hybrid.cpp

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-memory-hybrid.h

llama + spec: MTP Support (#22673 )

2026-05-16 20:06:23 +08:00

llama-memory-recurrent.cpp

hparams : refactor hparams.n_layer (#24060 )

2026-06-05 11:09:36 +03:00

llama-memory-recurrent.h

llama : MTP clean-up (#23269 )

2026-05-19 15:32:58 +03:00

llama-memory.cpp

memory : correctly handle failure in apply() (#14438 )

2025-06-30 18:03:03 +03:00

llama-memory.h

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-mmap.cpp

Update llama-mmap to use ftello/fseeko (#22497 )

2026-04-30 14:17:52 -07:00

llama-mmap.h

llama: fix llama-model-saver (#20503 )

2026-03-25 12:53:16 +02:00

llama-model-loader.cpp

model, mtmd: Granite4 Vision (#23545 )

2026-06-05 17:44:59 +02:00

llama-model-loader.h

llama + spec: MTP Support (#22673 )

2026-05-16 20:06:23 +08:00

llama-model-saver.cpp

model, mtmd: Granite4 Vision (#23545 )

2026-06-05 17:44:59 +02:00

llama-model-saver.h

llama: fix llama-model-saver (#20503 )

2026-03-25 12:53:16 +02:00

llama-model.cpp

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-model.h

llama : add Gemma4 MTP (#23398 )

2026-06-07 20:50:54 +08:00

llama-quant.cpp

hparams : refactor hparams.n_layer (#24060 )

2026-06-05 11:09:36 +03:00

llama-quant.h

llama : refactor src/llama.cpp (#10902 )

2025-01-03 10:18:53 +02:00

llama-sampler.cpp

llama : rename llama-sampling to llama-sampler (#19363 )

2026-02-06 07:26:54 +01:00

llama-sampler.h

llama : rename llama-sampling to llama-sampler (#19363 )

2026-02-06 07:26:54 +01:00

llama-vocab.cpp

mtmd, model: allow skip build_vit() (#24077 )

2026-06-03 17:10:35 +02:00

llama-vocab.h

mtmd, model: allow skip build_vit() (#24077 )

2026-06-03 17:10:35 +02:00

llama.cpp

llama: only use one iGPU device by default (#23897 )

2026-05-31 08:17:47 +02:00

unicode-data.cpp

server : better security control for public deployments (#9776 )

2024-10-08 13:27:04 +02:00

unicode-data.h

llama : reduce compile time and binary size (#9712 )

2024-10-02 15:49:55 +02:00

unicode.cpp

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110 )

2026-05-14 11:03:40 +02:00

unicode.h

vocab: fix Gemma4 tokenizer (#21343 )

2026-04-03 10:33:03 +02:00