llama.cpp/src at 4c4e91b799c206fddaa56d89f0b4e61f6a263a4e

kanshan/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-28 07:10:21 +00:00

Radoslav Gerganov 1738129bee llama : do not skip iGPU when only RPC devices are present (#23868)

After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device
selection logic dropped the local iGPU whenever any RPC server was added,
because RPC devices made `model->devices` non-empty. On systems where the
"iGPU" is the main compute device (e.g. Strix Halo with 128 GiB of unified
memory), this caused all tensors to be allocated on the RPC peer alone and
model loading to fail.

Gate the iGPU inclusion on `gpus.empty()` instead, so RPC peers no longer
suppress the local iGPU.

closes: #23858

2026-05-30 07:48:22 +03:00
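
The change described in the commit message above can be illustrated with a small, self-contained sketch. This is not the actual llama.cpp code: the `device` struct, the `dev_type` enum, both `select_devices_*` functions, and the ordering of the resulting list are hypothetical and exist only to show the difference between gating iGPU inclusion on "no devices selected yet" and gating it on "no discrete GPUs found".

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not llama.cpp types.
enum class dev_type { GPU, IGPU, RPC };

struct device {
    std::string name;
    dev_type    type;
};

// Splits the available devices into the three groups the commit talks about.
static void split(const std::vector<device> & available,
                  std::vector<device> & rpc,
                  std::vector<device> & gpus,
                  std::vector<device> & igpus) {
    for (const auto & d : available) {
        switch (d.type) {
            case dev_type::RPC:  rpc.push_back(d);   break;
            case dev_type::GPU:  gpus.push_back(d);  break;
            case dev_type::IGPU: igpus.push_back(d); break;
        }
    }
}

// Behaviour described as buggy: the iGPU is added only when the selected
// list is still empty, so a single RPC peer is enough to suppress it.
std::vector<device> select_devices_old(const std::vector<device> & available) {
    std::vector<device> rpc, gpus, igpus;
    split(available, rpc, gpus, igpus);

    std::vector<device> selected = rpc; // ordering here is an assumption
    selected.insert(selected.end(), gpus.begin(), gpus.end());
    if (selected.empty()) {
        selected.insert(selected.end(), igpus.begin(), igpus.end());
    }
    return selected;
}

// Behaviour described as the fix: the iGPU is added whenever there is no
// discrete GPU, regardless of how many RPC peers are present.
std::vector<device> select_devices_new(const std::vector<device> & available) {
    std::vector<device> rpc, gpus, igpus;
    split(available, rpc, gpus, igpus);

    std::vector<device> selected = rpc; // ordering here is an assumption
    selected.insert(selected.end(), gpus.begin(), gpus.end());
    if (gpus.empty()) {
        selected.insert(selected.end(), igpus.begin(), igpus.end());
    }
    return selected;
}
```

With one RPC peer and one iGPU as input, `select_devices_old` returns only the RPC peer, while `select_devices_new` returns both. When a discrete GPU is present, both variants leave the iGPU out.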

..

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

CMakeLists.txt

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-adapter.cpp

fix: correct misspellings in code comments (#21217)

2026-03-31 13:50:51 +02:00

llama-adapter.h

llama : re-enable manual LoRA adapter free (#19983)

2026-03-18 12:03:26 +02:00

llama-arch.cpp

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-arch.h

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-batch.cpp

kv-cache : fix M-RoPE checkpoints (#20132)

2026-03-06 08:46:51 +02:00

llama-batch.h

fix: correct misspellings in code comments (#21217)

2026-03-31 13:50:51 +02:00

llama-chat.cpp

chat : add Granite 4.1 chat template (#23518)

2026-05-28 13:13:33 +02:00

llama-chat.h

chat : add Granite 4.1 chat template (#23518)

2026-05-28 13:13:33 +02:00

llama-context.cpp

Move to backend sampling for MTP draft path (#23287)

2026-05-20 22:34:45 +05:30

llama-context.h

llama: avoid copying logits during prompt decode in MTP (#23198)

2026-05-17 23:30:25 +08:00

llama-cparams.cpp

cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)

2025-06-15 10:08:58 +03:00

llama-cparams.h

llama: avoid copying logits during prompt decode in MTP (#23198)

2026-05-17 23:30:25 +08:00

llama-ext.h

llama: avoid copying logits during prompt decode in MTP (#23198)

2026-05-17 23:30:25 +08:00

llama-grammar.cpp

common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)

2026-03-21 18:43:35 +01:00

llama-grammar.h

common/grammar : replace problematic backtracking regex [\s\S]* (#18342)

2026-01-03 16:02:43 -06:00

llama-graph.cpp

graph : ensure DS32 kq_mask_lid is F32 (#23864)

2026-05-29 19:55:14 +02:00

llama-graph.h

graph : ensure DS32 kq_mask_lid is F32 (#23864)

2026-05-29 19:55:14 +02:00

llama-hparams.cpp

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-hparams.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-impl.cpp

llama : correct platform-independent loading of BOOL metadata (#21428)

2026-04-06 01:40:38 +02:00

llama-impl.h

llama: use f16 mask for FA to save VRAM (#23764)

2026-05-29 15:44:43 +08:00

llama-io.cpp

server : avoid checkpoint data host copies (#22558)

2026-05-02 18:03:25 +03:00

llama-io.h

llama : add option to save memory in device buffers (#22679)

2026-05-05 06:35:07 +03:00

llama-kv-cache-dsa.cpp

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-kv-cache-dsa.h

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-kv-cache-iswa.cpp

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-kv-cache-iswa.h

llama: print memory breakdown on exit (#15860)

2025-09-24 16:53:48 +02:00

llama-kv-cache.cpp

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-kv-cache.h

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-kv-cells.h

llama: store mrope data in KV cell (#16825)

2025-10-29 18:09:18 +01:00

llama-memory-hybrid-iswa.cpp

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory-hybrid-iswa.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-memory-hybrid.cpp

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-memory-hybrid.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-memory-recurrent.cpp

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory-recurrent.h

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory.cpp

memory : correctly handle failure in apply() (#14438)

2025-06-30 18:03:03 +03:00

llama-memory.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-mmap.cpp

Update llama-mmap to use ftello/fseeko (#22497)

2026-04-30 14:17:52 -07:00

llama-mmap.h

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00

llama-model-loader.cpp

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-model-loader.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-model-saver.cpp

model : NvFP4 quantized LM head support (#23046)

2026-05-16 11:09:27 +02:00

llama-model-saver.h

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00

llama-model.cpp

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-model.h

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-quant.cpp

model: move load_hparams and load_tensors to per-model definition (#22004)

2026-05-04 12:36:59 +02:00

llama-quant.h

llama : refactor src/llama.cpp (#10902)

2025-01-03 10:18:53 +02:00

llama-sampler.cpp

llama : rename llama-sampling to llama-sampler (#19363)

2026-02-06 07:26:54 +01:00

llama-sampler.h

llama : rename llama-sampling to llama-sampler (#19363)

2026-02-06 07:26:54 +01:00

llama-vocab.cpp

convert: add MiniCPM5 tokenizer support (#23384)

2026-05-27 08:08:33 +03:00

llama-vocab.h

convert: add MiniCPM5 tokenizer support (#23384)

2026-05-27 08:08:33 +03:00

llama.cpp

llama : do not skip iGPU when only RPC devices are present (#23868)

2026-05-30 07:48:22 +03:00

unicode-data.cpp

server : better security control for public deployments (#9776)

2024-10-08 13:27:04 +02:00

unicode-data.h

llama : reduce compile time and binary size (#9712)

2024-10-02 15:49:55 +02:00

unicode.cpp

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110)

2026-05-14 11:03:40 +02:00

unicode.h

vocab: fix Gemma4 tokenizer (#21343)

2026-04-03 10:33:03 +02:00