llama.cpp/src at b9333

kanshan/llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-06-28 07:10:21 +00:00

Pascal 328874d054 model: tag ffn_latent as MUL_MAT to fix buft probe (#23664)

ffn_latent_down/up are declared GGML_OP_MUL in LLM_TENSOR_INFOS, but
nemotron-h feeds them through ggml_mul_mat. The loader's buffer-type
(buft) probe asks the backend about the declared op, so it tested an
elementwise MUL on a q8_0 weight. That check used to return true
unconditionally, so the weight stayed on GPU by luck. Once supports_op
told the truth, the probe got a no and the loader pushed the weight and
its matmul to CPU, splitting the graph. Tagging it MUL_MAT asks the real
question; the math is unchanged.

Verified on Nemotron 3 Super 120B Q5_K_M: from 64.9 back to 103.22 t/s.

2026-05-25 16:05:04 +02:00
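The commit message above describes a placement probe that decides where a weight lives from its *declared* op rather than from the op the graph actually runs. The C++ sketch below models that decision under simplifying assumptions. `Op`, `WeightType`, `Placement`, `TensorInfos`, `backend_supports_op`, `declared_op`, and `probe_placement` are hypothetical stand-ins (the enum values only mirror the ggml names GGML_OP_MUL, GGML_OP_MUL_MAT, and GGML_TYPE_Q8_0). They are not llama.cpp or ggml APIs. The capability rule, which supports a q8_0 weight only as a matmul operand, is also assumed for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for ggml op/type enums (not the real ggml API).
enum class Op { MUL, MUL_MAT };
enum class WeightType { F32, Q8_0 };
enum class Placement { GPU, CPU };

// Hypothetical backend capability check that "tells the truth":
// in this sketch a q8_0 weight is usable only as a matrix-multiply operand,
// not as an operand of an elementwise MUL.
bool backend_supports_op(Op op, WeightType w) {
    if (w == WeightType::Q8_0) {
        return op == Op::MUL_MAT;
    }
    return true;  // F32 is assumed supported for both ops here
}

// Hypothetical registry mapping tensor names to the op the loader assumes
// they feed, playing the role LLM_TENSOR_INFOS plays in the commit message.
using TensorInfos = std::map<std::string, Op>;

// Look up the declared op; every probed tensor must be registered.
Op declared_op(const TensorInfos& infos, const std::string& name) {
    auto it = infos.find(name);
    assert(it != infos.end() && "tensor must be registered");
    return it->second;
}

// Buffer-type probe: ask the backend about the *declared* op and keep the
// weight on GPU only if the answer is yes; otherwise fall back to CPU.
Placement probe_placement(const TensorInfos& infos, const std::string& name, WeightType w) {
    return backend_supports_op(declared_op(infos, name), w) ? Placement::GPU
                                                            : Placement::CPU;
}
```

Under these assumptions, probing `ffn_latent_down` as a q8_0 weight with an `Op::MUL` tag yields `Placement::CPU`. Retagging it `Op::MUL_MAT` yields `Placement::GPU`. The tensor's data and the math are untouched. Only the question put to the backend changes.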

..

model : add NVFP4 MTP scale tensors (#23563)

2026-05-23 13:30:31 +02:00

CMakeLists.txt

cmake: use glob to collect src/models sources (#22005)

2026-04-16 23:25:16 +02:00

llama-adapter.cpp

fix: correct misspellings in code comments (#21217)

2026-03-31 13:50:51 +02:00

llama-adapter.h

llama : re-enable manual LoRA adapter free (#19983)

2026-03-18 12:03:26 +02:00

llama-arch.cpp

model: tag ffn_latent as MUL_MAT to fix buft probe (#23664)

2026-05-25 16:05:04 +02:00

llama-arch.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-batch.cpp

kv-cache : fix M-RoPE checkpoints (#20132)

2026-03-06 08:46:51 +02:00

llama-batch.h

fix: correct misspellings in code comments (#21217)

2026-03-31 13:50:51 +02:00

llama-chat.cpp

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)

2026-05-21 00:35:37 +02:00

llama-chat.h

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)

2026-05-21 00:35:37 +02:00

llama-context.cpp

Move to backend sampling for MTP draft path (#23287)

2026-05-20 22:34:45 +05:30

llama-context.h

llama: avoid copying logits during prompt decode in MTP (#23198)

2026-05-17 23:30:25 +08:00

llama-cparams.cpp

cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)

2025-06-15 10:08:58 +03:00

llama-cparams.h

llama: avoid copying logits during prompt decode in MTP (#23198)

2026-05-17 23:30:25 +08:00

llama-ext.h

llama: avoid copying logits during prompt decode in MTP (#23198)

2026-05-17 23:30:25 +08:00

llama-grammar.cpp

common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)

2026-03-21 18:43:35 +01:00

llama-grammar.h

common/grammar : replace problematic backtracking regex [\s\S]* (#18342)

2026-01-03 16:02:43 -06:00

llama-graph.cpp

llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (#23131)

2026-05-21 09:20:51 +03:00

llama-graph.h

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-hparams.cpp

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-hparams.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-impl.cpp

llama : correct platform-independent loading of BOOL metadata (#21428)

2026-04-06 01:40:38 +02:00

llama-impl.h

llama : enable chunked fused GDN path (#20340)

2026-03-11 22:46:40 +02:00

llama-io.cpp

server : avoid checkpoint data host copies (#22558)

2026-05-02 18:03:25 +03:00

llama-io.h

llama : add option to save memory in device buffers (#22679)

2026-05-05 06:35:07 +03:00

llama-kv-cache-iswa.cpp

(revert) kv-cache : do not quantize SWA KV cache (#21332)

2026-04-03 09:07:01 +03:00

llama-kv-cache-iswa.h

llama: print memory breakdown on exit (#15860)

2025-09-24 16:53:48 +02:00

llama-kv-cache.cpp

ggml : implement fast walsh-hadamard transform for kv rotation (#21352) (#22631)

2026-05-05 10:05:05 +08:00

llama-kv-cache.h

kv-cache : support attention rotation for heterogeneous iSWA (#21513)

2026-04-07 20:31:28 +03:00

llama-kv-cells.h

llama: store mrope data in KV cell (#16825)

2025-10-29 18:09:18 +01:00

llama-memory-hybrid-iswa.cpp

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory-hybrid-iswa.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-memory-hybrid.cpp

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory-hybrid.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-memory-recurrent.cpp

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory-recurrent.h

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory.cpp

memory : correctly handle failure in apply() (#14438)

2025-06-30 18:03:03 +03:00

llama-memory.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-mmap.cpp

Update llama-mmap to use ftello/fseeko (#22497)

2026-04-30 14:17:52 -07:00

llama-mmap.h

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00

llama-model-loader.cpp

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-model-loader.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-model-saver.cpp

model : NvFP4 quantized LM head support (#23046)

2026-05-16 11:09:27 +02:00

llama-model-saver.h

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00

llama-model.cpp

model : add NVFP4 MTP scale tensors (#23563)

2026-05-23 13:30:31 +02:00

llama-model.h

model : add NVFP4 MTP scale tensors (#23563)

2026-05-23 13:30:31 +02:00

llama-quant.cpp

model: move load_hparams and load_tensors to per-model definition (#22004)

2026-05-04 12:36:59 +02:00

llama-quant.h

llama : refactor src/llama.cpp (#10902)

2025-01-03 10:18:53 +02:00

llama-sampler.cpp

llama : rename llama-sampling to llama-sampler (#19363)

2026-02-06 07:26:54 +01:00

llama-sampler.h

llama : rename llama-sampling to llama-sampler (#19363)

2026-02-06 07:26:54 +01:00

llama-vocab.cpp

vocab : fix HybridDNA tokenizer (#23466)

2026-05-22 11:17:31 +02:00

llama-vocab.h

model : add sarvam_moe architecture support (#20275)

2026-05-09 16:31:50 +02:00

llama.cpp

llama : add missing call to ggml_backend_load_all() (#22752)

2026-05-07 08:24:47 +03:00

unicode-data.cpp

server : better security control for public deployments (#9776)

2024-10-08 13:27:04 +02:00

unicode-data.h

llama : reduce compile time and binary size (#9712)

2024-10-02 15:49:55 +02:00

unicode.cpp

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110)

2026-05-14 11:03:40 +02:00

unicode.h

vocab: fix Gemma4 tokenizer (#21343)

2026-04-03 10:33:03 +02:00