llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2026-06-25 13:50:20 +00:00)


YiChen Lv d789527482 spec : Support Step3.5/3.7 flash mtp3 (#24340)

* add mtp_layer_offset + include nextn flags in graph reuse

* add llama_set_mtp_layer_offset + llama_model_n_nextn_layer API

* offset head select + require all MTP blocks

* speculative multi-head process()

* speculative multi-head draft()

* gather outputs via inp_out_ids

* cleanup

* fix core

* minor cleanup

* merged draft_multi_head into draft()

* mtp rename nextn

* Apply suggestions from code review

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* clean-up comments

* fix for multi seq

* apply suggestions && chain-heads comment

* add a reference for chain_heads discussion

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

2026-06-21 11:33:18 +03:00

llama-cpp.h

llama : re-enable manual LoRA adapter free (#19983)

2026-03-18 12:03:26 +02:00

llama.h

spec : Support Step3.5/3.7 flash mtp3 (#24340)

2026-06-21 11:33:18 +03:00