llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-26 06:10:19 +00:00

Ruixiang Wang 88a39274ec spec: add EAGLE3 speculative decoding support (#18039)

* llama : enable layer input extraction

* spec: support eagle3

* eagle3: fix params bug

* eagle3: support Gemma4 eagle3 from RedHatAI

* eagle3: set sync when get features from target

Co-authored-by: tnhnyzc <115956684+tnhnyzc@users.noreply.github.com>

* eagle3 : fix ubatch handling in embd_layer_inp extraction and encoder

Co-authored-by: Doğaç Eldenk <dogacel@gmail.com>

* eagle3: adapt to upstream changes

* eagle3: fix rebase issues and adapt to upstream changes

* eagle3:exclude the eagle3 arch from test-llama-archs

* eagle3: fix editorconfig check failures

* eagle3: fix multi-seq issue in d2t vocab mapping

* cont : minor style / clean-up

* spec : remove `common_speculative_setup_draft_model()`

* llama : clean-up unused API

* eagle3: set d2t vocab mapping in decode graph

* cont : assert layer inputs are configured

* hparams : use n_embd_inp instead of n_embd_target_features

* eagle3: make output.weight optional and inherit from target model when needed

* haparams : generic norm-before-residual param

* llama-ext : consistent names

* cont : fix

* hparams : remove target_hidden_size

* cparams : rename output_layer_inp -> embeddings_layer_inp

* arch : reuse ATTN_NORM_2 instead of adding new hidden norm

* llama : clean-up names

* cont : add assert + comment

* Update conversion/llama.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: tnhnyzc <115956684+tnhnyzc@users.noreply.github.com>
Co-authored-by: Doğaç Eldenk <dogacel@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

2026-06-12 10:21:06 +03:00

__init__.py

spec: add EAGLE3 speculative decoding support (#18039)

2026-06-12 10:21:06 +03:00

afmoe.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

arctic.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

baichuan.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

bailingmoe.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

base.py

spec: add EAGLE3 speculative decoding support (#18039)

2026-06-12 10:21:06 +03:00

bert.py

model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) (#22716)

2026-06-02 17:55:11 +02:00

bitnet.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

bloom.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

chameleon.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

chatglm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

codeshell.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

cogvlm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

command_r.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

dbrx.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

deci.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

deepseek.py

mtmd: Add DeepSeekOCR 2 Support (#20975)

2026-05-29 16:13:51 +02:00

dots1.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

dotsocr.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

dream.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

ernie.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

exaone.py

model: Add EXAONE 4.5 implementations (#21733)

2026-06-01 11:48:53 +02:00

falcon_h1.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

falcon.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

gemma.py

mtp: support for gemma-4 E2B and E4B assistants (#24282)

2026-06-08 13:48:52 -07:00

glm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

gpt2.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

gpt_oss.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

gptneox.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

granite.py

model, mtmd: Granite4 Vision (#23545)

2026-06-05 17:44:59 +02:00

grok.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

grovemoe.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

hunyuan.py

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)

2026-05-21 00:35:37 +02:00

internlm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

internvl.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

jais.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

jamba.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

januspro.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

kimi_linear.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

kimivl.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

lfm2.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

lighton_ocr.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

llada.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

llama4.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

llama.py

spec: add EAGLE3 speculative decoding support (#18039)

2026-06-12 10:21:06 +03:00

llava.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

maincoder.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

mamba.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

mellum.py

model: add Mellum architecture (#23966)

2026-06-02 22:11:12 +03:00

mimo.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

minicpm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

minimax.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

mistral3.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

mistral.py

convert : fix conversion for Mistral-Medium-3.5-128B (#24268)

2026-06-07 21:41:39 +02:00

mpt.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

nemotron.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

olmo.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

openelm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

orion.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

pangu.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

phi.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

pixtral.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

plamo.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

plm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

qwen3vl.py

convert : fix Qwen3 ASR conversion (#23081)

2026-05-15 18:38:39 +02:00

qwen.py

convert : add compressed-tensors NVFP4 support (#21095)

2026-05-25 14:16:11 +02:00

qwenvl.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

refact.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

rwkv.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

sarashina2.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

smallthinker.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

smolvlm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

stablelm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

starcoder.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

step3.py

StepFun 3.5 MTP (#23274)

2026-06-02 17:44:35 +02:00

t5.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

talkie.py

model : add support for talkie-1930-13b (#22596)

2026-05-26 07:57:38 +03:00

ultravox.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

wavtokenizer.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

xverse.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

youtuvl.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00