llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-06-26 06:10:19 +00:00

Michael Wand 4988f6e866 Add arch support for cohere2-MoE (#24260)

* Add arch support for cohere2-MoE

* Removed redundant gating_func checks

* Changed ffn lookup to prefer prefix_dense_intermediate_size

* Renamed arch to cohere2moe

* Removed redundant lmhead check and chat template changes

* Removed lm_head.weight check from modify tensors, load output tensor not required, fallback to token_embd.weight

* Changed to (routed+shared)*0.5 for shared expert combined avg

* fixed sliding_window_pattern issue and pattern

* Fixed transformers crash 'first_k_dense_replace' error

* Remove comment

* Removed cohere2-moe as a tokenizer type and kept as tiny_aya.  Renamed North-Mini-Code-1.0.

* Fixed MTP fail, changed to use iSWA

* Fixed remaining todos: cohere2moe renamed, changed swa parsing to use get_key_or_arr, removed extra get_arr use

* Force metadata usage

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Remove Cohere2 checkpoint comment

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Remove MTP comment

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Regenerate cohere2moe tokenizer hash

* Add cohere2moe to Llama Model Saver supported list

* Check for zero-bias tensors and add support for Command to use LayerNorm

* Map expert_selection_fn to sigmoid in base.py instead of command.py

* use bools for foundnorm/foundnormrms

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

2026-06-13 19:49:00 +02:00
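
One item in the commit message above, combining routed and shared expert outputs as `(routed+shared)*0.5`, can be illustrated with a small sketch. The function name `combine_expert_outputs` and the use of plain Python lists are assumptions made for illustration only; this is not the code from the commit, which is not shown in this listing.

```python
def combine_expert_outputs(routed, shared):
    """Average the routed-expert and shared-expert outputs element-wise.

    Hypothetical helper: it mirrors the "(routed+shared)*0.5" wording of the
    commit message and nothing else.
    """
    if len(routed) != len(shared):
        raise ValueError("routed and shared outputs must have the same length")
    # (routed + shared) * 0.5 is the arithmetic mean of the two outputs.
    return [(r + s) * 0.5 for r, s in zip(routed, shared)]


if __name__ == "__main__":
    print(combine_expert_outputs([1.0, 2.0], [3.0, 4.0]))
```

Averaging, as opposed to a plain sum, keeps the combined activation on the same scale as each individual branch.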

File             Last updated                Last commit
__init__.py      2026-06-13 19:49:00 +02:00  Add arch support for cohere2-MoE (#24260)
afmoe.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
arctic.py        2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
baichuan.py      2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
bailingmoe.py    2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
base.py          2026-06-13 19:49:00 +02:00  Add arch support for cohere2-MoE (#24260)
bert.py          2026-06-02 17:55:11 +02:00  model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) (#22716)
bitnet.py        2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
bloom.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
chameleon.py     2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
chatglm.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
codeshell.py     2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
cogvlm.py        2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
command_r.py     2026-06-13 19:49:00 +02:00  Add arch support for cohere2-MoE (#24260)
dbrx.py          2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
deci.py          2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
deepseek.py      2026-05-29 16:13:51 +02:00  mtmd: Add DeepSeekOCR 2 Support (#20975)
dots1.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
dotsocr.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
dream.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
ernie.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
exaone.py        2026-06-01 11:48:53 +02:00  model: Add EXAONE 4.5 implementations (#21733)
falcon_h1.py     2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
falcon.py        2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
gemma.py         2026-06-08 13:48:52 -07:00  mtp: support for gemma-4 E2B and E4B assistants (#24282)
glm.py           2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
gpt2.py          2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
gpt_oss.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
gptneox.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
granite.py       2026-06-05 17:44:59 +02:00  model, mtmd: Granite4 Vision (#23545)
grok.py          2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
grovemoe.py      2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
hunyuan.py       2026-05-21 00:35:37 +02:00  mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)
internlm.py      2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
internvl.py      2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
jais.py          2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
jamba.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
januspro.py      2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
kimi_linear.py   2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
kimivl.py        2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
lfm2.py          2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
lighton_ocr.py   2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
llada.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
llama4.py        2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
llama.py         2026-06-12 10:21:06 +03:00  spec: add EAGLE3 speculative decoding support (#18039)
llava.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
maincoder.py     2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
mamba.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
mellum.py        2026-06-02 22:11:12 +03:00  model: add Mellum architecture (#23966)
mimo.py          2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
minicpm.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
minimax.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
mistral3.py      2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
mistral.py       2026-06-07 21:41:39 +02:00  convert : fix conversion for Mistral-Medium-3.5-128B (#24268)
mpt.py           2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
nemotron.py      2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
olmo.py          2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
openelm.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
orion.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
pangu.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
phi.py           2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
pixtral.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
plamo.py         2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
plm.py           2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
qwen3vl.py       2026-05-15 18:38:39 +02:00  convert : fix Qwen3 ASR conversion (#23081)
qwen.py          2026-05-25 14:16:11 +02:00  convert : add compressed-tensors NVFP4 support (#21095)
qwenvl.py        2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
refact.py        2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
rwkv.py          2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
sarashina2.py    2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
smallthinker.py  2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
smolvlm.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
stablelm.py      2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
starcoder.py     2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
step3.py         2026-06-02 17:44:35 +02:00  StepFun 3.5 MTP (#23274)
t5.py            2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
talkie.py        2026-05-26 07:57:38 +03:00  model : add support for talkie-1930-13b (#22596)
ultravox.py      2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
wavtokenizer.py  2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
xverse.py        2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)
youtuvl.py       2026-05-15 15:18:12 +02:00  Refactor: convert_hf_to_gguf.py (#17114)