2 Commits

Author SHA1 Message Date
Michael Wand 4988f6e866 Add arch support for cohere2-MoE (#24260)
* Add arch support for cohere2-MoE

* Removed redundant gating_func checks

* Changed ffn lookup to prefer prefix_dense_intermediate_size

* Renamed arch to cohere2moe

* Removed redundant lmhead check and chat template changes

* Removed lm_head.weight check from modify tensors; loading the output tensor is not required, with fallback to token_embd.weight

* Changed to (routed+shared)*0.5 for shared expert combined avg

* Fixed sliding_window_pattern issue and pattern

* Fixed transformers crash 'first_k_dense_replace' error

* Remove comment

* Removed cohere2-moe as a tokenizer type and kept as tiny_aya. Renamed North-Mini-Code-1.0.

* Fixed MTP fail, changed to use iSWA

* Fixed remaining TODOs: cohere2moe renamed, changed swa parsing to use get_key_or_arr, removed extra get_arr use

* Force metadata usage

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Remove Cohere2 checkpoint comment

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Remove MTP comment

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Regenerate cohere2moe tokenizer hash

* Add cohere2moe to Llama Model Saver supported list

* Check for zero-bias tensors and add support for Command to use LayerNorm

* Map expert_selection_fn to sigmoid in base.py instead of command.py

* Use bools for foundnorm/foundnormrms

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-06-13 19:49:00 +02:00
Piotr Wilkin (ilintar) cc7200bf12 Refactor: convert_hf_to_gguf.py (#17114)
* Move conversion code to a dedicated conversion directory and split the files akin to the src/models architecture

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-15 15:18:12 +02:00