3 Commits

Author SHA1 Message Date
Georgi Gerganov 7acb4e8cd2 hparams : refactor hparams.n_layer (#24060)
* hparams : refactor hparams.n_layer

* cont : remove `n_layer_kv()`, use n_layer_all instead

* cont : type consistency

* pi : update SYSTEM.md

* models : fix Step3.5 MTP

* cont : remove duplicate switch cases

* cont : explicitly set `false` to extra layers for `is_swa` and `is_recr`

* cont : fix nextn layer count handling

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-06-05 11:09:36 +03:00
Georgi Gerganov 06938ac129 tests : add support for qwen3 SSM archs (#24031)
* tests : add support for qwen3 SSM archs

* arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS

* cont : naming + TODOs
2026-06-03 10:15:27 +03:00
Mikhail Podvitskii 4fb16eccce model: add Mellum architecture (#23966)
* model: support for Mellum architecture

* model: improve mellum.py formatting

* model: improve mellum.py formatting once again

* deps: downgrade transformers to 4.57.6 (to fix CI)

* deps: remove huggingface_hub dependency

* deps: remove huggingface_hub from test requirements

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-06-02 22:11:12 +03:00