fairydreaming
|
1f0aa2a696
|
model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)
* llama : support DeepSeek V3.2 model family (with DSA lightning indexer)
* convert : handle DeepseekV32ForCausalLM architecture
* ggml : support for f16 GGML_OP_FILL
* memory : separate hparams argument in llama_kv_cache constructor
* memory : add llama_kv_cache_dsa memory (KV cache + lightning indexer cache)
* llama : support for LLM_ARCH_DEEPSEEK32
* model : llama_model_deepseek32 implementation
* model : merge two scale operations into one in DSA lightning indexer implementation
* chore : remove unused code
* model : support NVFP4 in DeepSeek V3.2
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* memory : refactoring TODO
Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>
|
2026-05-29 10:15:17 +02:00 |
|