llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-07-04 02:00:23 +00:00

Files

Xuan-Son Nguyen 506bb6e010 model: try to improve Qwen3 Next (#18683)

* qwen3next: simplify qkvz projection

* use ggml_swiglu_split

* revert swiglu_split, but remove redundant repeat()

* fix missing reshape

* rm 2 redundant transposes

* move mul_mat(k,q) to outside of chunking

* rm redundant cont

* improve g_cs_chunk

* add comments about no cont

* use std::pair instead of ggml_concat

* vectorize key_gdiff calculation

* rm unused tensor

* avoid ggml_concat inside loop

* bring back ggml_concat as it may not work on other backend

* nits

2026-01-11 12:53:33 +01:00

scripts

gguf-py : fix passing non-native endian tensors (editor-gui and new-metadata) (#17553)

2025-11-28 20:53:01 +01:00

__init__.py

convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499)

2024-07-18 20:40:15 +10:00

constants.py

model: try to improve Qwen3 Next (#18683)

2026-01-11 12:53:33 +01:00

gguf_reader.py

gguf-py : display the invalid gguf type (#13687)