mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-26 06:10:19 +00:00

Files

T

Xuan-Son Nguyen 552258c535 server: (router) rework -hf preset repo (#24739 )

* server: temporary remove HF remote preset

* rework remove preset.ini support

* rm unused get_remote_preset_whitelist()

* print warning

* add docs

* rm stray file

2026-06-18 12:45:23 +02:00

2.7 KiB

Raw Permalink Blame History

llama.cpp INI Presets

Introduction

The INI preset feature, introduced in PR#17859, allows users to create reusable and shareable parameter configurations for llama.cpp.

Using Presets with the Server

When running multiple models on the server (router mode), INI preset files can be used to configure model-specific parameters. Please refer to the server documentation for more details.

Using a Hugging Face Preset

Important

Please only use presets that you can trust! Unknown presets may be unsafe

You can push your preset to Hugging Face Hub and share with other users by:

Creating an empty model repository on Hugging Face
Creating a preset.ini file in the root directory of the repository

Example of a preset.ini:

[*]
ctx-size             = 0
mmap                 = 1
kv-unified           = 1
parallel             = 4
spec-default         = 1

[Qwen3.5-4B]
hf                   = unsloth/Qwen3.5-4B-GGUF:Q4_K_M
ctx-size             = 262144
batch-size           = 2048
ubatch-size          = 2048
top-p                = 1.0
top-k                = 0
min-p                = 0.01
temp                 = 1.0

[gpt-oss-120b-hf]
hf                   = ggml-org/gpt-oss-120b-GGUF
ctx-size             = 262144
batch-size           = 2048
ubatch-size          = 2048
top-p                = 1.0
top-k                = 0
min-p                = 0.01
temp                 = 1.0
chat-template-kwargs = {"reasoning_effort": "high"}

The preset will be loaded similarly to the --models-preset option. Therefore, you can also override certain params via CLI arguments:

# Force temp = 0.1, overriding the preset value
llama-cli -hf username/my-preset --temp 0.1

Named presets

If you want to define multiple preset configurations for one or more GGUF models, you can create a blank HF repo containing a single preset.ini file that references the actual model(s):

[*]
mmap = 1

[gpt-oss-20b-hf]
hf          = ggml-org/gpt-oss-20b-GGUF
batch-size  = 2048
ubatch-size = 2048
top-p       = 1.0
top-k       = 0
min-p       = 0.01
temp        = 1.0
chat-template-kwargs = {"reasoning_effort": "high"}

[gpt-oss-120b-hf]
hf          = ggml-org/gpt-oss-120b-GGUF
batch-size  = 2048
ubatch-size = 2048
top-p       = 1.0
top-k       = 0
min-p       = 0.01
temp        = 1.0
chat-template-kwargs = {"reasoning_effort": "high"}

You can then use it via llama-cli or llama-server, example:

llama-server -hf user/repo:gpt-oss-120b-hf

Please make sure to provide the correct hf-repo for each child preset. Otherwise, you may get error: The specified tag is not a valid quantization scheme.

2.7 KiB Raw Permalink Blame History