llama.cpp/common
Aman Gupta de6f727aae llama: limit max outputs of llama_context (#23861)
* llama: save more VRAM by reserving n_outputs == n_seqs when possible

* add n_outputs_per_seq

* move n_outputs_max to server-context

* change ubatch to batch everywhere
2026-06-01 18:01:38 +03:00