llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-07-02 01:00:20 +00:00

Files

Oliver Simons 333da805fe Add initial version for top-p sampling

As we only support static graphs for the time being and we don't know the
size of the output of top-p, we have to do value-scaling, the same as for
the min-p operator.

Further improvements can be made to the unit test (i.e. check that
top_p running on the backend is equivalent to top_p running on the CPU)
and also by constructing candidates and sorting those, as opposed to
reversing the sort of the logits (this would be arange +
get_rows instead of argsort + get_rows).
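
To illustrate why a static graph pushes top-p toward rewriting values rather than shrinking the output, here is a minimal CPU-side sketch in plain C++. It is not the code from this commit and does not use the ggml API. The function name `top_p_mask_static` is hypothetical, and masking rejected logits to negative infinity is only one assumed reading of "value-scaling". The point it shows is that the output keeps the same length as the input, whatever the value of p.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of llama.cpp): applies top-p while keeping
// the output the same size as the input. Logits outside the nucleus are
// overwritten with -inf instead of being removed, so a later softmax gives
// them zero probability.
std::vector<float> top_p_mask_static(const std::vector<float> & logits, float p) {
    const std::size_t n = logits.size();
    std::vector<float> out(n, -std::numeric_limits<float>::infinity());
    if (n == 0) {
        return out;
    }

    // Indices sorted by descending logit (the "argsort" step).
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return logits[a] > logits[b]; });

    // Numerically stable softmax normaliser.
    const double max_logit = logits[idx[0]];
    double sum = 0.0;
    for (float l : logits) {
        sum += std::exp(static_cast<double>(l) - max_logit);
    }

    // Keep tokens in descending order until the cumulative probability
    // reaches p; the token that crosses the threshold is kept as well.
    double cum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t id = idx[i];
        out[id] = logits[id];
        cum += std::exp(static_cast<double>(logits[id]) - max_logit) / sum;
        if (cum >= p) {
            break;
        }
    }
    return out;
}
```

Calling `top_p_mask_static({0.0f, 2.0f, -1.0f, 1.0f}, 0.8f)` returns four values again: the entries at positions 1 and 3 keep their logits and the other two become negative infinity. Because the shape never depends on p, this style of filtering fits a graph whose tensor sizes are fixed ahead of time.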

2025-11-28 15:16:20 +01:00

llama-cpp.h

llama : add llama_vocab, functions -> methods, naming (#11110)

2025-01-12 11:32:42 +02:00

llama.h

Add initial version for top-p sampling

2025-11-28 15:16:20 +01:00