llama.cpp/common
Gaurav Garg ad27757261 Move to backend sampling for MTP draft path (#23287)
* Move to backend sampling for MTP draft path

Run top_k(10) on the draft backend. D2H (device-to-host) transfers happen only for the top 10 logits.

Make backend sampling more robust and fall back to the CPU in failure cases, such as with "-sm tensor" or when a backend doesn't support TOP_K.

* Allow sampler chains to be partially offloaded to backend

* Add --spec-draft-backend-sampling argument. Enabled by default.
2026-05-20 22:34:45 +05:30