mirror of https://github.com/ggml-org/llama.cpp.git (synced 2026-07-04 18:20:21 +00:00)
e4d2e198b9
estimate with to-be-loaded model size included
use no_alloc to get memory requirements for model load only
set model memory_mb if not previously calculated
use memory margin instead of total size limit, apply to each device separately
add server memory debug logging
move llama_context_device_memory function to llama-ext.h
fix model count exceeded check
improve memory_per_device map naming
improve variable naming, fix style
also strip models memory margin from child processes
cont : clean-up
replace device memory map with buft memory map. Use llama_get_memory_breakdown
extract duplicated check into helper function
move model memory estimation to subprocess
precompute name->buft map, map GPU host types to CPU buft
cleanup unused variable
remove duplicated init calls
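
The change from a total size limit to a per-device memory margin can be illustrated with a minimal sketch. This is not the code from the commit: `device_memory`, `fits_with_margin`, and the MiB units are hypothetical names and assumptions chosen for illustration, and the real implementation works on buffer types and the memory breakdown rather than on plain device-name strings. It assumes C++17.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-device snapshot in MiB; not a llama.cpp type.
struct device_memory {
    uint64_t free_mb; // memory currently free on the device
};

// Returns true only if every device the model needs would still keep at
// least `margin_mb` free after the load. The margin is applied to each
// device separately instead of comparing one total against one limit.
bool fits_with_margin(const std::map<std::string, device_memory> & devices,
                      const std::map<std::string, uint64_t> & required_mb,
                      uint64_t margin_mb) {
    for (const auto & [name, need] : required_mb) {
        const auto it = devices.find(name);
        if (it == devices.end()) {
            return false; // the model needs a device we have no data for
        }
        if (it->second.free_mb < need + margin_mb) {
            return false; // loading would eat into this device's margin
        }
    }
    return true;
}
```

A per-device check of this kind rejects a load that fits in aggregate but would exhaust one particular device, which a single total size limit cannot detect.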