Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-14 06:27:45 +08:00)
* Why? Some models (e.g. anything produced by Mistral) can have both sharded safetensors and a consolidated safetensors file in the same checkpoint directory. In such cases, prefetching both is a waste of both time and memory.
* What? This commit skips consolidated safetensors when they are not the only safetensors files present in the checkpoint directory.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
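A minimal sketch of the selection logic described above, assuming consolidated files can be identified by a `consolidated` filename prefix; the function name and checkpoint-directory handling are illustrative, not the actual TensorRT-LLM implementation:

```python
from pathlib import Path

def select_safetensors_for_prefetch(checkpoint_dir: str) -> list[Path]:
    """Pick which .safetensors files to prefetch from a checkpoint directory.

    Some checkpoints (e.g. Mistral's) ship both sharded safetensors and a
    single consolidated safetensors file. Prefetching both wastes time and
    memory, so consolidated files are skipped whenever any other
    safetensors file is present.
    """
    files = sorted(Path(checkpoint_dir).glob("*.safetensors"))
    # Hypothetical convention: consolidated files are named
    # "consolidated.safetensors" or "consolidated-*.safetensors".
    non_consolidated = [f for f in files if not f.name.startswith("consolidated")]
    # Skip consolidated files only when they are not the sole safetensors
    # present; otherwise they are all we have and must be kept.
    return non_consolidated if non_consolidated else files
```

For a checkpoint containing both `model-00001-of-00002.safetensors`, `model-00002-of-00002.safetensors`, and `consolidated.safetensors`, only the two sharded files would be returned; for a checkpoint containing only `consolidated.safetensors`, that file is still prefetched.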