TensorRT-LLM/tensorrt_llm/_torch/models/checkpoints
William Zhang 34c1e9c341
[None][feat] Skip prefetching consolidated safetensors when appropriate (#7225)
* Why?

Some models (e.g. anything produced by Mistral) can ship both sharded
safetensors files and a consolidated safetensors file in the same
checkpoint directory. In such cases, prefetching both wastes both time
and memory.

* What?

This commit skips consolidated safetensors files whenever they are not
the only safetensors file present in the checkpoint directory (see the
sketch after the commit metadata below).

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-26 09:40:17 -07:00
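
A minimal sketch of the filtering behavior described in the commit message above. The function name, file-name heuristics, and structure here are hypothetical and not taken from the TensorRT-LLM codebase; per the listing below, the actual change lives under the `hf` subdirectory.

```python
from pathlib import Path
from typing import List


def select_safetensors_to_prefetch(checkpoint_dir: str) -> List[Path]:
    """Return the safetensors files worth prefetching into memory.

    Hypothetical sketch: Mistral-style checkpoints may contain both
    sharded files (e.g. model-00001-of-000xx.safetensors) and a single
    consolidated file (e.g. consolidated.safetensors). Prefetching both
    duplicates the weights in memory, so the consolidated file is
    skipped whenever sharded files are also present.
    """
    all_files = sorted(Path(checkpoint_dir).glob("*.safetensors"))
    consolidated = [f for f in all_files if f.name.startswith("consolidated")]
    sharded = [f for f in all_files if f not in consolidated]

    # Fall back to the consolidated file only when it is the sole
    # safetensors file in the directory.
    return sharded if sharded else consolidated
```
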
hf [None][feat] Skip prefetching consolidated safetensors when appropriate (#7225) 2025-08-26 09:40:17 -07:00
__init__.py [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372) 2025-07-17 00:50:30 +08:00
auto_mapper.py [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372) 2025-07-17 00:50:30 +08:00
base_checkpoint_loader.py [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372) 2025-07-17 00:50:30 +08:00
base_config_loader.py [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372) 2025-07-17 00:50:30 +08:00
base_weight_loader.py [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372) 2025-07-17 00:50:30 +08:00
base_weight_mapper.py [nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447) 2025-07-30 07:22:32 -04:00