mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-30 15:43:19 +08:00
Delete the unstacked weights immediately to save GPU memory. Cleanup occurs automatically after the transformation, but for large models we would run out of memory during the transformation itself. Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> |
||
|---|---|---|
| _utils_test | ||
| unit | ||
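
The commit message above argues that deferring cleanup until after the transformation lets stacked and unstacked copies of the weights coexist, which raises peak memory. The sketch below illustrates that reasoning only; it is not the TensorRT-LLM code. Plain `bytes` objects stand in for GPU tensors, and `live_bytes`, `stack_weights`, `make_model`, and the `layerN.expertM` naming are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple


def live_bytes(weights: Dict[str, bytes]) -> int:
    """Total size of all buffers currently held in the weight dict."""
    return sum(len(buf) for buf in weights.values())


def stack_weights(
    weights: Dict[str, bytes],
    groups: Dict[str, List[str]],
    eager_delete: bool,
) -> int:
    """Stack each group of source buffers into one buffer, in place.

    Returns the peak number of live bytes seen during the transformation.
    With eager_delete=True the sources of a group are dropped as soon as
    that group is stacked; otherwise they are dropped only at the end.
    """
    peak = live_bytes(weights)
    for stacked_name, source_names in groups.items():
        # Concatenation stands in for stacking tensors along a new axis.
        weights[stacked_name] = b"".join(weights[name] for name in source_names)
        # Right after stacking, both copies of this group are alive.
        peak = max(peak, live_bytes(weights))
        if eager_delete:
            for name in source_names:
                del weights[name]
    if not eager_delete:
        # Deferred cleanup: every unstacked copy survived until now.
        for source_names in groups.values():
            for name in source_names:
                del weights[name]
    return peak


def make_model(
    num_layers: int = 3, num_experts: int = 4, size: int = 10
) -> Tuple[Dict[str, bytes], Dict[str, List[str]]]:
    """Build a toy weight dict plus the groups of names to stack per layer."""
    weights = {
        f"layer{l}.expert{e}": bytes(size)
        for l in range(num_layers)
        for e in range(num_experts)
    }
    groups = {
        f"layer{l}.stacked": [f"layer{l}.expert{e}" for e in range(num_experts)]
        for l in range(num_layers)
    }
    return weights, groups


if __name__ == "__main__":
    eager_weights, eager_groups = make_model()
    deferred_weights, deferred_groups = make_model()
    print("eager peak:   ", stack_weights(eager_weights, eager_groups, True))
    print("deferred peak:", stack_weights(deferred_weights, deferred_groups, False))
```

In this toy setup both variants end with the same stacked buffers. With eager deletion the peak is the model size plus one stacked group; with deferred cleanup the peak approaches twice the model size, which is the situation the commit message describes for large models.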