mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

[None][doc] Move AutoDeploy README.md to torch docs (#6528)

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

2025-08-08 19:11:45 -04:00

1.5 KiB


PyTorch Backend

Note:
This feature is currently in beta, and the related API is subject to change in future versions.

To improve usability and developer efficiency, TensorRT-LLM provides a new backend based on PyTorch.

The PyTorch backend of TensorRT-LLM is available in version 0.17 and later. You can try it by importing tensorrt_llm._torch.
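As a rough illustration of the version requirement above, the following sketch checks whether an installed release is new enough. It is not part of TensorRT-LLM: the helper names (parse_major_minor, supports_pytorch_backend, installed_version) are hypothetical, and the distribution name "tensorrt_llm" is an assumption about how the package was installed.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Minimum release that ships the PyTorch backend, per the text above.
MIN_VERSION = (0, 17)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.17.0' or '1.0.0rc2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_pytorch_backend(version: str) -> bool:
    """Return True if the given version is 0.17 or later."""
    return parse_major_minor(version) >= MIN_VERSION


def installed_version(dist_name: str = "tensorrt_llm") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is not installed.

    The default distribution name is an assumption, not a documented guarantee.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling supports_pytorch_backend(installed_version()) only makes sense when installed_version() does not return None, so a caller would check for that case first.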

Quick Start

The following example shows how to use the tensorrt_llm.LLM API with a Llama model.
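A minimal sketch of such a quick start might look like the following. It is an illustration, not the example file shipped with the project. It assumes a release where LLM and SamplingParams can be imported from the top-level tensorrt_llm package (some releases may instead expose the PyTorch-backed class under tensorrt_llm._torch). The model identifier, the prompts, the sampling values, and the helper names build_prompts and run_quickstart are placeholders chosen for this sketch.

```python
from typing import List, Tuple


def build_prompts() -> List[str]:
    """Return a few short prompts to complete (illustrative values only)."""
    return [
        "Hello, my name is",
        "The capital of France is",
        "The future of AI is",
    ]


def run_quickstart(
    model: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder Llama-family model
) -> List[Tuple[str, str]]:
    """Generate one completion per prompt and return (prompt, text) pairs.

    tensorrt_llm is imported lazily so this module can be loaded on a machine
    without the package; running the function needs a supported GPU setup.
    """
    from tensorrt_llm import LLM, SamplingParams

    # Sampling values are illustrative, not recommended defaults.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    llm = LLM(model=model)

    outputs = llm.generate(build_prompts(), sampling_params)
    return [(output.prompt, output.outputs[0].text) for output in outputs]


# Example call (requires tensorrt_llm and a GPU):
# for prompt, text in run_quickstart():
#     print(f"Prompt: {prompt!r}, Generated text: {text!r}")
```

Keeping the import inside run_quickstart is a choice made for this sketch so that the prompt helper stays usable without the package installed. A real script would normally import at module level.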


Features

Developer Guide

Key Components

Known Issues

The PyTorch backend on SBSA is incompatible with bare-metal environments such as Ubuntu 24.04. Please use the PyTorch NGC Container for optimal support on SBSA platforms.

Prototype Features

AutoDeploy: Seamless Model Deployment from PyTorch to TensorRT-LLM
