# PyTorch Backend
```{note}
This feature is currently experimental, and the related API is subject to change in future versions.
```
To enhance usability and improve developer efficiency, TensorRT-LLM provides a new experimental backend based on PyTorch.
The PyTorch backend of TensorRT-LLM is available in version 0.17 and later. You can try it by importing `tensorrt_llm._torch`.
## Quick Start
Here is a simple example that shows how to use the `tensorrt_llm._torch.LLM` API with a Llama model.
```{literalinclude} ../../examples/pytorch/quickstart.py
:language: python
:linenos:
```
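If the example file is not at hand, the flow it demonstrates looks roughly like the following sketch; the model card and sampling settings are illustrative assumptions, not the exact contents of `quickstart.py`:
```python
from tensorrt_llm import SamplingParams
from tensorrt_llm._torch import LLM


def main():
    prompts = ["Hello, my name is"]
    # max_tokens=32 is an illustrative choice, not a required value.
    sampling_params = SamplingParams(max_tokens=32)

    # The model card below is an assumption; any supported Llama checkpoint works.
    llm = LLM(model='meta-llama/Llama-3.1-8B-Instruct')
    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        print(output.outputs[0].text)


if __name__ == '__main__':
    main()
```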
## Quantization
The PyTorch backend supports FP8 and NVFP4 quantization. You can pass quantized models hosted on the Hugging Face model hub,
which are generated by [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer).
```python
from tensorrt_llm._torch import LLM

# Load a pre-quantized FP8 checkpoint from the Hugging Face model hub.
llm = LLM(model='nvidia/Llama-3.1-8B-Instruct-FP8')
llm.generate("Hello, my name is")
```
Alternatively, you can quantize a model yourself with the following commands:
```bash
git clone https://github.com/NVIDIA/TensorRT-Model-Optimizer.git
cd TensorRT-Model-Optimizer/examples/llm_ptq
scripts/huggingface_example.sh --model <huggingface_model_card> --quant fp8 --export_fmt hf
```
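Once the script finishes, the exported checkpoint can be loaded the same way as a hub model. The path below is a placeholder for wherever the script wrote its Hugging Face-format output:
```python
from tensorrt_llm._torch import LLM

# Placeholder path: point this at the directory produced by the export step above.
llm = LLM(model='/path/to/exported_hf_checkpoint')
llm.generate("Hello, my name is")
```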
## Developer Guide
- [Architecture Overview](./torch/arch_overview.md)
- [Adding a New Model](./torch/adding_new_model.md)
- [Examples](../../examples/pytorch/README.md)
## Key Components
- [Attention](./torch/attention.md)
- [KV Cache Manager](./torch/kv_cache_manager.md)
- [Scheduler](./torch/scheduler.md)
## Known Issues
- The PyTorch workflow on SBSA is incompatible with bare-metal environments such as Ubuntu 24.04. Please use the [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) for optimal support on SBSA platforms.