mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-14 06:27:45 +08:00
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
45 lines
1.5 KiB
Markdown
45 lines
1.5 KiB
Markdown
# PyTorch Backend
|
|
|
|
```{note}
|
|
Note:
|
|
This feature is currently in beta, and the related API is subjected to change in future versions.
|
|
```
|
|
To enhance the usability of the system and improve developer efficiency, TensorRT LLM launches a new backend based on PyTorch.
|
|
|
|
The PyTorch backend of TensorRT LLM is available in version 0.17 and later. You can try it via importing `tensorrt_llm._torch`.
|
|
|
|
## Quick Start
|
|
|
|
Here is a simple example to show how to use `tensorrt_llm.LLM` API with Llama model.
|
|
|
|
```{literalinclude} ../../examples/llm-api/quickstart_example.py
|
|
:language: python
|
|
:linenos:
|
|
```
|
|
|
|
## Features
|
|
|
|
- [Sampling](./torch/features/sampling.md)
|
|
- [Quantization](./torch/features/quantization.md)
|
|
- [Overlap Scheduler](./torch/features/overlap_scheduler.md)
|
|
- [Feature Combination Matrix](./torch/features/feature_combination_matrix.md)
|
|
|
|
## Developer Guide
|
|
|
|
- [Architecture Overview](./torch/arch_overview.md)
|
|
- [Adding a New Model](./torch/adding_new_model.md)
|
|
|
|
## Key Components
|
|
|
|
- [Attention](./torch/attention.md)
|
|
- [KV Cache Manager](./torch/kv_cache_manager.md)
|
|
- [Scheduler](./torch/scheduler.md)
|
|
|
|
## Known Issues
|
|
|
|
- The PyTorch backend on SBSA is incompatible with bare metal environments like Ubuntu 24.04. Please use the [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) for optimal support on SBSA platforms.
|
|
|
|
## Prototype Features
|
|
|
|
- [AutoDeploy: Seamless Model Deployment from PyTorch to TensorRT LLM](./torch/auto_deploy/auto-deploy.md)
|