
(support-matrix)=

# Support Matrix

TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA GPUs. The following sections list the supported GPU architectures as well as important features implemented in TensorRT-LLM.

## Models

### LLM Models

### Multi-Modal Models [^multimodal]

(support-matrix-hardware)=

## Hardware

The following table shows the supported hardware for TensorRT-LLM.

Even if a GPU is not explicitly listed, TensorRT-LLM is expected to work on GPUs based on the Volta, Turing, Ampere, Hopper, and Ada Lovelace architectures, although certain limitations may apply.

```{list-table}
:header-rows: 1
:widths: 20 80

* -
  - Hardware Compatibility
* - Operating System
  - TensorRT-LLM requires Linux x86_64 or Windows.
* - GPU Model Architectures
  -
    - [NVIDIA Hopper H100 GPU](https://www.nvidia.com/en-us/data-center/h100/)
    - [NVIDIA L40S GPU](https://www.nvidia.com/en-us/data-center/l40s/)
    - [NVIDIA Ada Lovelace GPU](https://www.nvidia.com/en-us/technologies/ada-architecture/)
    - [NVIDIA Ampere A100 GPU](https://www.nvidia.com/en-us/data-center/a100/)
    - [NVIDIA A30 GPU](https://www.nvidia.com/en-us/data-center/products/a30-gpu/)
    - [NVIDIA Turing T4 GPU](https://www.nvidia.com/en-us/data-center/tesla-t4/)
    - [NVIDIA Volta V100 GPU](https://www.nvidia.com/en-us/data-center/v100/) (experimental)
```
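
If you are unsure which of the architectures above a GPU belongs to, you can query its CUDA compute capability (SM version). The following is a minimal sketch, assuming PyTorch is available in your environment; it is not required by TensorRT-LLM for this purpose, and the mapping dictionary is illustrative only:

```python
# Minimal sketch (assumes PyTorch is installed): map the CUDA compute
# capability of GPU 0 to the architecture generations named in the table above.
import torch

ARCH_BY_SM_MAJOR = {
    7: "Volta (SM70) / Turing (SM75)",
    8: "Ampere (SM80, SM86) / Ada Lovelace (SM89)",
    9: "Hopper (SM90)",
}

major, minor = torch.cuda.get_device_capability(0)
print(f"SM{major}{minor}: {ARCH_BY_SM_MAJOR.get(major, 'unlisted architecture')}")
```

On recent drivers, `nvidia-smi --query-gpu=compute_cap --format=csv` reports the same value without Python.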

(support-matrix-software)=

## Software

The following table shows the supported software for TensorRT-LLM.

```{list-table}
:header-rows: 1
:widths: 20 80

* -
  - Software Compatibility
* - Container
  - [24.07](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)
* - TensorRT
  - [10.3](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html)
* - Precision
  -
    - Hopper (SM90) - FP32, FP16, BF16, FP8, INT8, INT4
    - Ada Lovelace (SM89) - FP32, FP16, BF16, FP8, INT8, INT4
    - Ampere (SM80, SM86) - FP32, FP16, BF16, INT8, INT4[^smgte89]
    - Turing (SM75) - FP32, FP16, INT8[^smooth], INT4
    - Volta (SM70) - FP32, FP16, INT8[^smooth], INT4[^smlt75]
```

Support for FP8 and quantized data types (INT8 or INT4) is not implemented for all models. Refer to {ref}`precision` and the [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) folder for additional information.
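
The precision rows above can also be restated programmatically. The following is a minimal sketch; the `SUPPORTED_PRECISIONS` table and the `precisions_for_sm` helper are illustrative names, not part of the TensorRT-LLM API:

```python
# Illustrative sketch only: restates the precision matrix above so a build
# script can sanity-check a requested dtype against the detected SM version.
# The footnoted caveats in the table (INT8/INT4 restrictions on some
# architectures) still apply on top of this coarse mapping.
SUPPORTED_PRECISIONS = {
    90: {"FP32", "FP16", "BF16", "FP8", "INT8", "INT4"},  # Hopper
    89: {"FP32", "FP16", "BF16", "FP8", "INT8", "INT4"},  # Ada Lovelace
    80: {"FP32", "FP16", "BF16", "INT8", "INT4"},         # Ampere (SM80, SM86)
    75: {"FP32", "FP16", "INT8", "INT4"},                 # Turing
    70: {"FP32", "FP16", "INT8", "INT4"},                 # Volta
}

def precisions_for_sm(sm: int) -> set[str]:
    """Return the precisions listed for the closest architecture at or below `sm`."""
    eligible = [k for k in SUPPORTED_PRECISIONS if k <= sm]
    if not eligible:
        raise ValueError(f"SM{sm} is below the oldest listed architecture (SM70)")
    return SUPPORTED_PRECISIONS[max(eligible)]

print(precisions_for_sm(86))  # Ampere SM86 -> FP32, FP16, BF16, INT8, INT4
```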

[^encdec]: Encoder-Decoder provides general encoder-decoder functionality that supports many encoder-decoder models such as the T5 family, BART family, Whisper family, NMT family, and so on.

[^multimodal]: Multi-modal provides general multi-modal functionality that supports many multi-modal architectures such as the BLIP2 family, LLaVA family, and so on.

[^bf16only]: Only supports the bfloat16 precision.