
(linux)=

# Installing on Linux

  1. Install TensorRT-LLM (tested on Ubuntu 24.04).

    sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm --extra-index-url https://pypi.nvidia.com
    
  2. Sanity check the installation by running the following in Python (tested on Python 3.12):
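
    The snippet below is a minimal sanity check along the lines of the LLM API quickstart; the model name is illustrative and is downloaded from Hugging Face on first use:

        from tensorrt_llm import LLM, SamplingParams

        # Load a small chat model (illustrative; any supported checkpoint works).
        llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

        # Generate a completion for a single prompt.
        sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
        outputs = llm.generate(["Hello, my name is"], sampling_params)

        for output in outputs:
            print(output.outputs[0].text)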


## Known limitations

There are some known limitations when you pip install the pre-built TensorRT-LLM wheel package.

  1. C++11 ABI

    The pre-built TensorRT-LLM wheel is linked against the public PyTorch hosted on PyPI, which is built with the C++11 ABI disabled. The NVIDIA-optimized PyTorch inside the NGC container nvcr.io/nvidia/pytorch:xx.xx-py3, however, is built with the C++11 ABI enabled; see the NGC PyTorch container page. We therefore recommend building TensorRT-LLM from source when using the NGC PyTorch container. Instructions are available in Build from Source Code on Linux. You can check which ABI your installed PyTorch uses as shown below.
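
    A quick check, sketched here for convenience, is to ask PyTorch directly (torch.compiled_with_cxx11_abi() is standard PyTorch API):

        import torch

        # True  -> PyTorch built with the C++11 ABI (e.g. the NGC container build).
        # False -> PyTorch built without it (e.g. the public PyPI wheel).
        print(torch.compiled_with_cxx11_abi())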

  2. MPI in the Slurm environment

    If you encounter an error like the following while running TensorRT-LLM in a Slurm-managed cluster, you need to reconfigure your MPI installation to work with Slurm. The setup method depends on your Slurm configuration; please check with your cluster administrator. This is not a TensorRT-LLM-specific issue, but a general MPI + Slurm one.

    The application appears to have been direct launched using "srun",
    but OMPI was not built with SLURM support. This usually happens
    when OMPI was not configured --with-slurm and we weren't able
    to discover a SLURM installation in the usual places.
    
  3. CUDA Toolkit

    pip install tensorrt-llm does not install the CUDA Toolkit on your system, and the CUDA Toolkit is not required if you only want to deploy a TensorRT-LLM engine. TensorRT-LLM uses ModelOpt to quantize models, and ModelOpt requires the CUDA Toolkit to JIT-compile certain kernels that are not included in PyTorch, so that quantization runs efficiently. Please install the CUDA Toolkit if you see the following message when running ModelOpt quantization.

    /usr/local/lib/python3.10/dist-packages/modelopt/torch/utils/cpp_extension.py:65:
    UserWarning: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
    Unable to load extension modelopt_cuda_ext and falling back to CPU version.
    

    Instructions for installing the CUDA Toolkit can be found in the CUDA Toolkit Documentation.
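
    As a quick, illustrative check (not taken from the official docs), you can verify that the toolkit is visible to the environment before running quantization:

        import os
        import shutil

        # Either CUDA_HOME should point at the toolkit install root
        # (e.g. /usr/local/cuda) or nvcc should be discoverable on PATH.
        print("CUDA_HOME:", os.environ.get("CUDA_HOME"))
        print("nvcc:", shutil.which("nvcc"))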