# Description

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports
state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to
create Python and C++ runtimes that orchestrate the inference execution in a performant way.

# Overview

## TensorRT LLM Develop Container

The TensorRT LLM Develop container includes all necessary dependencies to build TensorRT LLM from source. It is
specifically designed to be used alongside the source code cloned from the official TensorRT LLM repository:

[GitHub Repository - NVIDIA TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM)

Full instructions for cloning the TensorRT LLM repository can be found in
the [TensorRT LLM Documentation](https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html).

> **Note:**
> This container does not contain a pre-built binary release of `TensorRT LLM` or tools like `trtllm-serve`.

### Running the TensorRT LLM Develop Container Using Docker

With the top-level directory of the TensorRT LLM repository cloned to your local machine, you can run the following
command to start the development container:

```bash
make -C docker ngc-devel_run LOCAL_USER=1 DOCKER_PULL=1 IMAGE_TAG=x.y.z
```

where `x.y.z` is the version of the TensorRT LLM container to use (cf. [release history on GitHub](https://github.com/NVIDIA/TensorRT-LLM/releases) and [tags in NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/devel/tags)). This command pulls the specified container from the
NVIDIA NGC registry, sets up the local user's account within the container, and launches it with full GPU support. The
local source code of TensorRT LLM will be mounted inside the container at the path `/code/tensorrt_llm` for seamless
integration. Ensure that the image version matches the version of TensorRT LLM in your currently checked out local git
branch. If you do not specify an `IMAGE_TAG`, a matching tag is resolved automatically, but not every intermediate
release is accompanied by a development container. In that case, use the latest version preceding the version of your
development branch.

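As a quick check before picking a tag, you can print the version your local checkout targets. This is a minimal sketch,
assuming the version string lives in `tensorrt_llm/version.py` as in recent releases of the repository:

```bash
# Print the TensorRT LLM version targeted by the current checkout
# (assumption: the version string is defined in tensorrt_llm/version.py).
grep "__version__" tensorrt_llm/version.py
```

Pass the matching container tag, or the closest preceding one, explicitly via `IMAGE_TAG`.
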
If you prefer launching the container directly with `docker`, you can use the following command:

```bash
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    --gpus=all \
    --env "CCACHE_DIR=/code/tensorrt_llm/cpp/.ccache" \
    --env "CCACHE_BASEDIR=/code/tensorrt_llm" \
    --env "CONAN_HOME=/code/tensorrt_llm/cpp/.conan" \
    --workdir /code/tensorrt_llm \
    --tmpfs /tmp:exec \
    --volume .:/code/tensorrt_llm \
    nvcr.io/nvidia/tensorrt-llm/devel:x.y.z
```

Note that this will start the container with the user `root`, which may leave files with root ownership in your local
checkout.

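If root-owned files are a concern when using plain `docker run`, one possible workaround is to pass your local UID and
GID to the container. This is only a sketch and not the mechanism used by the `make` target above, which also sets up a
matching user account inside the container:

```bash
# Run as your local user so files created in the mounted checkout keep your
# ownership. Note: this UID has no matching passwd entry inside the container,
# which some tools may warn about.
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    --gpus=all \
    --user "$(id -u):$(id -g)" \
    --env "CCACHE_DIR=/code/tensorrt_llm/cpp/.ccache" \
    --env "CCACHE_BASEDIR=/code/tensorrt_llm" \
    --env "CONAN_HOME=/code/tensorrt_llm/cpp/.conan" \
    --workdir /code/tensorrt_llm \
    --tmpfs /tmp:exec \
    --volume .:/code/tensorrt_llm \
    nvcr.io/nvidia/tensorrt-llm/devel:x.y.z
```
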
### Building the TensorRT LLM Wheel within the Container

You can build the TensorRT LLM Python wheel inside the development container using the following command:

```bash
./scripts/build_wheel.py --clean --use_ccache --cuda_architectures=native
```

#### Explanation of Build Flags

- `--clean`: Clears intermediate build artifacts from prior builds to ensure a fresh compilation.
- `--use_ccache`: Enables `ccache` to optimize and accelerate subsequent builds by caching compilation results.
- `--cuda_architectures=native`: Configures the build for the native architecture of your GPU. Omit this flag to build
  the wheel for all supported architectures, or pass an explicit list as shown in the sketch after this list. For
  additional details, refer to
  the [CUDA Architectures Documentation](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html#prop_tgt:CUDA_ARCHITECTURES).

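If you want to target specific GPU generations instead of only the local GPU, the architecture list can be given
explicitly. This is a sketch that assumes `--cuda_architectures` accepts CMake-style `CUDA_ARCHITECTURES` values, as
described in the documentation linked above:

```bash
# Build for Ada (SM 89) and Hopper (SM 90) only; assumes CMake-style,
# semicolon-separated architecture values are accepted.
./scripts/build_wheel.py --clean --use_ccache --cuda_architectures="89-real;90-real"
```
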
For additional build options and their usage, refer to the help documentation by running:

```bash
./scripts/build_wheel.py --help
```

The wheel will be built in the `build` directory and can be installed using `pip install` like so:

```bash
pip install ./build/tensorrt_llm*.whl
```

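After installation, a quick import check confirms the wheel is usable from Python; a minimal sketch, assuming the
package exposes `__version__`:

```bash
# Sanity check: import the freshly installed package and print its version.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```
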
For additional information on building the TensorRT LLM wheel, refer to
the [official documentation on building from source](https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html#option-1-full-build-with-c-compilation).

### Security CVEs

To review known CVEs on this image, refer to the Security Scanning tab on this page.

### License

By pulling and using the container, you accept the terms and conditions of
this [End User License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
and [Product-Specific Terms](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).