Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-14 06:27:45 +08:00)

doc: Document the docker release image on NGC (#4705)

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

Parent: 971d16a2ee
Commit: f3fba4cc63
@@ -29,7 +29,7 @@ where `x.xx.x` is the version of the TensorRT-LLM container to use. This command
 NVIDIA NGC registry, sets up the local user's account within the container, and launches it with full GPU support. The
 local source code of TensorRT-LLM will be mounted inside the container at the path `/code/tensorrt_llm` for seamless
 integration. Ensure that the image version matches the version of TensorRT-LLM in your current local git branch. Not
-specifying an `IMAGE_TAG` will attempt to resolve this automatically, but the not every intermediate release might be
+specifying an `IMAGE_TAG` will attempt to resolve this automatically, but not every intermediate release might be
 accompanied by development container. In that case, use the latest version preceding the version of your development
 branch.
docker/release.md (new file, 57 lines):

@@ -0,0 +1,57 @@
# Description

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support
state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to
create Python and C++ runtimes that orchestrate the inference execution in a performant way.

# Overview

## TensorRT-LLM Release Container

The TensorRT-LLM Release container provides a pre-built environment for running TensorRT-LLM.

Visit the [official GitHub repository](https://github.com/NVIDIA/TensorRT-LLM) for more details.

### Running TensorRT-LLM Using Docker

A typical command to launch the container is:

```bash
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all \
    nvcr.io/nvidia/tensorrt-llm/release:x.xx.x
```

where `x.xx.x` is the version of the TensorRT-LLM container to use. As a sanity check, run the following command:

```bash
python3 -c "import tensorrt_llm"
```

This command will print the TensorRT-LLM version if everything is working correctly. After verification, you can explore
and try the example scripts included in `/app/tensorrt_llm/examples`.
Alternatively, if you have already cloned the TensorRT-LLM repository, you can use the following convenient command to
run the container:

```bash
make -C docker ngc-release_run LOCAL_USER=1 DOCKER_PULL=1 IMAGE_TAG=x.xx.x
```

This command pulls the specified container from the NVIDIA NGC registry, sets up the local user's account within the
container, and launches it with full GPU support.
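Both invocations above resolve to the same fully qualified image reference; as a small illustration, it can be assembled from the version string like this (hypothetical helper, not part of the repository's Makefile):

```python
REGISTRY = "nvcr.io/nvidia/tensorrt-llm"  # NGC registry path used in the commands above

def release_image(version: str) -> str:
    """Build the fully qualified NGC reference for a TensorRT-LLM release container."""
    return f"{REGISTRY}/release:{version}"

print(release_image("0.20.0"))  # → nvcr.io/nvidia/tensorrt-llm/release:0.20.0
```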
For comprehensive information about TensorRT-LLM, including documentation, source code, examples, and installation
guidelines, visit the following official resources:

- [TensorRT-LLM GitHub Repository](https://github.com/NVIDIA/TensorRT-LLM)
- [TensorRT-LLM Online Documentation](https://nvidia.github.io/TensorRT-LLM/latest/index.html)

### Security CVEs

To review known CVEs on this image, refer to the Security Scanning tab on this page.

### License

By pulling and using the container, you accept the terms and conditions of
this [End User License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
and [Product-Specific Terms](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).