doc: Document the docker release image on NGC (#4705)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
parent 971d16a2ee
commit f3fba4cc63
```diff
@@ -29,7 +29,7 @@ where `x.xx.x` is the version of the TensorRT-LLM container to use. This command
 NVIDIA NGC registry, sets up the local user's account within the container, and launches it with full GPU support. The
 local source code of TensorRT-LLM will be mounted inside the container at the path `/code/tensorrt_llm` for seamless
 integration. Ensure that the image version matches the version of TensorRT-LLM in your current local git branch. Not
-specifying an `IMAGE_TAG` will attempt to resolve this automatically, but the not every intermediate release might be
+specifying an `IMAGE_TAG` will attempt to resolve this automatically, but not every intermediate release might be
 accompanied by development container. In that case, use the latest version preceding the version of your development
 branch.
```
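The auto-resolution caveat above matters when your branch sits between releases: pin `IMAGE_TAG` to the latest preceding release explicitly. A minimal sketch, assuming the `ngc-devel_run` Make target suggested by this command's context (both the target name and the tag value are illustrative assumptions):

```bash
# Pin the development image to the latest release preceding your branch,
# instead of relying on automatic tag resolution.
# (ngc-devel_run and the 0.19.0 tag are assumptions for illustration)
make -C docker ngc-devel_run LOCAL_USER=1 DOCKER_PULL=1 IMAGE_TAG=0.19.0
```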
docker/release.md (new file, 57 lines)
# Description

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports
state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to
create Python and C++ runtimes that orchestrate the inference execution in a performant way.
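As a taste of that Python API, here is a minimal generation sketch, run from the shell inside the container. The model name and the high-level `LLM`/`SamplingParams` usage are illustrative assumptions, not taken from this page:

```bash
# Minimal sketch of the high-level Python API, run inside the container
# (model name and API details are illustrative assumptions)
python3 - <<'EOF'
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
for out in llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32)):
    print(out.outputs[0].text)
EOF
```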
# Overview

## TensorRT-LLM Release Container
The TensorRT-LLM Release container provides a pre-built environment for running TensorRT-LLM.

Visit the [official GitHub repository](https://github.com/NVIDIA/TensorRT-LLM) for more details.
### Running TensorRT-LLM Using Docker

A typical command to launch the container is:

```bash
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all \
    nvcr.io/nvidia/tensorrt-llm/release:x.xx.x
```
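If you need host files, such as model checkpoints, available inside the container, a bind mount can be added to the same command. A minimal sketch, with the host path and mount point purely illustrative:

```bash
# Same launch command, with a host directory mounted into the container
# (host path and mount point are illustrative; adjust to your setup)
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all \
    -v "$HOME/models:/workspace/models" \
    nvcr.io/nvidia/tensorrt-llm/release:x.xx.x
```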
where `x.xx.x` is the version of the TensorRT-LLM container to use. As a sanity check, run the following command:

```bash
python3 -c "import tensorrt_llm"
```
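The import itself prints a version banner when it succeeds; to print just the version string, the `__version__` attribute can also be queried (assumed here, as it is a standard Python packaging convention):

```bash
# Print the installed TensorRT-LLM version explicitly
# (assumes tensorrt_llm exposes the conventional __version__ attribute)
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```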
Either command will print the TensorRT-LLM version if everything is working correctly. After verification, you can
explore and try the example scripts included in `/app/tensorrt_llm/examples`.
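For example, to list the bundled examples:

```bash
# List the example scripts bundled in the release image
ls /app/tensorrt_llm/examples
```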
Alternatively, if you have already cloned the TensorRT-LLM repository, you can use the following convenient command to
run the container:

```bash
make -C docker ngc-release_run LOCAL_USER=1 DOCKER_PULL=1 IMAGE_TAG=x.xx.x
```
This command pulls the specified container from the NVIDIA NGC registry, sets up the local user's account within the
container, and launches it with full GPU support.
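On subsequent launches, the pull step can plausibly be skipped by omitting `DOCKER_PULL`; this is an assumption based on the flag's name, as the Makefile's exact semantics are not documented here:

```bash
# Relaunch using the locally cached image
# (omitting DOCKER_PULL to skip the pull is an assumption, not documented here)
make -C docker ngc-release_run LOCAL_USER=1 IMAGE_TAG=x.xx.x
```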
For comprehensive information about TensorRT-LLM, including documentation, source code, examples, and installation
guidelines, visit the following official resources:

- [TensorRT-LLM GitHub Repository](https://github.com/NVIDIA/TensorRT-LLM)
- [TensorRT-LLM Online Documentation](https://nvidia.github.io/TensorRT-LLM/latest/index.html)
### Security CVEs

To review known CVEs on this image, refer to the Security Scanning tab on this page.
### License

By pulling and using the container, you accept the terms and conditions of
this [End User License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
and [Product-Specific Terms](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).