# Description

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports
state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to
create Python and C++ runtimes that orchestrate the inference execution in a performant way.

# Overview

## TensorRT LLM Release Container

The TensorRT LLM Release container provides a pre-built environment for running TensorRT LLM.

Visit the [official GitHub repository](https://github.com/NVIDIA/TensorRT-LLM) for more details.

### Running TensorRT LLM Using Docker

A typical command to launch the container is:

```bash
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all \
    nvcr.io/nvidia/tensorrt-llm/release:x.y.z
```

where x.y.z is the version of the TensorRT LLM container to use (cf. [release history on GitHub](https://github.com/NVIDIA/TensorRT-LLM/releases) and [tags in the NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags)). To sanity-check the installation, run the following command:

```bash
python3 -c "import tensorrt_llm"
```

This command prints the TensorRT LLM version if everything is working correctly. After verification, you can explore
and try the example scripts included in `/app/tensorrt_llm/examples`.

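As a quick next step, a session inside the container might look like the sketch below. The explicit `__version__` print and the quickstart script path are assumptions based on recent releases; list the examples directory first to see what your image actually ships.

```bash
# Print the installed version explicitly (assumes tensorrt_llm exposes __version__).
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"

# See which example scripts this image ships; the layout varies by release.
ls /app/tensorrt_llm/examples

# Hypothetical: run the LLM API quickstart if your release includes it.
python3 /app/tensorrt_llm/examples/llm-api/quickstart_example.py
```
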
Alternatively, if you have already cloned the TensorRT LLM repository, you can use the following convenient `make` target to
run the container:

```bash
make -C docker ngc-release_run LOCAL_USER=1 DOCKER_PULL=1 IMAGE_TAG=x.y.z
```

This command pulls the specified container from the NVIDIA NGC registry, sets up the local user's account within the
container, and launches it with full GPU support.

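If you want model weights and Hugging Face downloads to survive container restarts, a common pattern is to bind-mount the relevant directories into the plain `docker run` invocation shown earlier. This is a minimal sketch, assuming a local `~/models` directory and the default Hugging Face cache path, not an official layout:

```bash
# Persist the Hugging Face cache and expose local model checkpoints;
# both host paths are assumptions - adjust them to your setup.
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all \
    -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
    -v "$HOME/models:/workspace/models" \
    nvcr.io/nvidia/tensorrt-llm/release:x.y.z
```

Bind mounts keep large checkpoints out of the container's writable layer, which is discarded when the container exits with `--rm`.
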
For comprehensive information about TensorRT LLM, including documentation, source code, examples, and installation
guidelines, visit the following official resources:

- [TensorRT LLM GitHub Repository](https://github.com/NVIDIA/TensorRT-LLM)
- [TensorRT LLM Online Documentation](https://nvidia.github.io/TensorRT-LLM/latest/index.html)

### Security CVEs

To review known CVEs on this image, refer to the Security Scanning tab on this page.

### License

By pulling and using the container, you accept the terms and conditions of
this [End User License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
and [Product-Specific Terms](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).