TensorRT-LLMs/docker/Makefile
Daniel Cámpora 41ce5440fe
chore: Mass integration of release/0.18 (#3421)
* [Infra][TRTLLM-4063] - Branch out for the TRT-LLM v0.18.0 release

Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
(cherry picked from commit de90312020e51c22ba5e75b3502c7ee90c059265)

* [Infra][TRTLLM-3652] - Update dependencies to TRT 10.9 / CUDA 12.8.1 / DLFW 25.03(Internal)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
(cherry picked from commit 58db1340ef7db22f1910f878d220a92be5b830d1)

* [None][Doc] - Update docs for v0.18.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit d23e75bc95619ce3b116213d55319272888e0c88)

* [Infra] - Fix or WAR issues in the package sanity check stages

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit e874e2b127515c52ba10c8df1cc2631627f74ffe)

* [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path

Signed-off-by: Yuki Huang <yukih@nvidia.com>
(cherry picked from commit 731811d4e182d70a66193d646152cb71dfafe83a)

* cherry-pick 'test: Updat cluster and multi node test lists and trtllm-bench' test to fix perf drop issue

Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
(cherry picked from commit 5214616283fbc15ae98871a1d84c78d8e1f2e6e8)

* Revert "Merge branch 'user/yukih/fix_5173454_5173432' into 'release/0.18'"

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit 8d34831cb2b81ee2dfa8021b68e7158b33789a5f)

* [Infra]Restrict setuptools version to avoid sasb pip install issue

Signed-off-by: Emma Qiao <qqiao@nvidia.com>
(cherry picked from commit 1e60ad29e0dafec0e295bedb5d89b716a02a707c)

* [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path

Signed-off-by: Yuki Huang <yukih@nvidia.com>
(cherry picked from commit 3ed8164e5bfea1d5aa2039b5408439fd6cf59dac)

* WAR for bug 5173448

Signed-off-by: Thor Johnsen <tjohnsen@nvidia.com>
(cherry picked from commit b6528b2ba15322b6c6a4c81a8b74c04d4973de4f)

* [Infra][TRTLLM-3652] - Update dependencies to CUDA 12.8.1 / DLFW 25.03

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
(cherry picked from commit 6560983d132d9d257ee15849664eb055e94adaa9)

* [Docs] - Doc changes for v0.18.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit 26769b61218a947c8f9d070f73b63d576fcc20c4)

* [Doc] - Doc change for v0.18.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
(cherry picked from commit 4b3b5ed6bfbc2300e3775fe75456083faad7b235)

* [Infra] update version to 0.18.1

Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
(cherry picked from commit 59e8326c75639275837d34de8e140358737a3365)

* Add back nemotron file.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Fix recurrentgemma reqs.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Adding WAR for bug 5173448.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Formatting.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Remove duplicated file.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Update examples/prompt_lookup/requirements.txt

Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Remove glm-4-9b from model dir in chatglm test.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Remove indent change.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Revert changes on l0_test.groovy.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Update dev images

Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

* Remove duplicated import.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Fix custom op

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

* Fix flashinfer & vanilla backend

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

* Skip problematic case.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Skip problematic test_moe_w4a8_1_14336_4096_8_bfloat16_True_False case.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

---------

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Ruodi Lu <ruodil@nvidia.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Thor Johnsen <tjohnsen@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-04-16 10:03:29 +08:00


# Default base image for the docker build as defined in Dockerfile.multi
BASE_IMAGE ?= $(shell grep 'ARG BASE_IMAGE=' Dockerfile.multi | grep -o '=.*' | tr -d '="')
BASE_TAG ?= $(shell grep 'ARG BASE_TAG=' Dockerfile.multi | grep -o '=.*' | tr -d '="')
# Name of the new image
IMAGE_NAME ?= tensorrt_llm
IMAGE_TAG ?= latest
# Local user information
USER_ID ?= $(shell id --user)
USER_NAME ?= $(shell id --user --name)
GROUP_ID ?= $(shell id --group)
GROUP_NAME ?= $(shell id --group --name)
# Set this to 1 to add the current user to the docker image and run the container with the user
LOCAL_USER ?= 0
ifeq ($(LOCAL_USER),1)
IMAGE_TAG_SUFFIX ?= -$(USER_NAME)
endif
# Set this to 1 to use the image from Jenkins as the image for the `devel` stage in the build phase
JENKINS_DEVEL ?= 0
# Default stage of the docker multi-stage build
STAGE ?=
# Set this to define a custom image name and tag
IMAGE_WITH_TAG ?= $(IMAGE_NAME)$(if $(STAGE),/$(STAGE)):$(IMAGE_TAG)
PUSH_TO_STAGING ?= 1
DOCKER_BUILD_OPTS ?= --pull
DOCKER_BUILD_ARGS ?=
DOCKER_PROGRESS ?= auto
CUDA_ARCHS ?=
BUILD_WHEEL_OPTS ?=
BUILD_WHEEL_ARGS ?= $(shell grep 'ARG BUILD_WHEEL_ARGS=' Dockerfile.multi | grep -o '=.*' | tr -d '="')$(if $(CUDA_ARCHS), --cuda_architectures $(CUDA_ARCHS))$(if $(BUILD_WHEEL_OPTS), $(BUILD_WHEEL_OPTS))
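# Illustration of the composition above (the architecture value is only an example):
# with CUDA_ARCHS="89-real", BUILD_WHEEL_ARGS becomes the default read from
# Dockerfile.multi followed by ` --cuda_architectures 89-real`; anything set in
# BUILD_WHEEL_OPTS is appended after that.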
TORCH_INSTALL_TYPE ?= skip
CUDA_VERSION ?=
CUDNN_VERSION ?=
NCCL_VERSION ?=
CUBLAS_VERSION ?=
TRT_VERSION ?=
GIT_COMMIT ?= $(shell git rev-parse HEAD)
TRT_LLM_VERSION ?= $(shell grep '^__version__' ../tensorrt_llm/version.py | grep -o '=.*' | tr -d '= "')
GITHUB_MIRROR ?=
PYTHON_VERSION ?=
define add_local_user
	docker build \
		--progress $(DOCKER_PROGRESS) \
		--build-arg BASE_IMAGE_WITH_TAG=$(1) \
		--build-arg USER_ID=$(USER_ID) \
		--build-arg USER_NAME=$(USER_NAME) \
		--build-arg GROUP_ID=$(GROUP_ID) \
		--build-arg GROUP_NAME=$(GROUP_NAME) \
		--file Dockerfile.user \
		--tag $(1)$(IMAGE_TAG_SUFFIX) \
		..
endef
# Rewrite `/tensorrt-llm:` in the image tag to `/tensorrt-llm-staging:` so that a push
# does not directly overwrite the non-staging image
define rewrite_tag
$(shell echo $(IMAGE_WITH_TAG) | sed "s/\/tensorrt-llm:/\/tensorrt-llm-staging:/g")
endef
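# Illustration with a hypothetical registry path (not taken from this repository):
#   registry.example.com/team/tensorrt-llm:my_tag
# is rewritten to
#   registry.example.com/team/tensorrt-llm-staging:my_tag
# An image name that does not contain `/tensorrt-llm:` (such as the default
# `tensorrt_llm/devel:latest`) is left unchanged by the sed expression.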
%_build: DEVEL_IMAGE = $(if $(findstring 1,$(JENKINS_DEVEL)),$(shell grep 'LLM_DOCKER_IMAGE = ' ../jenkins/L0_MergeRequest.groovy | grep -o '".*"' | tr -d '"'))
%_build:
	@echo "Building docker image: $(IMAGE_WITH_TAG)"
	DOCKER_BUILDKIT=1 docker build $(DOCKER_BUILD_OPTS) $(DOCKER_BUILD_ARGS) \
		--progress $(DOCKER_PROGRESS) \
		$(if $(BASE_IMAGE), --build-arg BASE_IMAGE=$(BASE_IMAGE)) \
		$(if $(BASE_TAG), --build-arg BASE_TAG=$(BASE_TAG)) \
		$(if $(BUILD_WHEEL_ARGS), --build-arg BUILD_WHEEL_ARGS="$(BUILD_WHEEL_ARGS)") \
		$(if $(TORCH_INSTALL_TYPE), --build-arg TORCH_INSTALL_TYPE="$(TORCH_INSTALL_TYPE)") \
		$(if $(CUDA_VERSION), --build-arg CUDA_VER="$(CUDA_VERSION)") \
		$(if $(CUDNN_VERSION), --build-arg CUDNN_VER="$(CUDNN_VERSION)") \
		$(if $(NCCL_VERSION), --build-arg NCCL_VER="$(NCCL_VERSION)") \
		$(if $(CUBLAS_VERSION), --build-arg CUBLAS_VER="$(CUBLAS_VERSION)") \
		$(if $(TRT_VERSION), --build-arg TRT_VER="$(TRT_VERSION)") \
		$(if $(TRT_LLM_VERSION), --build-arg TRT_LLM_VER="$(TRT_LLM_VERSION)") \
		$(if $(DEVEL_IMAGE), --build-arg DEVEL_IMAGE="$(DEVEL_IMAGE)") \
		$(if $(GIT_COMMIT), --build-arg GIT_COMMIT="$(GIT_COMMIT)") \
		$(if $(GITHUB_MIRROR), --build-arg GITHUB_MIRROR="$(GITHUB_MIRROR)") \
		$(if $(PYTHON_VERSION), --build-arg PYTHON_VERSION="$(PYTHON_VERSION)") \
		$(if $(STAGE), --target $(STAGE)) \
		--file Dockerfile.multi \
		--tag $(IMAGE_WITH_TAG) \
		..
%_user:
	$(call add_local_user,$(IMAGE_WITH_TAG))
%_push: %_build
	@if [ $(PUSH_TO_STAGING) = 0 ]; then \
		echo "Pushing docker image: $(IMAGE_WITH_TAG)"; \
		docker push $(IMAGE_WITH_TAG)$(IMAGE_TAG_SUFFIX); \
	fi
	@if [ $(PUSH_TO_STAGING) = 1 ]; then \
		echo "Rewriting docker tag: $(IMAGE_WITH_TAG) to $(call rewrite_tag)"; \
		docker tag $(IMAGE_WITH_TAG)$(IMAGE_TAG_SUFFIX) $(call rewrite_tag)$(IMAGE_TAG_SUFFIX); \
		echo "Pushing docker image: $(call rewrite_tag)"; \
		docker push $(call rewrite_tag)$(IMAGE_TAG_SUFFIX); \
	fi
%_pull:
	@echo "Pulling docker image: $(IMAGE_WITH_TAG)"
	docker pull $(IMAGE_WITH_TAG)
DOCKER_RUN_OPTS ?= --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864
DOCKER_RUN_ARGS ?=
GPU_OPTS ?= --gpus=all
SOURCE_DIR ?= $(shell readlink -f ..)
CODE_DIR ?= /code/tensorrt_llm
CCACHE_DIR ?= ${CODE_DIR}/cpp/.ccache
RUN_CMD ?=
CONTAINER_NAME ?= tensorrt_llm
WORK_DIR ?= $(CODE_DIR)
DOCKER_PULL ?= 0
%_run:
ifeq ($(DOCKER_PULL),1)
	@$(MAKE) --no-print-directory $*_pull
endif
ifeq ($(LOCAL_USER),1)
	$(call add_local_user,$(IMAGE_WITH_TAG))
endif
	docker run $(DOCKER_RUN_OPTS) $(DOCKER_RUN_ARGS) \
		$(GPU_OPTS) \
		--volume $(SOURCE_DIR):$(CODE_DIR) \
		--env "CCACHE_DIR=${CCACHE_DIR}" \
		--env "CCACHE_BASEDIR=${CODE_DIR}" \
		--workdir $(WORK_DIR) \
		--hostname $(shell hostname)-$* \
		--name $(CONTAINER_NAME)-$*-$(USER_NAME) \
		--tmpfs /tmp:exec \
		$(IMAGE_WITH_TAG)$(IMAGE_TAG_SUFFIX) $(RUN_CMD)
devel_%: STAGE = devel
wheel_%: STAGE = wheel
wheel_run: WORK_DIR = /src/tensorrt_llm
release_%: STAGE = release
release_run: WORK_DIR = /app/tensorrt_llm
# For x86_64
jenkins_%: IMAGE_WITH_TAG = $(shell grep 'LLM_DOCKER_IMAGE = ' ../jenkins/L0_MergeRequest.groovy | grep -o '".*"' | tr -d '"')
jenkins_%: STAGE = devel
# For aarch64
jenkins-aarch64_%: IMAGE_WITH_TAG = $(shell grep 'LLM_SBSA_DOCKER_IMAGE = ' ../jenkins/L0_MergeRequest.groovy | grep -o '".*"' | tr -d '"')
jenkins-aarch64_%: STAGE = devel
# For x86_64
jenkins-rockylinux8_%: IMAGE_WITH_TAG = $(shell grep 'LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE = ' ../jenkins/L0_MergeRequest.groovy | grep -o '".*"' | tr -d '"')
jenkins-rockylinux8_%: STAGE = devel
jenkins-rockylinux8_%: BASE_IMAGE = nvidia/cuda
jenkins-rockylinux8_%: BASE_TAG = 12.8.1-devel-rockylinux8
rockylinux8_%: STAGE = devel
rockylinux8_%: BASE_IMAGE = nvidia/cuda
rockylinux8_%: BASE_TAG = 12.8.1-devel-rockylinux8
# For x86_64 and aarch64
ubuntu22_%: STAGE = devel
ubuntu22_%: BASE_IMAGE = nvidia/cuda
ubuntu22_%: BASE_TAG = 12.8.1-devel-ubuntu22.04
trtllm_%: STAGE = release
trtllm_%: PUSH_TO_STAGING := 0
trtllm_%: DEVEL_IMAGE = $(shell grep 'LLM_DOCKER_IMAGE = ' ../jenkins/L0_MergeRequest.groovy | grep -o '".*"' | tr -d '"')
trtllm_%: IMAGE_NAME = $(shell grep 'IMAGE_NAME = ' ../jenkins/BuildDockerImage.groovy | grep -o '".*"' | tr -d '"')
trtllm_%: IMAGE_TAG = $(shell git rev-parse --abbrev-ref HEAD | tr '/' '_')
trtllm_run: WORK_DIR = /app/tensorrt_llm
build: devel_build ;
push: devel_push ;
run: devel_run ;
.PHONY: build push run
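# Example invocations, as a sketch of how the pattern rules above combine
# (run from the repository root; variable values are illustrative only):
#   make -C docker build                                         # alias for devel_build
#   make -C docker release_build CUDA_ARCHS="89-real;90-real"    # restrict target GPU architectures
#   make -C docker release_run LOCAL_USER=1                      # add the host user to the image, then run it
#   make -C docker jenkins_run DOCKER_PULL=1                     # pull the image named in L0_MergeRequest.groovy, then run it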