mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-14 06:27:45 +08:00
* [Infra][TRTLLM-4063] - Branch out for the TRT-LLM v0.18.0 release Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> (cherry picked from commit de90312020e51c22ba5e75b3502c7ee90c059265) * [Infra][TRTLLM-3652] - Update dependencies to TRT 10.9 / CUDA 12.8.1 / DLFW 25.03(Internal) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> (cherry picked from commit 58db1340ef7db22f1910f878d220a92be5b830d1) * [None][Doc] - Update docs for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit d23e75bc95619ce3b116213d55319272888e0c88) * [Infra] - Fix or WAR issues in the package sanity check stages Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit e874e2b127515c52ba10c8df1cc2631627f74ffe) * [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path Signed-off-by: Yuki Huang <yukih@nvidia.com> (cherry picked from commit 731811d4e182d70a66193d646152cb71dfafe83a) * cherry-pick 'test: Updat cluster and multi node test lists and trtllm-bench' test to fix perf drop issue Signed-off-by: Ruodi Lu <ruodil@nvidia.com> (cherry picked from commit 5214616283fbc15ae98871a1d84c78d8e1f2e6e8) * Revert "Merge branch 'user/yukih/fix_5173454_5173432' into 'release/0.18'" Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 8d34831cb2b81ee2dfa8021b68e7158b33789a5f) * [Infra]Restrict setuptools version to avoid sasb pip install issue Signed-off-by: Emma Qiao <qqiao@nvidia.com> (cherry picked from commit 1e60ad29e0dafec0e295bedb5d89b716a02a707c) * [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path Signed-off-by: Yuki Huang <yukih@nvidia.com> (cherry picked from commit 3ed8164e5bfea1d5aa2039b5408439fd6cf59dac) * WAR for bug 5173448 Signed-off-by: Thor Johnsen <tjohnsen@nvidia.com> (cherry picked from commit b6528b2ba15322b6c6a4c81a8b74c04d4973de4f) * [Infra][TRTLLM-3652] - Update dependencies to CUDA 12.8.1 / 
DLFW 25.03 Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> (cherry picked from commit 6560983d132d9d257ee15849664eb055e94adaa9) * [Docs] - Doc changes for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 26769b61218a947c8f9d070f73b63d576fcc20c4) * [Doc] - Doc change for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 4b3b5ed6bfbc2300e3775fe75456083faad7b235) * [Infra] update version to 0.18.1 Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> (cherry picked from commit 59e8326c75639275837d34de8e140358737a3365) * Add back nemotron file. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Fix recurrentgemma reqs. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Adding WAR for bug 5173448. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Formatting. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Remove duplicated file. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update examples/prompt_lookup/requirements.txt Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Remove glm-4-9b from model dir in chatglm test. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Remove indent change. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Revert changes on l0_test.groovy. 
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update dev images Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> * Remove duplicated import. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Fix custom op Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Fix flashinfer & vanilla backend Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Skip problematic case. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Skip problematic test_moe_w4a8_1_14336_4096_8_bfloat16_True_False case. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> --------- Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Ruodi Lu <ruodil@nvidia.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Thor Johnsen <tjohnsen@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
130 lines
4.4 KiB
Docker
# Multi-stage Dockerfile
ARG BASE_IMAGE=nvcr.io/nvidia/pytorch
ARG BASE_TAG=25.03-py3
ARG DEVEL_IMAGE=devel

FROM ${BASE_IMAGE}:${BASE_TAG} AS base

# https://www.gnu.org/software/bash/manual/html_node/Bash-Startup-Files.html
# The default values come from `nvcr.io/nvidia/pytorch`
ENV BASH_ENV=${BASH_ENV:-/etc/bash.bashrc}
ENV ENV=${ENV:-/etc/shinit_v2}
ARG GITHUB_MIRROR=""
ENV GITHUB_MIRROR=$GITHUB_MIRROR
RUN echo "Using GitHub mirror: $GITHUB_MIRROR"
SHELL ["/bin/bash", "-c"]

# Clean up the pip constraint file from the base NGC PyTorch image.
RUN [ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt || true

FROM base AS devel

ARG PYTHON_VERSION="3.12.3"
RUN echo "Using Python version: $PYTHON_VERSION"
COPY docker/common/install_base.sh install_base.sh
RUN bash ./install_base.sh $PYTHON_VERSION && rm install_base.sh

COPY docker/common/install_cmake.sh install_cmake.sh
RUN bash ./install_cmake.sh && rm install_cmake.sh

COPY docker/common/install_ccache.sh install_ccache.sh
RUN bash ./install_ccache.sh && rm install_ccache.sh

# Only takes effect when the base image is Rocky Linux 8 with an old CUDA version.
COPY docker/common/install_cuda_toolkit.sh install_cuda_toolkit.sh
RUN bash ./install_cuda_toolkit.sh && rm install_cuda_toolkit.sh

# Download & install latest TRT release
ARG TRT_VER
ARG CUDA_VER
ARG CUDNN_VER
ARG NCCL_VER
ARG CUBLAS_VER
COPY docker/common/install_tensorrt.sh install_tensorrt.sh
RUN bash ./install_tensorrt.sh \
    --TRT_VER=${TRT_VER} \
    --CUDA_VER=${CUDA_VER} \
    --CUDNN_VER=${CUDNN_VER} \
    --NCCL_VER=${NCCL_VER} \
    --CUBLAS_VER=${CUBLAS_VER} && \
    rm install_tensorrt.sh

# Install latest Polygraphy
COPY docker/common/install_polygraphy.sh install_polygraphy.sh
RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh

# Install mpi4py
COPY docker/common/install_mpi4py.sh install_mpi4py.sh
RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh

# Install PyTorch
ARG TORCH_INSTALL_TYPE="skip"
COPY docker/common/install_pytorch.sh install_pytorch.sh
RUN bash ./install_pytorch.sh $TORCH_INSTALL_TYPE && rm install_pytorch.sh

# Install OpenCV with FFMPEG support
RUN pip3 uninstall -y opencv && rm -rf /usr/local/lib/python3*/dist-packages/cv2/
RUN pip3 install opencv-python-headless --force-reinstall --no-deps

FROM ${DEVEL_IMAGE} AS wheel
WORKDIR /src/tensorrt_llm
COPY benchmarks benchmarks
COPY cpp cpp
COPY scripts scripts
COPY tensorrt_llm tensorrt_llm
COPY 3rdparty 3rdparty
COPY .gitmodules setup.py requirements.txt requirements-dev.txt ./

# Create cache directories for pip and ccache
RUN mkdir -p /root/.cache/pip /root/.cache/ccache
ENV CCACHE_DIR=/root/.cache/ccache
# Build the TRT-LLM wheel
ARG BUILD_WHEEL_ARGS="--clean --python_bindings --benchmarks"
RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache \
    python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}

FROM ${DEVEL_IMAGE} AS release

# Create a cache directory for pip
RUN mkdir -p /root/.cache/pip

WORKDIR /app/tensorrt_llm
COPY --from=wheel /src/tensorrt_llm/build/tensorrt_llm*.whl .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install tensorrt_llm*.whl && \
    rm tensorrt_llm*.whl
COPY README.md ./
COPY docs docs
COPY cpp/include include
RUN ln -sv $(python3 -c 'import site; print(f"{site.getsitepackages()[0]}/tensorrt_llm/bin")') bin && \
    test -f bin/executorWorker && \
    ln -sv $(python3 -c 'import site; print(f"{site.getsitepackages()[0]}/tensorrt_llm/libs")') lib && \
    test -f lib/libnvinfer_plugin_tensorrt_llm.so && \
    echo "/app/tensorrt_llm/lib" > /etc/ld.so.conf.d/tensorrt_llm.conf && \
    ldconfig
# Test LD configuration
RUN ! ( ldd -v bin/executorWorker | grep tensorrt_llm | grep -q "not found" )

ARG SRC_DIR=/src/tensorrt_llm
COPY --from=wheel ${SRC_DIR}/benchmarks benchmarks
ARG CPP_BUILD_DIR=${SRC_DIR}/cpp/build
COPY --from=wheel \
    ${CPP_BUILD_DIR}/benchmarks/bertBenchmark \
    ${CPP_BUILD_DIR}/benchmarks/gptManagerBenchmark \
    ${CPP_BUILD_DIR}/benchmarks/gptSessionBenchmark \
    ${CPP_BUILD_DIR}/benchmarks/disaggServerBenchmark \
    benchmarks/cpp/
COPY examples examples
RUN chmod -R a+w examples && \
    rm -v \
        benchmarks/cpp/bertBenchmark.cpp \
        benchmarks/cpp/gptManagerBenchmark.cpp \
        benchmarks/cpp/gptSessionBenchmark.cpp \
        benchmarks/cpp/disaggServerBenchmark.cpp \
        benchmarks/cpp/CMakeLists.txt
ARG GIT_COMMIT
ARG TRT_LLM_VER
ENV TRT_LLM_GIT_COMMIT=${GIT_COMMIT} \
    TRT_LLM_VERSION=${TRT_LLM_VER}
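# Example invocation: a sketch only, not taken from the repository's docs.
# The Dockerfile path and the image tag are placeholders (assumptions).
# BuildKit is required for the `RUN --mount=type=cache` steps, and the build
# context has to be the repository root because the COPY instructions use
# paths such as `docker/common/`, `cpp/` and `examples/`.
#
#   DOCKER_BUILDKIT=1 docker build \
#       --file <path/to/this/Dockerfile> \
#       --target release \
#       --build-arg GIT_COMMIT="$(git rev-parse HEAD)" \
#       --tag tensorrt_llm:release-example \
#       .
#
# Passing `--target devel` or `--target wheel` instead would stop at the
# corresponding earlier stage.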