(build-from-source-windows)=
# Building from Source Code on Windows
This section is for advanced users. Skip this section if you plan to use the pre-built TensorRT-LLM release wheel.
## Prerequisites
- Install the prerequisites listed in our Installing on Windows document.
- Install CMake. Version 3.27.7 is recommended; select the option to add it to the system path.
- Download and install Visual Studio 2022.
- Download and unzip TensorRT 10.8.0.43.
## Building a TensorRT-LLM Docker Image
### Docker Desktop
1. Install Docker Desktop on Windows.
2. Set the following configurations:
   1. Right-click the Docker icon in the Windows system tray (bottom right of your taskbar) and select Switch to Windows containers....
   2. In the Docker Desktop settings, on the General tab, uncheck Use the WSL 2 based engine.
   3. On the Docker Engine tab, set your configuration file to:

      ```json
      {
        "experimental": true
      }
      ```
After building, copy the files out of your container. `docker cp` is not supported on Windows for Hyper-V based images. Unless you are using WSL 2 based images, mount a folder (for example, `trt-llm-build`) into your container when you run it so that you can move files between the container and the host system.
### Acquire an Image
The Docker container will be hosted for public download in a future release. At this time, it must be built manually. From the `TensorRT-LLM\windows\` folder, run the build command:
```bash
docker build -f .\docker\Dockerfile -t tensorrt-llm-windows-build:latest .
```
And your image is now ready for use.
### Run the Container
Run the container in interactive mode with your build folder mounted. Specify a memory limit with the `-m` flag. By default, the limit is 2 GB, which is not sufficient to build TensorRT-LLM.
```bash
docker run -it -m 12g -v .\trt-llm-build:C:\workspace\trt-llm-build tensorrt-llm-windows-build:latest
```
### Build and Extract Files
1. Clone and set up the TensorRT-LLM repository within the container.

   ```bash
   git clone https://github.com/NVIDIA/TensorRT-LLM.git
   cd TensorRT-LLM
   git submodule update --init --recursive
   ```

2. Build TensorRT-LLM. This command generates `build\tensorrt_llm-*.whl`.

   ```bash
   python .\scripts\build_wheel.py -a "89-real" --trt_root C:\workspace\TensorRT-10.8.0.43\
   ```

3. Copy or move `build\tensorrt_llm-*.whl` into your mounted folder so it can be accessed on your host machine; a sketch of this step follows the list. If you intend to use the C++ runtime, you'll also need to gather various DLLs from the build into your mounted folder. For more information, refer to C++ Runtime Usage.
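For example, from inside the container you might copy the wheel to the folder mounted from the host. This is only a sketch; the destination assumes the `C:\workspace\trt-llm-build` mount point used in the `docker run` command above.

```powershell
# Illustrative only: copy the built wheel into the folder mounted from the host.
Copy-Item -Path .\build\tensorrt_llm-*.whl -Destination C:\workspace\trt-llm-build\
```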
## Building TensorRT-LLM on Bare Metal
### Prerequisites
1. Install all prerequisites (`git`, `python`, CUDA) listed in our Installing on Windows document.

2. Install Nsight NVTX. TensorRT-LLM on Windows currently depends on NVTX assets that do not come packaged with the CUDA 12.8.0 installer. To install these assets, download the CUDA 11.8 Toolkit.

   1. During installation, select Advanced installation.
   2. Nsight NVTX is located in the CUDA drop-down.
   3. Deselect all packages, and select Nsight NVTX.

3. Install the dependencies one of two ways:

   1. Run the `setup_build_env.ps1` script, which installs CMake, Microsoft Visual Studio Build Tools, and TensorRT automatically with default settings. Run PowerShell as Administrator to use the script.

      ```powershell
      ./setup_build_env.ps1 -TRTPath <TRT-containing-folder> [-skipCMake] [-skipVSBuildTools] [-skipTRT]
      ```

      - Close and reopen PowerShell after running the script so that `Path` changes take effect.
      - Supply a directory that already exists to contain TensorRT to `-TRTPath`. For example, `-TRTPath ~/inference` may be valid, but `-TRTPath ~/inference/TensorRT` will not be valid if `TensorRT` does not exist. `-TRTPath` isn't required if `-skipTRT` is supplied.
   2. Install the dependencies one at a time.

      1. Install CMake. Version 3.27.7 is recommended; select the option to add it to the system path.

      2. Download and install Visual Studio 2022. When prompted to select more Workloads, check Desktop development with C++.

      3. Download and unzip TensorRT 10.8.0.43. Move the folder to a location you can reference later, such as `%USERPROFILE%\inference\TensorRT`.

         1. Add the libraries for TensorRT to your system's `Path` environment variable. Your `Path` should include a line like this:

            ```
            %USERPROFILE%\inference\TensorRT\lib
            ```

         2. Close and re-open any existing PowerShell or Git Bash windows so they pick up the new `Path` (a quick verification sketch follows this list).

         3. Remove existing `tensorrt` wheels first by executing:

            ```bash
            pip uninstall -y tensorrt tensorrt_libs tensorrt_bindings
            pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
            ```

         4. To install the TensorRT core libraries, run PowerShell and use `pip` to install the Python wheel.

            ```bash
            pip install %USERPROFILE%\inference\TensorRT\python\tensorrt-*.whl
            ```

         5. Verify that your TensorRT installation is working properly.

            ```bash
            python -c "import tensorrt as trt; print(trt.__version__)"
            ```
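As an optional sanity check (a sketch, not an official step), you can confirm from a fresh PowerShell window that the TensorRT library folder made it onto your `Path`:

```powershell
# Illustrative: list Path entries that mention TensorRT; expect to see ...\TensorRT\lib.
$env:Path -split ';' | Select-String -SimpleMatch 'TensorRT'
```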
### Steps
1. Launch a 64-bit Developer PowerShell. From your usual PowerShell terminal, run one of the following two commands.

   - If you installed Visual Studio Build Tools (that is, used the `setup_build_env.ps1` script):

     ```powershell
     & 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
     ```

   - If you installed Visual Studio Community (for example, via manual GUI setup):

     ```powershell
     & 'C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
     ```
2. In PowerShell, from the `TensorRT-LLM` root folder, run:

   ```bash
   python .\scripts\build_wheel.py -a "89-real" --trt_root <path_to_trt_root>
   ```

   The `-a` flag specifies the device architecture; `"89-real"` supports GeForce 40-series cards.

   The flag `-D "ENABLE_MULTI_DEVICE=0"`, while not specified here, is implied on Windows. Multi-device inference is supported on Linux, but not on Windows.

   This command generates `build\tensorrt_llm-*.whl`.
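As an optional follow-up (a sketch assuming you want to install the freshly built wheel on the same machine; the wheel filename varies with version and Python), you can install it with `pip` and confirm it imports:

```powershell
# Illustrative: install the wheel produced by build_wheel.py and confirm it imports.
$wheel = Get-ChildItem .\build\tensorrt_llm-*.whl | Select-Object -First 1
pip install $wheel.FullName
python -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```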
(c-runtime-usage)=
## Linking with the TensorRT-LLM C++ Runtime
This section is for advanced users. Skip this section if you do not intend to use the TensorRT-LLM C++ runtime directly. You must build from source to use the C++ runtime.
Building from source creates libraries that can be used if you wish to directly link against the C++ runtime for TensorRT-LLM. These libraries are also required if you wish to run C++ unit tests and some benchmarks.
Building from source produces the following library files.
- `tensorrt_llm` libraries located in `cpp\build\tensorrt_llm`
  - `tensorrt_llm.dll` - Shared library
  - `tensorrt_llm.exp` - Export file
  - `tensorrt_llm.lib` - Stub for linking to `tensorrt_llm.dll`
- Dependency libraries (these get copied to `tensorrt_llm\libs\`)
  - `nvinfer_plugin_tensorrt_llm` libraries located in `cpp\build\tensorrt_llm\plugins`
    - `nvinfer_plugin_tensorrt_llm.dll`
    - `nvinfer_plugin_tensorrt_llm.exp`
    - `nvinfer_plugin_tensorrt_llm.lib`
  - `th_common` libraries located in `cpp\build\tensorrt_llm\thop`
    - `th_common.dll`
    - `th_common.exp`
    - `th_common.lib`
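If you want to gather these DLLs into a single folder (as mentioned below for the Docker workflow), a sketch like the following may help. The destination folder name is arbitrary, and the source paths assume you are in the repository root with the build output under `cpp\build` as listed above.

```powershell
# Illustrative: collect the runtime DLLs produced by the build into one folder.
$dest = "$env:USERPROFILE\trt-llm-dlls"
New-Item -ItemType Directory -Force -Path $dest | Out-Null
Copy-Item .\cpp\build\tensorrt_llm\tensorrt_llm.dll $dest
Copy-Item .\cpp\build\tensorrt_llm\plugins\nvinfer_plugin_tensorrt_llm.dll $dest
Copy-Item .\cpp\build\tensorrt_llm\thop\th_common.dll $dest
```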
The locations of the DLLs, in addition to some torch DLLs and TensorRT DLLs, must be added to the Windows `Path` in order to use the TensorRT-LLM C++ runtime. Append the locations of these libraries to your `Path`. When complete, your `Path` should include lines similar to these:
```
%USERPROFILE%\inference\TensorRT\lib
%USERPROFILE%\inference\TensorRT-LLM\cpp\build\tensorrt_llm
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\tensorrt_llm\libs
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\torch\lib
```
Your Path additions may differ, particularly if you used the Docker method and copied all the relevant DLLs into a single folder.
Again, close and re-open any existing PowerShell or Git Bash windows so they pick up the new `Path`.
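As a sketch (assuming the default locations shown above; adjust each path to match your system), you could append these directories to the current user's `Path` from PowerShell, then open a new terminal for the change to take effect:

```powershell
# Illustrative: append the DLL locations to the user-level Path.
# These paths assume the default locations shown above; adjust them to your setup.
$dirs = @(
    "$env:USERPROFILE\inference\TensorRT\lib",
    "$env:USERPROFILE\inference\TensorRT-LLM\cpp\build\tensorrt_llm",
    "$env:USERPROFILE\AppData\Local\Programs\Python\Python310\Lib\site-packages\tensorrt_llm\libs",
    "$env:USERPROFILE\AppData\Local\Programs\Python\Python310\Lib\site-packages\torch\lib"
)
$userPath = [Environment]::GetEnvironmentVariable('Path', 'User')
[Environment]::SetEnvironmentVariable('Path', ($userPath.TrimEnd(';') + ';' + ($dirs -join ';')), 'User')
```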