(build-from-source-windows)=

# Building from Source Code on Windows

```{note}
This section is for advanced users. Skip this section if you plan to use the pre-built TensorRT-LLM release wheel.
```

## Prerequisites

1. Install the prerequisites listed in our [Installing on Windows](https://nvidia.github.io/TensorRT-LLM/installation/windows.html) document.
2. Install [CMake](https://cmake.org/download/) (version 3.27.7 is recommended) and select the option to add it to the system path.
3. Download and install [Visual Studio 2022](https://visualstudio.microsoft.com/).
4. Download and unzip [TensorRT 10.8.0.43](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.8.0/zip/TensorRT-10.8.0.43.Windows.win10.cuda-12.8.zip).
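As an optional sanity check that CMake landed on your `Path`, open a new terminal and run:

```bash
cmake --version
```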
## Building a TensorRT-LLM Docker Image
### Docker Desktop

1. Install [Docker Desktop on Windows](https://docs.docker.com/desktop/install/windows-install/).
2. Set the following configurations:

    1. Right-click the Docker icon in the Windows system tray (bottom right of your taskbar) and select **Switch to Windows containers...**.
    2. In the Docker Desktop settings on the **General** tab, uncheck **Use the WSL 2 based engine**.
    3. On the **Docker Engine** tab, set your configuration file to:

        ```
        {
          "experimental": true
        }
        ```

```{note}
After building, you will need to copy files out of your container. `docker cp` is not supported on Windows for Hyper-V based images, so unless you are using WSL 2 based images, mount a folder (for example, `trt-llm-build`) into your container when you run it to move files between the container and the host system.
```
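Optionally, you can verify that Docker has switched to Windows containers by checking the server OS:

```bash
docker version --format "{{.Server.Os}}"
```

This should print `windows`.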
### Acquire an Image

The Docker container will be hosted for public download in a future release. At this time, it must be built manually. From the `TensorRT-LLM\windows\` folder, run the build command:

```bash
docker build -f .\docker\Dockerfile -t tensorrt-llm-windows-build:latest .
```

Your image is now ready for use.
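Optionally, confirm the image exists by listing it:

```bash
docker images tensorrt-llm-windows-build
```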
### Run the Container
Run the container in interactive mode with your build folder mounted. Specify a memory limit with the `-m` flag. By default, the limit is 2 GB, which is not sufficient to build TensorRT-LLM.

```bash
docker run -it -m 12g -v .\trt-llm-build:C:\workspace\trt-llm-build tensorrt-llm-windows-build:latest
```
### Build and Extract Files
1. Clone and set up the TensorRT-LLM repository within the container.

    ```bash
    git clone https://github.com/NVIDIA/TensorRT-LLM.git
    cd TensorRT-LLM
    git submodule update --init --recursive
    ```

2. Build TensorRT-LLM. This command generates `build\tensorrt_llm-*.whl`.

    ```bash
    python .\scripts\build_wheel.py -a "89-real" --trt_root C:\workspace\TensorRT-10.8.0.43\
    ```

3. Copy or move `build\tensorrt_llm-*.whl` into your mounted folder so it can be accessed on your host machine (an example copy command follows this list). If you intend to use the C++ runtime, you'll also need to gather various DLLs from the build into your mounted folder. For more information, refer to [C++ Runtime Usage](#c-runtime-usage).
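For example, assuming your container shell is PowerShell and the `C:\workspace\trt-llm-build` mount from the `docker run` command above, the copy in step 3 might look like:

```bash
# Copy the built wheel into the folder mounted from the host.
Copy-Item .\build\tensorrt_llm-*.whl -Destination C:\workspace\trt-llm-build\
```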
## Building TensorRT-LLM on Bare Metal
**Prerequisites**

1. Install all prerequisites (`git`, `python`, `CUDA`) listed in our [Installing on Windows](https://nvidia.github.io/TensorRT-LLM/installation/windows.html) document.
2. Install Nsight NVTX. TensorRT-LLM on Windows currently depends on NVTX assets that do not come packaged with the CUDA 12.8.0 installer. To install these assets, download the [CUDA 11.8 Toolkit](https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64). A quick way to confirm the installation is shown after this list.

    1. During installation, select **Advanced installation**.
    2. Nsight NVTX is located in the CUDA drop-down.
    3. Deselect all packages, and then select **Nsight NVTX**.
3. Install the dependencies one of two ways:

    1. Run the `setup_build_env.ps1` script, which installs CMake, Microsoft Visual Studio Build Tools, and TensorRT automatically with default settings.

        1. Run PowerShell as Administrator to use the script.

            ```bash
            ./setup_build_env.ps1 -TRTPath <TRT-containing-folder> [-skipCMake] [-skipVSBuildTools] [-skipTRT]
            ```

        2. Close and reopen PowerShell after running the script so that the `Path` changes take effect.
        3. Supply to `-TRTPath` a directory that already exists to contain TensorRT. For example, `-TRTPath ~/inference` is valid if `~/inference` exists, but `-TRTPath ~/inference/TensorRT` is not valid if `TensorRT` does not yet exist. `-TRTPath` isn't required if `-skipTRT` is supplied.
    2. Install the dependencies one at a time.

        1. Install [CMake](https://cmake.org/download/) (version 3.27.7 is recommended) and select the option to add it to the system path.
        2. Download and install [Visual Studio 2022](https://visualstudio.microsoft.com/). When prompted to select additional Workloads, check **Desktop development with C++**.
        3. Download and unzip [TensorRT 10.8.0.43](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.8.0/zip/TensorRT-10.8.0.43.Windows.win10.cuda-12.8.zip). Move the folder to a location you can reference later, such as `%USERPROFILE%\inference\TensorRT`.

            1. Add the TensorRT libraries to your system's `Path` environment variable (a scripted way to do this is sketched after this list). Your `Path` should include a line like this:

                ```bash
                %USERPROFILE%\inference\TensorRT\lib
                ```

            2. Close and re-open any existing PowerShell or Git Bash windows so they pick up the new `Path`.
            3. Remove any existing `tensorrt` wheels first by executing:

                ```bash
                pip uninstall -y tensorrt tensorrt_libs tensorrt_bindings
                pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
                ```

            4. Install the TensorRT core libraries: run PowerShell and use `pip` to install the Python wheel (select the wheel matching your Python version).

                ```bash
                pip install %USERPROFILE%\inference\TensorRT\python\tensorrt-*.whl
                ```

            5. Verify that your TensorRT installation is working properly.

                ```bash
                python -c "import tensorrt as trt; print(trt.__version__)"
                ```
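To confirm the Nsight NVTX assets from step 2 are present, you can list the NVTX install directory. The path below is an assumption based on the CUDA 11.8 installer's usual defaults; adjust it if you chose a different location:

```bash
dir "C:\Program Files\NVIDIA Corporation\NvToolsExt"
```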
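For the `Path` addition in the manual TensorRT setup (step 3.1 above), one scripted alternative to editing environment variables by hand is the sketch below. It appends the TensorRT `lib` folder to the user-level `Path`; the folder location is an assumption and should match wherever you unzipped TensorRT:

```bash
# Sketch: persist the TensorRT lib folder on the user-level Path.
$trtLib = "$env:USERPROFILE\inference\TensorRT\lib"   # assumed unzip location
$userPath = [Environment]::GetEnvironmentVariable('Path', 'User')
[Environment]::SetEnvironmentVariable('Path', "$userPath;$trtLib", 'User')
```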
**Steps**
1. Launch a 64-bit Developer PowerShell. From your usual PowerShell terminal, run one of the following two commands.

    1. If you installed Visual Studio Build Tools (that is, used the `setup_build_env.ps1` script):

        ```bash
        & 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
        ```

    2. If you installed Visual Studio Community (for example, via the manual GUI setup):

        ```bash
        & 'C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
        ```

2. In PowerShell, from the `TensorRT-LLM` root folder, run:

    ```bash
    python .\scripts\build_wheel.py -a "89-real" --trt_root <path_to_trt_root>
    ```
The `-a` flag specifies the target device architecture as a CUDA compute capability. `"89-real"` targets compute capability 8.9 (Ada Lovelace), which covers GeForce RTX 40-series cards.
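If you are building for a different GPU, pass its compute capability to `-a` instead. For example, GeForce RTX 30-series cards are compute capability 8.6, so a 30-series build would look like this (same command, different architecture flag):

```bash
python .\scripts\build_wheel.py -a "86-real" --trt_root <path_to_trt_root>
```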
The flag `-D "ENABLE_MULTI_DEVICE=0"`, while not specified here, is implied on Windows. Multi-device inference is supported on Linux, but not on Windows.

This command generates `build\tensorrt_llm-*.whl`.
(c-runtime-usage)=
## Linking with the TensorRT-LLM C++ Runtime

```{note}
This section is for advanced users. Skip this section if you do not intend to use the TensorRT-LLM C++ runtime directly. You must build from source to use the C++ runtime.
```

Building from source creates libraries that can be used if you wish to directly link against the C++ runtime for TensorRT-LLM. These libraries are also required if you wish to run C++ unit tests and some benchmarks.

Building from source produces the following library files.

- `tensorrt_llm` libraries located in `cpp\build\tensorrt_llm`
  - `tensorrt_llm.dll` - Shared library
  - `tensorrt_llm.exp` - Export file
  - `tensorrt_llm.lib` - Stub for linking to `tensorrt_llm.dll`
- Dependency libraries (these get copied to `tensorrt_llm\libs\`)
  - `nvinfer_plugin_tensorrt_llm` libraries located in `cpp\build\tensorrt_llm\plugins\`
    - `nvinfer_plugin_tensorrt_llm.dll`
    - `nvinfer_plugin_tensorrt_llm.exp`
    - `nvinfer_plugin_tensorrt_llm.lib`
  - `th_common` libraries located in `cpp\build\tensorrt_llm\thop\`
    - `th_common.dll`
    - `th_common.exp`
    - `th_common.lib`
The locations of the DLLs, in addition to some `torch` DLLs and `TensorRT` DLLs, must be added to the Windows `Path` in order to use the TensorRT-LLM C++ runtime. Append the locations of these libraries to your `Path`. When complete, your `Path` should include lines similar to these:

```bash
%USERPROFILE%\inference\TensorRT\lib
%USERPROFILE%\inference\TensorRT-LLM\cpp\build\tensorrt_llm
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\tensorrt_llm\libs
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\torch\lib
```

Your `Path` additions may differ, particularly if you used the Docker method and copied all the relevant DLLs into a single folder.

Again, close and re-open any existing PowerShell or Git Bash windows so they pick up the new `Path`.
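As an optional check that the new `Path` entries are visible, you can ask Windows to locate one of the DLLs from a freshly opened shell:

```bash
where.exe tensorrt_llm.dll
```

If the DLL's full path is printed, the runtime libraries can be found at load time.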