<!DOCTYPE html>
<html class="writer-html5" lang="en" data-content_root="../">
<head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Building from Source Code on Windows &mdash; tensorrt_llm documentation</title>
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=80d5e7a1" />
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css?v=e59714d7" />
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />
<script src="../_static/jquery.js?v=5d32c60e"></script>
<script src="../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="../_static/documentation_options.js?v=5929fcd5"></script>
<script src="../_static/doctools.js?v=888ff710"></script>
<script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="../_static/clipboard.min.js?v=a7894cd8"></script>
<script src="../_static/copybutton.js?v=65e89d2a"></script>
<script src="../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Installing on Grace Hopper" href="grace-hopper.html" />
<link rel="prev" title="Installing on Windows" href="windows.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../index.html" class="icon icon-home">
tensorrt_llm
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Getting Started</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../overview.html">Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="../quick-start-guide.html">Quick Start Guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="../key-features.html">Key Features</a></li>
<li class="toctree-l1"><a class="reference internal" href="../release-notes.html">Release Notes</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Installation</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="linux.html">Installing on Linux</a></li>
<li class="toctree-l1"><a class="reference internal" href="build-from-source-linux.html">Building from Source Code on Linux</a></li>
<li class="toctree-l1"><a class="reference internal" href="windows.html">Installing on Windows</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Building from Source Code on Windows</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#prerequisites">Prerequisites</a></li>
<li class="toctree-l2"><a class="reference internal" href="#building-a-tensorrt-llm-docker-image">Building a TensorRT-LLM Docker Image</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#docker-desktop">Docker Desktop</a></li>
<li class="toctree-l3"><a class="reference internal" href="#acquire-an-image">Acquire an Image</a></li>
<li class="toctree-l3"><a class="reference internal" href="#run-the-container">Run the Container</a></li>
<li class="toctree-l3"><a class="reference internal" href="#build-and-extract-files">Build and Extract Files</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#building-tensorrt-llm-on-bare-metal">Building TensorRT-LLM on Bare Metal</a></li>
<li class="toctree-l2"><a class="reference internal" href="#linking-with-the-tensorrt-llm-c-runtime">Linking with the TensorRT-LLM C++ Runtime</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="grace-hopper.html">Installing on Grace Hopper</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">LLM API</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../llm-api/index.html">API Introduction</a></li>
<li class="toctree-l1"><a class="reference internal" href="../llm-api/reference.html">API Reference</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">LLM API Examples</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../llm-api-examples/index.html">LLM Examples Introduction</a></li>
<li class="toctree-l1"><a class="reference internal" href="../llm-api-examples/customization.html">Common Customizations</a></li>
<li class="toctree-l1"><a class="reference internal" href="../llm-api-examples/llm_api_examples.html">Examples</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Model Definition API</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../python-api/tensorrt_llm.layers.html">Layers</a></li>
<li class="toctree-l1"><a class="reference internal" href="../python-api/tensorrt_llm.functional.html">Functionals</a></li>
<li class="toctree-l1"><a class="reference internal" href="../python-api/tensorrt_llm.models.html">Models</a></li>
<li class="toctree-l1"><a class="reference internal" href="../python-api/tensorrt_llm.plugin.html">Plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="../python-api/tensorrt_llm.quantization.html">Quantization</a></li>
<li class="toctree-l1"><a class="reference internal" href="../python-api/tensorrt_llm.runtime.html">Runtime</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">C++ API</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../_cpp_gen/executor.html">Executor</a></li>
<li class="toctree-l1"><a class="reference internal" href="../_cpp_gen/runtime.html">Runtime</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Command-Line Reference</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../commands/trtllm-build.html">trtllm-build</a></li>
<li class="toctree-l1"><a class="reference internal" href="../commands/trtllm-serve.html">trtllm-serve</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Architecture</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../architecture/overview.html">TensorRT-LLM Architecture</a></li>
<li class="toctree-l1"><a class="reference internal" href="../architecture/core-concepts.html">Model Definition</a></li>
<li class="toctree-l1"><a class="reference internal" href="../architecture/core-concepts.html#compilation">Compilation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../architecture/core-concepts.html#runtime">Runtime</a></li>
<li class="toctree-l1"><a class="reference internal" href="../architecture/core-concepts.html#multi-gpu-and-multi-node-support">Multi-GPU and Multi-Node Support</a></li>
<li class="toctree-l1"><a class="reference internal" href="../architecture/checkpoint.html">TensorRT-LLM Checkpoint</a></li>
<li class="toctree-l1"><a class="reference internal" href="../architecture/workflow.html">TensorRT-LLM Build Workflow</a></li>
<li class="toctree-l1"><a class="reference internal" href="../architecture/add-model.html">Adding a Model</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Advanced</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../advanced/gpt-attention.html">Multi-Head, Multi-Query, and Group-Query Attention</a></li>
<li class="toctree-l1"><a class="reference internal" href="../advanced/gpt-runtime.html">C++ GPT Runtime</a></li>
<li class="toctree-l1"><a class="reference internal" href="../advanced/executor.html">Executor API</a></li>
<li class="toctree-l1"><a class="reference internal" href="../advanced/graph-rewriting.html">Graph Rewriting Module</a></li>
<li class="toctree-l1"><a class="reference internal" href="../advanced/inference-request.html">Inference Request</a></li>
<li class="toctree-l1"><a class="reference internal" href="../advanced/inference-request.html#responses">Responses</a></li>
<li class="toctree-l1"><a class="reference internal" href="../advanced/lora.html">Run gpt-2b + LoRA using GptManager / cpp runtime</a></li>
<li class="toctree-l1"><a class="reference internal" href="../advanced/expert-parallelism.html">Expert Parallelism in TensorRT-LLM</a></li>
<li class="toctree-l1"><a class="reference internal" href="../advanced/kv-cache-reuse.html">KV cache reuse</a></li>
<li class="toctree-l1"><a class="reference internal" href="../advanced/speculative-decoding.html">Speculative Sampling</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Performance</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../performance/perf-overview.html">Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="../performance/perf-benchmarking.html">Benchmarking</a></li>
<li class="toctree-l1"><a class="reference internal" href="../performance/perf-best-practices.html">Best Practices</a></li>
<li class="toctree-l1"><a class="reference internal" href="../performance/perf-analysis.html">Performance Analysis</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Reference</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../reference/troubleshooting.html">Troubleshooting</a></li>
<li class="toctree-l1"><a class="reference internal" href="../reference/support-matrix.html">Support Matrix</a></li>
<li class="toctree-l1"><a class="reference internal" href="../reference/precision.html">Numerical Precision</a></li>
<li class="toctree-l1"><a class="reference internal" href="../reference/memory.html">Memory Usage of TensorRT-LLM</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Blogs</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../blogs/H100vsA100.html">H100 has 4.6x A100 Performance in TensorRT-LLM, achieving 10,000 tok/s at 100ms to first token</a></li>
<li class="toctree-l1"><a class="reference internal" href="../blogs/H200launch.html">H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM</a></li>
<li class="toctree-l1"><a class="reference internal" href="../blogs/Falcon180B-H200.html">Falcon-180B on a single H200 GPU with INT4 AWQ, and 6.7x faster Llama-70B over A100</a></li>
<li class="toctree-l1"><a class="reference internal" href="../blogs/quantization-in-TRT-LLM.html">Speed up inference with SOTA quantization techniques in TRT-LLM</a></li>
<li class="toctree-l1"><a class="reference internal" href="../blogs/XQA-kernel.html">New XQA-kernel provides 2.4x more Llama-70B throughput within the same latency budget</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">tensorrt_llm</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item active">Building from Source Code on Windows</li>
<li class="wy-breadcrumbs-aside">
<a href="../_sources/installation/build-from-source-windows.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="building-from-source-code-on-windows">
<span id="build-from-source-windows"></span><h1>Building from Source Code on Windows<a class="headerlink" href="#building-from-source-code-on-windows" title="Link to this heading"></a></h1>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>This section is for advanced users. Skip this section if you plan to use the pre-built TensorRT-LLM release wheel.</p>
</div>
<section id="prerequisites">
<h2>Prerequisites<a class="headerlink" href="#prerequisites" title="Link to this heading"></a></h2>
<ol class="arabic simple">
<li><p>Install prerequisites listed in our <a class="reference external" href="https://nvidia.github.io/TensorRT-LLM/installation/windows.html">Installing on Windows</a> document.</p></li>
<li><p>Install <a class="reference external" href="https://cmake.org/download/">CMake</a>, version 3.27.7 is recommended, and select the option to add it to the system path.</p></li>
<li><p>Download and install <a class="reference external" href="https://visualstudio.microsoft.com/">Visual Studio 2022</a>.</p></li>
<li><p>Download and unzip <a class="reference external" href="https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/zip/TensorRT-10.7.0.23.Windows.win10.cuda-12.6.zip">TensorRT 10.7.0.23</a>.</p></li>
</ol>
</section>
<section id="building-a-tensorrt-llm-docker-image">
<h2>Building a TensorRT-LLM Docker Image<a class="headerlink" href="#building-a-tensorrt-llm-docker-image" title="Link to this heading"></a></h2>
<section id="docker-desktop">
<h3>Docker Desktop<a class="headerlink" href="#docker-desktop" title="Link to this heading"></a></h3>
<ol class="arabic simple">
<li><p>Install <a class="reference external" href="https://docs.docker.com/desktop/install/windows-install/">Docker Desktop on Windows</a>.</p></li>
<li><p>Set the following configurations:</p></li>
<li><p>Right-click the Docker icon in the Windows system tray (bottom right of your taskbar) and select <strong>Switch to Windows containers…</strong>.</p></li>
<li><p>In the Docker Desktop settings on the <strong>General</strong> tab, uncheck <strong>Use the WSL 2 based image</strong>.</p></li>
<li><p>On the <strong>Docker Engine</strong> tab, set your configuration file to:</p></li>
</ol>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">{</span>
<span class="s2">&quot;experimental&quot;</span><span class="p">:</span> <span class="n">true</span>
<span class="p">}</span>
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>After building, copy the files out of your container. <code class="docutils literal notranslate"><span class="pre">docker</span> <span class="pre">cp</span></code> is not supported on Windows for Hyper-V based images. Unless you are using the WSL 2 based engine, mount a folder, for example, <code class="docutils literal notranslate"><span class="pre">trt-llm-build</span></code>, into the container when you run it so that files can be moved between the container and the host system.</p>
</div>
</section>
<section id="acquire-an-image">
<h3>Acquire an Image<a class="headerlink" href="#acquire-an-image" title="Link to this heading"></a></h3>
<p>A pre-built Docker image will be hosted for public download in a future release. At this time, the image must be built manually. From the <code class="docutils literal notranslate"><span class="pre">TensorRT-LLM\windows\</span></code> folder, run the build command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>build<span class="w"> </span>-f<span class="w"> </span>.<span class="se">\d</span>ocker<span class="se">\D</span>ockerfile<span class="w"> </span>-t<span class="w"> </span>tensorrt-llm-windows-build:latest<span class="w"> </span>.
</pre></div>
</div>
<p>Your image is now ready for use.</p>
</section>
<section id="run-the-container">
<h3>Run the Container<a class="headerlink" href="#run-the-container" title="Link to this heading"></a></h3>
<p>Run the container in interactive mode with your build folder mounted. Specify a memory limit with the <code class="docutils literal notranslate"><span class="pre">-m</span></code> flag. By default, the limit is 2 GB, which is not sufficient to build TensorRT-LLM.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>run<span class="w"> </span>-it<span class="w"> </span>-m<span class="w"> </span>12g<span class="w"> </span>-v<span class="w"> </span>.<span class="se">\t</span>rt-llm-build:C:<span class="se">\w</span>orkspace<span class="se">\t</span>rt-llm-build<span class="w"> </span>tensorrt-llm-windows-build:latest
</pre></div>
</div>
</section>
<section id="build-and-extract-files">
<h3>Build and Extract Files<a class="headerlink" href="#build-and-extract-files" title="Link to this heading"></a></h3>
<ol class="arabic simple">
<li><p>Clone and set up the TensorRT-LLM repository within the container.</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/NVIDIA/TensorRT-LLM.git
<span class="nb">cd</span><span class="w"> </span>TensorRT-LLM
git<span class="w"> </span>submodule<span class="w"> </span>update<span class="w"> </span>--init<span class="w"> </span>--recursive
</pre></div>
</div>
<ol class="arabic simple" start="2">
<li><p>Build TensorRT-LLM. This command generates <code class="docutils literal notranslate"><span class="pre">build\tensorrt_llm-*.whl</span></code>.</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>.<span class="se">\s</span>cripts<span class="se">\b</span>uild_wheel.py<span class="w"> </span>-a<span class="w"> </span><span class="s2">&quot;89-real&quot;</span><span class="w"> </span>--trt_root<span class="w"> </span>C:<span class="se">\w</span>orkspace<span class="se">\T</span>ensorRT-10.7.0.23<span class="se">\</span>
</pre></div>
</div>
<ol class="arabic simple" start="3">
<li><p>Copy or move <code class="docutils literal notranslate"><span class="pre">build\tensorrt_llm-*.whl</span></code> into your mounted folder so it can be accessed on your host machine. If you intend to use the C++ runtime, you'll also need to gather various DLLs from the build into your mounted folder. For more information, refer to <a class="reference internal" href="#c-runtime-usage">C++ Runtime Usage</a>. An example copy command follows.</p></li>
</ol>
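<p>For example, with the mount from the earlier <code class="docutils literal notranslate"><span class="pre">docker</span> <span class="pre">run</span></code> command, a copy from inside the container might look like this (a sketch; substitute the actual wheel name your build produced):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>copy build\tensorrt_llm-*.whl C:\workspace\trt-llm-build\
</pre></div>
</div>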
</section>
</section>
<section id="building-tensorrt-llm-on-bare-metal">
<h2>Building TensorRT-LLM on Bare Metal<a class="headerlink" href="#building-tensorrt-llm-on-bare-metal" title="Link to this heading"></a></h2>
<p><strong>Prerequisites</strong></p>
<ol class="arabic">
<li><p>Install all prerequisites (<code class="docutils literal notranslate"><span class="pre">git</span></code>, <code class="docutils literal notranslate"><span class="pre">python</span></code>, <code class="docutils literal notranslate"><span class="pre">CUDA</span></code>) listed in our <a class="reference external" href="https://nvidia.github.io/TensorRT-LLM/installation/windows.html">Installing on Windows</a> document.</p></li>
<li><p>Install Nsight NVTX. TensorRT-LLM on Windows currently depends on NVTX assets that do not come packaged with the CUDA 12.6.3 installer. To install these assets, download the <a class="reference external" href="https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&amp;target_arch=x86_64">CUDA 11.8 Toolkit</a>.</p>
<ol class="arabic simple">
<li><p>During installation, select <strong>Advanced installation</strong>.</p></li>
<li><p>Nsight NVTX is located in the CUDA drop-down.</p></li>
<li><p>Deselect all packages, and select <strong>Nsight NVTX</strong>.</p></li>
</ol>
</li>
<li><p>Install the dependencies one of two ways:</p>
<ol class="arabic">
<li><p>Run the <code class="docutils literal notranslate"><span class="pre">setup_build_env.ps1</span></code> script, which installs CMake, Microsoft Visual Studio Build Tools, and TensorRT automatically with default settings.</p>
<ol class="arabic simple">
<li><p>Run PowerShell as Administrator to use the script.</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./setup_build_env.ps1<span class="w"> </span>-TRTPath<span class="w"> </span>&lt;TRT-containing-folder&gt;<span class="w"> </span><span class="o">[</span>-skipCMake<span class="o">]</span><span class="w"> </span><span class="o">[</span>-skipVSBuildTools<span class="o">]</span><span class="w"> </span><span class="o">[</span>-skipTRT<span class="o">]</span>
</pre></div>
</div>
<ol class="arabic simple" start="2">
<li><p>Close and reopen PowerShell after running the script so that <code class="docutils literal notranslate"><span class="pre">Path</span></code> changes take effect.</p></li>
<li><p>The directory supplied to <code class="docutils literal notranslate"><span class="pre">-TRTPath</span></code> must already exist. For example, <code class="docutils literal notranslate"><span class="pre">-TRTPath</span> <span class="pre">~/inference</span></code> may be valid, but <code class="docutils literal notranslate"><span class="pre">-TRTPath</span> <span class="pre">~/inference/TensorRT</span></code> will not be valid if <code class="docutils literal notranslate"><span class="pre">TensorRT</span></code> does not exist. <code class="docutils literal notranslate"><span class="pre">-TRTPath</span></code> isn't required if <code class="docutils literal notranslate"><span class="pre">-skipTRT</span></code> is supplied. A concrete invocation is sketched below.</p></li>
</ol>
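<p>A concrete invocation might look like this (a sketch, assuming TensorRT should be placed under an existing <code class="docutils literal notranslate"><span class="pre">~/inference</span></code> folder):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./setup_build_env.ps1 -TRTPath ~/inference
</pre></div>
</div>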
</li>
<li><p>Install the dependencies one at a time.</p>
<ol class="arabic">
<li><p>Install <a class="reference external" href="https://cmake.org/download/">CMake</a>, version 3.27.7 is recommended, and select the option to add it to the system path.</p></li>
<li><p>Download and install <a class="reference external" href="https://visualstudio.microsoft.com/">Visual Studio 2022</a>. When prompted to select more Workloads, check <strong>Desktop development with C++</strong>.</p></li>
<li><p>Download and unzip <a class="reference external" href="https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/zip/TensorRT-10.7.0.23.Windows.win10.cuda-12.6.zip">TensorRT 10.7.0.23</a>. Move the folder to a location you can reference later, such as <code class="docutils literal notranslate"><span class="pre">%USERPROFILE%\inference\TensorRT</span></code>.</p>
<ol class="arabic simple">
<li><p>Add the TensorRT libraries to your system's <code class="docutils literal notranslate"><span class="pre">Path</span></code> environment variable. Your <code class="docutils literal notranslate"><span class="pre">Path</span></code> should include a line like this (one way to append it is sketched after the example):</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>%USERPROFILE%<span class="se">\i</span>nference<span class="se">\T</span>ensorRT<span class="se">\l</span>ib
</pre></div>
</div>
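<p>One way to append this entry is from PowerShell; the following is a sketch that modifies the user-level <code class="docutils literal notranslate"><span class="pre">Path</span></code> and assumes the TensorRT location shown above:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Sketch: persist the TensorRT lib folder (assumed location) to the user-level Path.
[Environment]::SetEnvironmentVariable('Path', "$([Environment]::GetEnvironmentVariable('Path', 'User'));$env:USERPROFILE\inference\TensorRT\lib", 'User')
</pre></div>
</div>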
<ol class="arabic simple" start="2">
<li><p>Close and re-open any existing PowerShell or Git Bash windows so they pick up the new <code class="docutils literal notranslate"><span class="pre">Path</span></code>.</p></li>
<li><p>Remove any existing <code class="docutils literal notranslate"><span class="pre">tensorrt</span></code> wheels by executing:</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>uninstall<span class="w"> </span>-y<span class="w"> </span>tensorrt<span class="w"> </span>tensorrt_libs<span class="w"> </span>tensorrt_bindings
pip<span class="w"> </span>uninstall<span class="w"> </span>-y<span class="w"> </span>nvidia-cublas-cu12<span class="w"> </span>nvidia-cuda-nvrtc-cu12<span class="w"> </span>nvidia-cuda-runtime-cu12<span class="w"> </span>nvidia-cudnn-cu12
</pre></div>
</div>
<ol class="arabic simple" start="4">
<li><p>To install the TensorRT core libraries, run PowerShell and use <code class="docutils literal notranslate"><span class="pre">pip</span></code> to install the Python wheel.</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span>%USERPROFILE%<span class="se">\i</span>nference<span class="se">\T</span>ensorRT<span class="se">\p</span>ython<span class="se">\t</span>ensorrt-*.whl
</pre></div>
</div>
<ol class="arabic simple" start="5">
<li><p>Verify that your TensorRT installation is working properly; the command should print the TensorRT version string.</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;import tensorrt as trt; print(trt.__version__)&quot;</span>
</pre></div>
</div>
</li>
</ol>
</li>
</ol>
</li>
</ol>
<p><strong>Steps</strong></p>
<ol class="arabic">
<li><p>Launch a 64-bit Developer PowerShell. From your usual PowerShell terminal, run one of the following two commands.</p>
<ol class="arabic simple">
<li><p>If you installed Visual Studio Build Tools (that is, used the <code class="docutils literal notranslate"><span class="pre">setup_build_env.ps1</span></code> script):</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="p">&amp;</span><span class="w"> </span><span class="s1">&#39;C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\Launch-VsDevShell.ps1&#39;</span><span class="w"> </span>-Arch<span class="w"> </span>amd64
</pre></div>
</div>
<ol class="arabic simple" start="2">
<li><p>If you installed Visual Studio Community (for example, via the manual GUI setup):</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="p">&amp;</span><span class="w"> </span><span class="s1">&#39;C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\Launch-VsDevShell.ps1&#39;</span><span class="w"> </span>-Arch<span class="w"> </span>amd64
</pre></div>
</div>
</li>
<li><p>In PowerShell, from the <code class="docutils literal notranslate"><span class="pre">TensorRT-LLM</span></code> root folder, run:</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>.<span class="se">\s</span>cripts<span class="se">\b</span>uild_wheel.py<span class="w"> </span>-a<span class="w"> </span><span class="s2">&quot;89-real&quot;</span><span class="w"> </span>--trt_root<span class="w"> </span>&lt;path_to_trt_root&gt;
</pre></div>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">-a</span></code> flag specifies the device architecture. <code class="docutils literal notranslate"><span class="pre">&quot;89-real&quot;</span></code> supports GeForce 40-series cards.</p>
<p>The flag <code class="docutils literal notranslate"><span class="pre">-D</span> <span class="pre">&quot;ENABLE_MULTI_DEVICE=0&quot;</span></code>, while not specified here, is implied on Windows. Multi-device inference is supported on Linux, but not on Windows.</p>
<p>This command generates <code class="docutils literal notranslate"><span class="pre">build\tensorrt_llm-*.whl</span></code>.</p>
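<p>Other device architectures can be targeted by changing the <code class="docutils literal notranslate"><span class="pre">-a</span></code> value. As a sketch, a GeForce 30-series (Ampere) card would use its compute capability instead; verify the correct value for your GPU:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python .\scripts\build_wheel.py -a &quot;86-real&quot; --trt_root &lt;path_to_trt_root&gt;
</pre></div>
</div>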
</section>
<section id="linking-with-the-tensorrt-llm-c-runtime">
<span id="c-runtime-usage"></span><h2>Linking with the TensorRT-LLM C++ Runtime<a class="headerlink" href="#linking-with-the-tensorrt-llm-c-runtime" title="Link to this heading"></a></h2>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>This section is for advanced users. Skip this section if you do not intend to use the TensorRT-LLM C++ runtime directly. You must build from source to use the C++ runtime.</p>
</div>
<p>Building from source creates libraries that you can link against directly if you wish to use the TensorRT-LLM C++ runtime. These libraries are also required if you wish to run C++ unit tests and some benchmarks. A minimal linking sketch follows the file list below.</p>
<p>Building from source produces the following library files.</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">tensorrt_llm</span></code> libraries located in <code class="docutils literal notranslate"><span class="pre">cpp\build\tensorrt_llm</span></code></p>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">tensorrt_llm.dll</span></code> - Shared library</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">tensorrt_llm.exp</span></code> - Export file</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">tensorrt_llm.lib</span></code> - Stub for linking to <code class="docutils literal notranslate"><span class="pre">tensorrt_llm.dll</span></code></p></li>
</ul>
</li>
<li><p>Dependency libraries (these get copied to <code class="docutils literal notranslate"><span class="pre">tensorrt_llm\libs\</span></code>)</p>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">nvinfer_plugin_tensorrt_llm</span></code> libraries located in <code class="docutils literal notranslate"><span class="pre">cpp\build\tensorrt_llm\plugins\</span></code></p>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">nvinfer_plugin_tensorrt_llm.dll</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">nvinfer_plugin_tensorrt_llm.exp</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">nvinfer_plugin_tensorrt_llm.lib</span></code></p></li>
</ul>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">th_common</span></code> libraries located in <code class="docutils literal notranslate"><span class="pre">cpp\build\tensorrt_llm\thop\</span></code></p>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">th_common.dll</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">th_common.exp</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">th_common.lib</span></code></p></li>
</ul>
</li>
</ul>
</li>
</ul>
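<p>As a minimal sketch of what linking might look like, assuming a hypothetical <code class="docutils literal notranslate"><span class="pre">main.cpp</span></code> that uses the TensorRT-LLM headers and a repository checked out under <code class="docutils literal notranslate"><span class="pre">%USERPROFILE%\inference\TensorRT-LLM</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Sketch: compile a hypothetical main.cpp and link against the built stub library.
cl /EHsc main.cpp /I &quot;%USERPROFILE%\inference\TensorRT-LLM\cpp\include&quot; /link &quot;%USERPROFILE%\inference\TensorRT-LLM\cpp\build\tensorrt_llm\tensorrt_llm.lib&quot;
</pre></div>
</div>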
<p>The locations of the DLLs, in addition to some <code class="docutils literal notranslate"><span class="pre">torch</span></code> DLLs and <code class="docutils literal notranslate"><span class="pre">TensorRT</span></code> DLLs, must be added to the Windows <code class="docutils literal notranslate"><span class="pre">Path</span></code> in order to use the TensorRT-LLM C++ runtime. Append the locations of these libraries to your <code class="docutils literal notranslate"><span class="pre">Path</span></code>. When complete, your <code class="docutils literal notranslate"><span class="pre">Path</span></code> should include lines similar to these:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>%USERPROFILE%<span class="se">\i</span>nference<span class="se">\T</span>ensorRT<span class="se">\l</span>ib
%USERPROFILE%<span class="se">\i</span>nference<span class="se">\T</span>ensorRT-LLM<span class="se">\c</span>pp<span class="se">\b</span>uild<span class="se">\t</span>ensorrt_llm
%USERPROFILE%<span class="se">\A</span>ppData<span class="se">\L</span>ocal<span class="se">\P</span>rograms<span class="se">\P</span>ython<span class="se">\P</span>ython310<span class="se">\L</span>ib<span class="se">\s</span>ite-packages<span class="se">\t</span>ensorrt_llm<span class="se">\l</span>ibs
%USERPROFILE%<span class="se">\A</span>ppData<span class="se">\L</span>ocal<span class="se">\P</span>rograms<span class="se">\P</span>ython<span class="se">\P</span>ython310<span class="se">\L</span>ib<span class="se">\s</span>ite-packages<span class="se">\t</span>orch<span class="se">\l</span>ib
</pre></div>
</div>
<p>Your <code class="docutils literal notranslate"><span class="pre">Path</span></code> additions may differ, particularly if you used the Docker method and copied all the relevant DLLs into a single folder.</p>
<p>Again, close and re-open any existing PowerShell or Git Bash windows so they pick up the new <code class="docutils literal notranslate"><span class="pre">Path</span></code>.</p>
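<p>To confirm the libraries are discoverable, you can ask Windows to search the <code class="docutils literal notranslate"><span class="pre">Path</span></code> for them from a fresh shell:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>where.exe tensorrt_llm.dll nvinfer_plugin_tensorrt_llm.dll th_common.dll
</pre></div>
</div>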
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="windows.html" class="btn btn-neutral float-left" title="Installing on Windows" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="grace-hopper.html" class="btn btn-neutral float-right" title="Installing on Grace Hopper" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<div class="footer">
<p>
Copyright © 2024 NVIDIA Corporation
</p>
<p>
<a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/privacy-policy/" target="_blank" rel="noopener"
data-cms-ai="0">Privacy Policy</a> |
<a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/privacy-center/" target="_blank" rel="noopener"
data-cms-ai="0">Manage My Privacy</a> |
<a class="Link" href="https://www.nvidia.com/en-us/preferences/start/" target="_blank" rel="noopener"
data-cms-ai="0">Do Not Sell or Share My Data</a> |
<a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/terms-of-service/" target="_blank"
rel="noopener" data-cms-ai="0">Terms of Service</a> |
<a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/accessibility/" target="_blank" rel="noopener"
data-cms-ai="0">Accessibility</a> |
<a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/company-policies/" target="_blank"
rel="noopener" data-cms-ai="0">Corporate Policies</a> |
<a class="Link" href="https://www.nvidia.com/en-us/product-security/" target="_blank" rel="noopener"
data-cms-ai="0">Product Security</a> |
<a class="Link" href="https://www.nvidia.com/en-us/contact/" target="_blank" rel="noopener"
data-cms-ai="0">Contact</a>
</p>
</div>
</div>
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>