
<!DOCTYPE html>
<html class="writer-html5" lang="en" data-content_root="./">
<head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Build TensorRT-LLM &mdash; tensorrt_llm documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=80d5e7a1" />
<link rel="stylesheet" type="text/css" href="_static/css/theme.css?v=19f00094" />
<!--[if lt IE 9]>
<script src="_static/js/html5shiv.min.js"></script>
<![endif]-->
<script src="_static/jquery.js?v=5d32c60e"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="_static/documentation_options.js?v=5929fcd5"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Performance of TensorRT-LLM" href="performance.html" />
<link rel="prev" title="Numerical Precision" href="precision.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home">
tensorrt_llm
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="architecture.html">TensorRT-LLM Architecture</a></li>
<li class="toctree-l1"><a class="reference internal" href="gpt_runtime.html">C++ GPT Runtime</a></li>
<li class="toctree-l1"><a class="reference internal" href="batch_manager.html">The Batch Manager in TensorRT-LLM</a></li>
<li class="toctree-l1"><a class="reference internal" href="gpt_attention.html">Multi-head, Multi-query and Group-query Attention</a></li>
<li class="toctree-l1"><a class="reference internal" href="precision.html">Numerical Precision</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Build TensorRT-LLM</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#overview">Overview</a></li>
<li class="toctree-l2"><a class="reference internal" href="#fetch-the-sources">Fetch the Sources</a></li>
<li class="toctree-l2"><a class="reference internal" href="#build-tensorrt-llm-in-one-step">Build TensorRT-LLM in One Step</a></li>
<li class="toctree-l2"><a class="reference internal" href="#build-step-by-step">Build Step-by-step</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#create-the-container">Create the Container</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#on-systems-with-gnu-make">On Systems with GNU <code class="docutils literal notranslate"><span class="pre">make</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="#on-systems-without-gnu-make">On Systems Without GNU <code class="docutils literal notranslate"><span class="pre">make</span></code></a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="#id1">Build TensorRT-LLM</a></li>
<li class="toctree-l3"><a class="reference internal" href="#build-the-python-bindings-for-the-c-runtime">Build the Python Bindings for the C++ Runtime</a></li>
<li class="toctree-l3"><a class="reference internal" href="#link-with-the-tensorrt-llm-c-runtime">Link with the TensorRT-LLM C++ Runtime</a></li>
<li class="toctree-l3"><a class="reference internal" href="#supported-c-header-files">Supported C++ Header Files</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="performance.html">Performance of TensorRT-LLM</a></li>
<li class="toctree-l1"><a class="reference internal" href="2023-05-19-how-to-debug.html">How to debug</a></li>
<li class="toctree-l1"><a class="reference internal" href="2023-05-17-how-to-add-a-new-model.html">How to add a new model</a></li>
<li class="toctree-l1"><a class="reference internal" href="graph-rewriting.html">Graph Rewriting Module</a></li>
<li class="toctree-l1"><a class="reference internal" href="memory.html">Memory Usage of TensorRT-LLM</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Python API</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="python-api/tensorrt_llm.layers.html">Layers</a></li>
<li class="toctree-l1"><a class="reference internal" href="python-api/tensorrt_llm.functional.html">Functionals</a></li>
<li class="toctree-l1"><a class="reference internal" href="python-api/tensorrt_llm.models.html">Models</a></li>
<li class="toctree-l1"><a class="reference internal" href="python-api/tensorrt_llm.plugin.html">Plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="python-api/tensorrt_llm.quantization.html">Quantization</a></li>
<li class="toctree-l1"><a class="reference internal" href="python-api/tensorrt_llm.runtime.html">Runtime</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">C++ API</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="_cpp_gen/runtime.html">Runtime</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Blogs</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="blogs/H100vsA100.html">H100 has 4.6x A100 Performance in TensorRT-LLM, achieving 10,000 tok/s at 100ms to first token</a></li>
<li class="toctree-l1"><a class="reference internal" href="blogs/H200launch.html">H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">tensorrt_llm</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item active">Build TensorRT-LLM</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/installation.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="build-tensorrt-llm">
<h1>Build TensorRT-LLM<a class="headerlink" href="#build-tensorrt-llm" title="Link to this heading"></a></h1>
<ul class="simple">
<li><p><a class="reference internal" href="#overview"><span class="xref myst">Overview</span></a></p></li>
<li><p><a class="reference internal" href="#fetch-the-sources"><span class="xref myst">Fetch the Sources</span></a></p></li>
<li><p><a class="reference internal" href="#build-tensorrt-llm-in-one-step"><span class="xref myst">Build TensorRT-LLM in One Step</span></a></p></li>
<li><p><a class="reference internal" href="#build-step-by-step"><span class="xref myst">Build Step-by-step</span></a></p>
<ul>
<li><p><a class="reference internal" href="#create-the-container"><span class="xref myst">Create the Container</span></a></p>
<ul>
<li><p><a class="reference internal" href="#on-systems-with-gnu-make"><span class="xref myst">On Systems with GNU <code class="docutils literal notranslate"><span class="pre">make</span></code></span></a></p></li>
<li><p><a class="reference internal" href="#on-systems-without-gnu-make"><span class="xref myst">On Systems without GNU <code class="docutils literal notranslate"><span class="pre">make</span></code></span></a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="#id1"><span class="xref myst">Build TensorRT-LLM</span></a></p></li>
<li><p><a class="reference internal" href="#build-the-python-bindings-for-the-c-runtime"><span class="xref myst">Build the Python Bindings for the C++ Runtime</span></a></p></li>
<li><p><a class="reference internal" href="#link-with-the-tensorrt-llm-c-runtime"><span class="xref myst">Link with the TensorRT-LLM C++ Runtime</span></a></p></li>
<li><p><a class="reference internal" href="#supported-c-header-files"><span class="xref myst">Supported C++ Header Files</span></a></p></li>
</ul>
</li>
</ul>
<section id="overview">
<h2>Overview<a class="headerlink" href="#overview" title="Link to this heading"></a></h2>
<p>This document contains instructions for building TensorRT-LLM from source. TensorRT-LLM depends on the latest versions of
TensorRT and
<a class="reference external" href="https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy">Polygraphy</a>,
which are distributed separately and must be copied into this repository.</p>
<p>We recommend the use of <a class="reference external" href="https://www.docker.com">Docker</a> to build and run
TensorRT-LLM. Instructions to install an environment to run Docker containers
for the NVIDIA platform can be found
<a class="reference external" href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html">here</a>.</p>
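<p>Before starting, a quick preflight can save a failed build later. The helper below is a minimal sketch of our own (not part of TensorRT-LLM) that checks whether the command-line tools used throughout this document are on the <code>PATH</code>:</p>

```shell
# Minimal preflight sketch (local convention, not part of TensorRT-LLM):
# report which of the given tools are missing from PATH.
check_tools() {
  local missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  [ -z "$missing" ] && echo "all tools found" || echo "missing:$missing"
}

# Tools used by the instructions below; output depends on your machine.
check_tools docker make git git-lfs python3
```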
</section>
<section id="fetch-the-sources">
<h2>Fetch the Sources<a class="headerlink" href="#fetch-the-sources" title="Link to this heading"></a></h2>
<p>The first step to build TensorRT-LLM is to fetch the sources:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># TensorRT-LLM uses git-lfs, which needs to be installed in advance.</span>
apt-get<span class="w"> </span>update<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>apt-get<span class="w"> </span>-y<span class="w"> </span>install<span class="w"> </span>git<span class="w"> </span>git-lfs
git<span class="w"> </span>clone<span class="w"> </span>https://github.com/NVIDIA/TensorRT-LLM.git
<span class="nb">cd</span><span class="w"> </span>TensorRT-LLM
git<span class="w"> </span>submodule<span class="w"> </span>update<span class="w"> </span>--init<span class="w"> </span>--recursive
git<span class="w"> </span>lfs<span class="w"> </span>install
git<span class="w"> </span>lfs<span class="w"> </span>pull
</pre></div>
</div>
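<p>A common pitfall is skipping <code>git lfs pull</code>, which leaves large files checked out as small pointer stubs rather than real content. The helper below is a hedged sketch of our own (not part of the repository) that detects such stubs, relying on the fact that LFS pointer files begin with a <code>version https://git-lfs...</code> line:</p>

```shell
# Sketch: detect files that are still git-lfs pointer stubs after cloning.
# LFS pointer files start with a "version https://git-lfs..." line.
is_lfs_pointer() {
  head -c 64 "$1" 2>/dev/null | grep -q '^version https://git-lfs'
}

# Example scan (pointer stubs are tiny, so limit the search to small files):
# find . -type f -size -200c ! -path './.git/*' | while read -r f; do
#   is_lfs_pointer "$f" && echo "unresolved LFS pointer: $f"
# done
```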
</section>
<section id="build-tensorrt-llm-in-one-step">
<h2>Build TensorRT-LLM in One Step<a class="headerlink" href="#build-tensorrt-llm-in-one-step" title="Link to this heading"></a></h2>
<p>TensorRT-LLM contains a simple command to create a Docker image:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>make<span class="w"> </span>-C<span class="w"> </span>docker<span class="w"> </span>release_build
</pre></div>
</div>
<p>It is possible to add the optional argument <code class="docutils literal notranslate"><span class="pre">CUDA_ARCHS=&quot;&lt;list</span> <span class="pre">of</span> <span class="pre">architectures</span> <span class="pre">in</span> <span class="pre">CMake</span> <span class="pre">format&gt;&quot;</span></code> to specify which architectures should be supported by
TensorRT-LLM. It restricts the supported GPU architectures but helps reduce
compilation time:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Restrict the compilation to Ada and Hopper architectures.</span>
make<span class="w"> </span>-C<span class="w"> </span>docker<span class="w"> </span>release_build<span class="w"> </span><span class="nv">CUDA_ARCHS</span><span class="o">=</span><span class="s2">&quot;89-real;90-real&quot;</span>
</pre></div>
</div>
<p>Once the image is built, the Docker container can be executed using:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>make<span class="w"> </span>-C<span class="w"> </span>docker<span class="w"> </span>release_run
</pre></div>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">make</span></code> command supports the <code class="docutils literal notranslate"><span class="pre">LOCAL_USER=1</span></code> argument to switch to the local
user account instead of <code class="docutils literal notranslate"><span class="pre">root</span></code> inside the container. The TensorRT-LLM examples are installed in the
<code class="docutils literal notranslate"><span class="pre">/app/tensorrt_llm/examples</span></code> directory.</p>
</section>
<section id="build-step-by-step">
<h2>Build Step-by-step<a class="headerlink" href="#build-step-by-step" title="Link to this heading"></a></h2>
<p>For users looking for more flexibility, TensorRT-LLM has commands to create and
run a development container in which TensorRT-LLM can be built.</p>
<section id="create-the-container">
<h3>Create the Container<a class="headerlink" href="#create-the-container" title="Link to this heading"></a></h3>
<section id="on-systems-with-gnu-make">
<h4>On Systems with GNU <code class="docutils literal notranslate"><span class="pre">make</span></code><a class="headerlink" href="#on-systems-with-gnu-make" title="Link to this heading"></a></h4>
<p>The following command creates a Docker image for development:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>make<span class="w"> </span>-C<span class="w"> </span>docker<span class="w"> </span>build
</pre></div>
</div>
<p>The image will be tagged locally with <code class="docutils literal notranslate"><span class="pre">tensorrt_llm/devel:latest</span></code>. To run the
container, use the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>make<span class="w"> </span>-C<span class="w"> </span>docker<span class="w"> </span>run
</pre></div>
</div>
<p>For users who prefer to work with their own user account in that container
instead of <code class="docutils literal notranslate"><span class="pre">root</span></code>, the option <code class="docutils literal notranslate"><span class="pre">LOCAL_USER=1</span></code> must be added to the command
above:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>make<span class="w"> </span>-C<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span><span class="nv">LOCAL_USER</span><span class="o">=</span><span class="m">1</span>
</pre></div>
</div>
</section>
<section id="on-systems-without-gnu-make">
<h4>On Systems Without GNU <code class="docutils literal notranslate"><span class="pre">make</span></code><a class="headerlink" href="#on-systems-without-gnu-make" title="Link to this heading"></a></h4>
<p>On systems without GNU <code class="docutils literal notranslate"><span class="pre">make</span></code> or shell support, the Docker image for
development can be built using:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>build<span class="w"> </span>--pull<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--target<span class="w"> </span>devel<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--file<span class="w"> </span>docker/Dockerfile.multi<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--tag<span class="w"> </span>tensorrt_llm/devel:latest<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>.
</pre></div>
</div>
<p>The container can then be run using:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>run<span class="w"> </span>--rm<span class="w"> </span>-it<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ipc<span class="o">=</span>host<span class="w"> </span>--ulimit<span class="w"> </span><span class="nv">memlock</span><span class="o">=</span>-1<span class="w"> </span>--ulimit<span class="w"> </span><span class="nv">stack</span><span class="o">=</span><span class="m">67108864</span><span class="w"> </span>--gpus<span class="o">=</span>all<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--volume<span class="w"> </span><span class="si">${</span><span class="nv">PWD</span><span class="si">}</span>:/code/tensorrt_llm<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--workdir<span class="w"> </span>/code/tensorrt_llm<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>tensorrt_llm/devel:latest
</pre></div>
</div>
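<p>If you want the rough equivalent of <code>LOCAL_USER=1</code> in this manual flow, a common Docker idiom is to pass your own uid and gid via <code>--user</code>. This is a hedged sketch, not the exact mechanism the make target uses:</p>

```shell
# Sketch: run the devel image under the calling user's uid/gid instead of
# root, roughly mirroring what LOCAL_USER=1 does in the make flow.
uid_gid="$(id -u):$(id -g)"
echo "passing --user ${uid_gid}"
# docker run --rm -it \
#   --user "${uid_gid}" \
#   --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all \
#   --volume "${PWD}":/code/tensorrt_llm \
#   --workdir /code/tensorrt_llm \
#   tensorrt_llm/devel:latest
```

<p>Note that a plain uid mapping does not create a matching user entry inside the container, so some tools that look up the user name may complain.</p>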
</section>
</section>
<section id="id1">
<h3>Build TensorRT-LLM<a class="headerlink" href="#id1" title="Link to this heading"></a></h3>
<p>Once in the container, TensorRT-LLM can be built from source using:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># To build the TensorRT-LLM code.</span>
python3<span class="w"> </span>./scripts/build_wheel.py<span class="w"> </span>--trt_root<span class="w"> </span>/usr/local/tensorrt
<span class="c1"># Deploy TensorRT-LLM in your environment.</span>
pip<span class="w"> </span>install<span class="w"> </span>./build/tensorrt_llm*.whl
</pre></div>
</div>
<p>By default, <code class="docutils literal notranslate"><span class="pre">build_wheel.py</span></code> enables incremental builds. To clean the build
directory, add the <code class="docutils literal notranslate"><span class="pre">--clean</span></code> option:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python3<span class="w"> </span>./scripts/build_wheel.py<span class="w"> </span>--clean<span class="w"> </span>--trt_root<span class="w"> </span>/usr/local/tensorrt
</pre></div>
</div>
<p>It is possible to restrict the compilation of TensorRT-LLM to specific CUDA
architectures. For that purpose, the <code class="docutils literal notranslate"><span class="pre">build_wheel.py</span></code> script accepts a
semicolon-separated list of CUDA architectures, as shown in the following
example:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Build TensorRT-LLM for Ampere.</span>
python3<span class="w"> </span>./scripts/build_wheel.py<span class="w"> </span>--cuda_architectures<span class="w"> </span><span class="s2">&quot;80-real;86-real&quot;</span><span class="w"> </span>--trt_root<span class="w"> </span>/usr/local/tensorrt
</pre></div>
</div>
<p>The list of supported architectures can be found in the
<a class="reference external" href="https://github.com/NVIDIA/TensorRT-LLM/tree/rel/cpp/CMakeLists.txt"><code class="docutils literal notranslate"><span class="pre">CMakeLists.txt</span></code></a> file.</p>
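<p>To pick the right list for your machine, the compute capability reported by <code>nvidia-smi</code> can be converted into the CMake-style format. The helper below is our own sketch (the query requires a reasonably recent driver); verify the resulting entries against the <code>CMakeLists.txt</code> file above:</p>

```shell
# Sketch: convert a compute capability such as "8.6" into the CMake-style
# architecture entry "86-real". The helper name is our own convention.
cap_to_arch() {
  echo "$1" | tr -d '.' | sed 's/$/-real/'
}

# On a machine with GPUs (driver must support the compute_cap query):
# CUDA_ARCHS="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader \
#     | sort -u | while read -r cap; do cap_to_arch "$cap"; done | paste -sd';' -)"
# python3 ./scripts/build_wheel.py --cuda_architectures "${CUDA_ARCHS}" --trt_root /usr/local/tensorrt
```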
</section>
<section id="build-the-python-bindings-for-the-c-runtime">
<h3>Build the Python Bindings for the C++ Runtime<a class="headerlink" href="#build-the-python-bindings-for-the-c-runtime" title="Link to this heading"></a></h3>
<p>The C++ Runtime, in particular <a class="reference external" href="https://github.com/NVIDIA/TensorRT-LLM/tree/rel/cpp/include/tensorrt_llm/runtime/gptSession.h"><code class="docutils literal notranslate"><span class="pre">GptSession</span></code></a>, can be exposed to
Python via <a class="reference external" href="https://github.com/NVIDIA/TensorRT-LLM/tree/rel/cpp/tensorrt_llm/pybind/bindings.cpp">bindings</a>. This is currently an opt-in feature that must be explicitly activated at compile time by passing the
<code class="docutils literal notranslate"><span class="pre">--python_bindings</span></code> option to <code class="docutils literal notranslate"><span class="pre">build_wheel.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python3<span class="w"> </span>./scripts/build_wheel.py<span class="w"> </span>--python_bindings<span class="w"> </span>--trt_root<span class="w"> </span>/usr/local/tensorrt
</pre></div>
</div>
<p>After installing the resulting wheel as described above, the C++ Runtime bindings will be available in the
<code class="docutils literal notranslate"><span class="pre">tensorrt_llm.bindings</span></code> package. Running <code class="docutils literal notranslate"><span class="pre">help</span></code> on this package in a Python interpreter provides an overview of the
relevant classes. The <a class="reference external" href="https://github.com/NVIDIA/TensorRT-LLM/tree/rel/tests/bindings">associated unit tests</a> are also a good reference for understanding the API.</p>
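<p>As a quick sanity check that the bindings import cleanly after installing the wheel, you can list the public names the package exposes. The helper below is a local sketch, not part of TensorRT-LLM:</p>

```shell
# Sketch: print the public attributes a Python module exposes; a quick way
# to confirm a package imports cleanly after installation.
list_module() {
  python3 -c "import $1 as m; print(', '.join(n for n in dir(m) if not n.startswith('_')))"
}

# After installing a wheel built with --python_bindings:
# list_module tensorrt_llm.bindings
```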
</section>
<section id="link-with-the-tensorrt-llm-c-runtime">
<h3>Link with the TensorRT-LLM C++ Runtime<a class="headerlink" href="#link-with-the-tensorrt-llm-c-runtime" title="Link to this heading"></a></h3>
<p>The <code class="docutils literal notranslate"><span class="pre">build_wheel.py</span></code> script will also compile the library containing the C++
runtime of TensorRT-LLM. If Python support and <code class="docutils literal notranslate"><span class="pre">torch</span></code> modules are not
required, the script provides the option <code class="docutils literal notranslate"><span class="pre">--cpp_only</span></code> which restricts the build
to the C++ runtime only:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python3<span class="w"> </span>./scripts/build_wheel.py<span class="w"> </span>--cuda_architectures<span class="w"> </span><span class="s2">&quot;80-real;86-real&quot;</span><span class="w"> </span>--cpp_only<span class="w"> </span>--clean
</pre></div>
</div>
<p>This is particularly useful for avoiding the linking problems that some versions of <code class="docutils literal notranslate"><span class="pre">torch</span></code> can introduce through the <a class="reference external" href="https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html">dual ABI support of GCC</a>. The
<code class="docutils literal notranslate"><span class="pre">--clean</span></code> option removes the build directory before building. The default
build directory is <code class="docutils literal notranslate"><span class="pre">cpp/build</span></code>, which may be overridden using the
<code class="docutils literal notranslate"><span class="pre">--build_dir</span></code> option. Run <code class="docutils literal notranslate"><span class="pre">build_wheel.py</span> <span class="pre">--help</span></code> for an overview of all supported
options.</p>
<p>Clients may choose to link against the shared or the static version of the
library. These libraries can be found in the following locations:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>cpp/build/tensorrt_llm/libtensorrt_llm.so
cpp/build/tensorrt_llm/libtensorrt_llm_static.a
</pre></div>
</div>
<p>In addition, one needs to link against the library containing the LLM plugins
for TensorRT, available here:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so
</pre></div>
</div>
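<p>Putting these pieces together, a manual link line might look like the sketch below. The client source file name is a placeholder, and a real build also needs the TensorRT and CUDA libraries on the link line; this only illustrates where the TensorRT-LLM libraries and headers come from:</p>

```shell
# Sketch: assemble a compile-and-link command for a hypothetical C++ client,
# run from the root of the TensorRT-LLM checkout. my_client.cpp is a
# placeholder; TensorRT/CUDA link flags are omitted for brevity.
link_cmd="g++ -std=c++17 my_client.cpp \
  -Icpp -Icpp/include \
  -Lcpp/build/tensorrt_llm -Lcpp/build/tensorrt_llm/plugins \
  -ltensorrt_llm -lnvinfer_plugin_tensorrt_llm \
  -o my_client"
echo "$link_cmd"
```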
</section>
<section id="supported-c-header-files">
<h3>Supported C++ Header Files<a class="headerlink" href="#supported-c-header-files" title="Link to this heading"></a></h3>
<p>When using TensorRT-LLM, you need to add the <code class="docutils literal notranslate"><span class="pre">cpp</span></code> and <code class="docutils literal notranslate"><span class="pre">cpp/include</span></code>
directories to your project's include paths. Only the header files in
<code class="docutils literal notranslate"><span class="pre">cpp/include</span></code> are part of the supported API and may be included directly. Other
headers under <code class="docutils literal notranslate"><span class="pre">cpp</span></code> should not be included directly since they might
change in future versions.</p>
<p>For examples of how to use the C++ runtime, see the unit tests in
<a class="reference external" href="https://github.com/NVIDIA/TensorRT-LLM/tree/rel/cpp/tests/runtime/gptSessionTest.cpp">gptSessionTest.cpp</a> and the related
<a class="reference external" href="https://github.com/NVIDIA/TensorRT-LLM/tree/rel/cpp/tests/CMakeLists.txt">CMakeLists.txt</a> file.</p>
</section>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="precision.html" class="btn btn-neutral float-left" title="Numerical Precision" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="performance.html" class="btn btn-neutral float-right" title="Performance of TensorRT-LLM" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2023, NVIDIA.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>