mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-14 06:27:45 +08:00)
Update gh-pages for windows part doc. (#1979)
Co-authored-by: Guoming Zhang <37257613+nv-guomingz@users.noreply.github.com>
parent 10588d0bfe
commit 85f78df69c
@@ -4724,7 +4724,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a046e6800>
+<jinja2.runtime.BlockReference object at 0x7f8e37d07880>
 
 <div class="footer">
 <p>
@@ -10645,7 +10645,7 @@ one more than decoding draft tokens for prediction from primary head </p>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08542620>
+<jinja2.runtime.BlockReference object at 0x7f8e36a87370>
 
 <div class="footer">
 <p>
@@ -4,7 +4,7 @@
 
 ```{note}
 The Windows release of TensorRT-LLM is currently in beta.
-We recommend checking out the [v0.10.0 tag](https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.10.0) for the most stable experience.
+We recommend checking out the [v0.11.0 tag](https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.11.0) for the most stable experience.
 ```
 
 **Prerequisites**
@@ -47,7 +47,7 @@ We recommend checking out the [v0.10.0 tag](https://github.com/NVIDIA/TensorRT-L
 before installing TensorRT-LLM with the following command.
 
 ```bash
-pip install tensorrt_llm==0.10.0 --extra-index-url https://pypi.nvidia.com
+pip install tensorrt_llm==0.11.0 --extra-index-url https://pypi.nvidia.com
 ```
 
 Run the following command to verify that your TensorRT-LLM installation is working properly.
@@ -411,7 +411,7 @@ the TensorRT-LLM batch manager.</p>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a09129930>
+<jinja2.runtime.BlockReference object at 0x7f0d22958670>
 
 <div class="footer">
 <p>
@@ -169,7 +169,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a0912bf10>
+<jinja2.runtime.BlockReference object at 0x7f0d22922650>
 
 <div class="footer">
 <p>
@@ -486,7 +486,7 @@ is computed as:</p>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a091076d0>
+<jinja2.runtime.BlockReference object at 0x7f0d2226a290>
 
 <div class="footer">
 <p>
@@ -378,7 +378,7 @@ one.</p></li>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a090f4f10>
+<jinja2.runtime.BlockReference object at 0x7f0d21b56bf0>
 
 <div class="footer">
 <p>
@@ -349,7 +349,7 @@ techniques to optimize the underlying graph. It provides a wrapper similar to P
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a092e8430>
+<jinja2.runtime.BlockReference object at 0x7f0d222528f0>
 
 <div class="footer">
 <p>
@@ -365,7 +365,7 @@ The mandatory input tensors to create a valid <code class="docutils literal notr
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a09121810>
+<jinja2.runtime.BlockReference object at 0x7f0d229342b0>
 
 <div class="footer">
 <p>
@@ -323,7 +323,7 @@ The following tensors are for a LoRA which has a <code class="docutils literal n
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a092eb700>
+<jinja2.runtime.BlockReference object at 0x7f0d229341f0>
 
 <div class="footer">
 <p>
@@ -206,7 +206,7 @@ python3<span class="w"> </span>examples/summarize.py<span class="w"> </span><spa
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a091051e0>
+<jinja2.runtime.BlockReference object at 0x7f0d2294c3a0>
 
 <div class="footer">
 <p>
@@ -240,7 +240,7 @@ python<span class="w"> </span>../summarize.py<span class="w"> </span>--engine_di
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08927550>
+<jinja2.runtime.BlockReference object at 0x7f0d22913c40>
 
 <div class="footer">
 <p>
@@ -506,7 +506,7 @@ trtllm-build<span class="w"> </span>--checkpoint_dir<span class="w"> </span>./op
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a09121720>
+<jinja2.runtime.BlockReference object at 0x7f0d22993580>
 
 <div class="footer">
 <p>
@@ -377,7 +377,7 @@ issues and may be less efficient in terms of GPU utilization.</p>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a09397220>
+<jinja2.runtime.BlockReference object at 0x7f0d2294dfc0>
 
 <div class="footer">
 <p>
@@ -158,7 +158,7 @@ Server</a> to easily create web-based services for LLMs. TensorRT-LLM supports m
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08941990>
+<jinja2.runtime.BlockReference object at 0x7f0d220022f0>
 
 <div class="footer">
 <p>
@@ -336,7 +336,7 @@ The usage of this API looks like this:</p>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a089629e0>
+<jinja2.runtime.BlockReference object at 0x7f0d22002080>
 
 <div class="footer">
 <p>
@@ -295,7 +295,7 @@ ISL = Input Sequence Length
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a093abd00>
+<jinja2.runtime.BlockReference object at 0x7f0d22912fe0>
 
 <div class="footer">
 <p>
@@ -247,7 +247,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08ce38e0>
+<jinja2.runtime.BlockReference object at 0x7f0d22324ee0>
 
 <div class="footer">
 <p>
@@ -239,7 +239,7 @@ TensorRT-LLM v0.5.0, TensorRT v9.1.0.4 | H200, H100 FP8. </sub></p>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08d0ca90>
+<jinja2.runtime.BlockReference object at 0x7f0d2216f460>
 
 <div class="footer">
 <p>
@@ -204,7 +204,7 @@ ISL = Input Sequence Length
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08e203a0>
+<jinja2.runtime.BlockReference object at 0x7f0d2216f070>
 
 <div class="footer">
 <p>
@@ -359,7 +359,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08d688b0>
+<jinja2.runtime.BlockReference object at 0x7f0d2216e110>
 
 <div class="footer">
 <p>
@@ -190,7 +190,7 @@ post-processor as <code class="docutils literal notranslate"><span class="pre">R
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a04c7b640>
+<jinja2.runtime.BlockReference object at 0x7f0d22d17cd0>
 
 <div class="footer">
 <p>
@@ -3773,7 +3773,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08c552d0>
+<jinja2.runtime.BlockReference object at 0x7f8e3781d390>
 
 <div class="footer">
 <p>
@@ -364,7 +364,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08e59ae0>
+<jinja2.runtime.BlockReference object at 0x7f8e35b33af0>
 
 <div class="footer">
 <p>
@@ -301,7 +301,7 @@ relevant classes. The associated unit tests should also be consulted for underst
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a093968f0>
+<jinja2.runtime.BlockReference object at 0x7f0d21e1b1f0>
 
 <div class="footer">
 <p>
@@ -358,7 +358,7 @@ pip<span class="w"> </span>uninstall<span class="w"> </span>-y<span class="w"> <
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08bbf370>
+<jinja2.runtime.BlockReference object at 0x7f0d22326920>
 
 <div class="footer">
 <p>
@@ -181,7 +181,7 @@ git<span class="w"> </span>lfs<span class="w"> </span>install
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08d15ab0>
+<jinja2.runtime.BlockReference object at 0x7f0d21ff4fd0>
 
 <div class="footer">
 <p>
@@ -135,7 +135,7 @@
 <div class="admonition note">
 <p class="admonition-title">Note</p>
 <p>The Windows release of TensorRT-LLM is currently in beta.
-We recommend checking out the <a class="reference external" href="https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.10.0">v0.10.0 tag</a> for the most stable experience.</p>
+We recommend checking out the <a class="reference external" href="https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.11.0">v0.11.0 tag</a> for the most stable experience.</p>
 </div>
 <p><strong>Prerequisites</strong></p>
 <ol class="arabic">
@@ -177,7 +177,7 @@ pip<span class="w"> </span>uninstall<span class="w"> </span>-y<span class="w"> <
 </pre></div>
 </div>
 <p>before installing TensorRT-LLM with the following command.</p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span><span class="nv">tensorrt_llm</span><span class="o">==</span><span class="m">0</span>.10.0<span class="w"> </span>--extra-index-url<span class="w"> </span>https://pypi.nvidia.com
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span><span class="nv">tensorrt_llm</span><span class="o">==</span><span class="m">0</span>.11.0<span class="w"> </span>--extra-index-url<span class="w"> </span>https://pypi.nvidia.com
 </pre></div>
 </div>
 <p>Run the following command to verify that your TensorRT-LLM installation is working properly.</p>
@@ -201,7 +201,7 @@ pip<span class="w"> </span>uninstall<span class="w"> </span>-y<span class="w"> <
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08e52dd0>
+<jinja2.runtime.BlockReference object at 0x7f0d221498a0>
 
 <div class="footer">
 <p>
@@ -220,7 +220,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08bc65f0>
+<jinja2.runtime.BlockReference object at 0x7f0d22d15e40>
 
 <div class="footer">
 <p>
@@ -194,7 +194,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08e45480>
+<jinja2.runtime.BlockReference object at 0x7f0d21e1a320>
 
 <div class="footer">
 <p>
@@ -222,7 +222,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a091f68c0>
+<jinja2.runtime.BlockReference object at 0x7f0d22959db0>
 
 <div class="footer">
 <p>
@@ -506,7 +506,7 @@ recommended to start from <code class="docutils literal notranslate"><span class
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08e58940>
+<jinja2.runtime.BlockReference object at 0x7f8e35b9d990>
 
 <div class="footer">
 <p>
@@ -1398,7 +1398,7 @@ that can be compared with the table in the <a class="reference internal" href="#
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08902ec0>
+<jinja2.runtime.BlockReference object at 0x7f0d220e4490>
 
 <div class="footer">
 <p>
@@ -140,7 +140,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a088bc640>
+<jinja2.runtime.BlockReference object at 0x7f8e35c9dd20>
 
 <div class="footer">
 <p>
@@ -167,7 +167,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a088bc6d0>
+<jinja2.runtime.BlockReference object at 0x7f8e35c9c4c0>
 
 <div class="footer">
 <p>
@@ -140,7 +140,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a088d1c90>
+<jinja2.runtime.BlockReference object at 0x7f8e35bf3f40>
 
 <div class="footer">
 <p>
@@ -140,7 +140,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a088f2410>
+<jinja2.runtime.BlockReference object at 0x7f8e35b9c040>
 
 <div class="footer">
 <p>
@@ -140,7 +140,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a0976c4f0>
+<jinja2.runtime.BlockReference object at 0x7f8e359ca8f0>
 
 <div class="footer">
 <p>
@@ -140,7 +140,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08e53eb0>
+<jinja2.runtime.BlockReference object at 0x7f8e35b9ee90>
 
 <div class="footer">
 <p>
@@ -290,7 +290,7 @@ python<span class="w"> </span>/opt/scripts/launch_triton_server.py<span class="w
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08bcfbb0>
+<jinja2.runtime.BlockReference object at 0x7f0d220f53f0>
 
 <div class="footer">
 <p>
@@ -276,7 +276,7 @@ Here some explanations on how these values affect the memory:</p>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08a86f80>
+<jinja2.runtime.BlockReference object at 0x7f0d21637ac0>
 
 <div class="footer">
 <p>
@@ -725,7 +725,7 @@ are:</p>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08a95510>
+<jinja2.runtime.BlockReference object at 0x7f0eb89db7f0>
 
 <div class="footer">
 <p>
@@ -286,7 +286,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a089f5660>
+<jinja2.runtime.BlockReference object at 0x7f0d21e09090>
 
 <div class="footer">
 <p>
@@ -404,7 +404,7 @@ dedicated MPI environment, not the one provided by your Slurm allocation.</p>
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08e46a10>
+<jinja2.runtime.BlockReference object at 0x7f0d22326a40>
 
 <div class="footer">
 <p>
@@ -791,7 +791,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08a84430>
+<jinja2.runtime.BlockReference object at 0x7f0d21696b90>
 
 <div class="footer">
 <p>
@@ -149,7 +149,7 @@
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08c8c9a0>
+<jinja2.runtime.BlockReference object at 0x7f8e361dcd60>
 
 <div class="footer">
 <p>
File diff suppressed because one or more lines are too long
@@ -436,7 +436,7 @@ However, similar to any new model, you can follow the same approach to define yo
 <hr/>
 
 <div role="contentinfo">
-<jinja2.runtime.BlockReference object at 0x7f8a08a70490>
+<jinja2.runtime.BlockReference object at 0x7f8e35c9c6a0>
 
 <div class="footer">
 <p>