# TensorRT-LLM for Windows

> The Windows release of TensorRT-LLM is currently in beta.
> We recommend checking out the [v0.10.0 tag](https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.10.0) for the most stable experience.
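
If you are working from a clone of the repository, a minimal sketch of pinning to that tag (assuming a standard git setup) looks like this:

```powershell
# Clone the repository and check out the recommended stable tag
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git checkout v0.10.0
```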

TensorRT-LLM is supported on bare-metal Windows for single-GPU inference; the release supports GeForce RTX 40-series GPUs.
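
As a quick sanity check that a supported GPU and driver are visible before installing, you can query them with `nvidia-smi` (assuming the NVIDIA driver is already installed):

```powershell
# List the detected GPU and driver version; expect a GeForce RTX 40-series entry
nvidia-smi --query-gpu=name,driver_version --format=csv
```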

The release wheel for Windows can be installed with pip. Alternatively, you can build TensorRT-LLM for Windows from source. Building from source is an advanced option and is not necessary for building or running LLM engines; it is, however, required if you plan to use the C++ runtime directly or run the C++ benchmarks.
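A minimal sketch of the pip route, assuming the wheel is published on NVIDIA's PyPI index (the exact index URL and package version may vary by release, so consult the documentation for your target version):

```powershell
# Install the TensorRT-LLM wheel from NVIDIA's index
# (the index URL is an assumption and may differ by release)
pip install tensorrt_llm --extra-index-url https://pypi.nvidia.com
```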

## Getting Started

To get started with TensorRT-LLM on Windows, visit the [documentation](https://nvidia.github.io/TensorRT-LLM/).