Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-01-14 06:27:45 +08:00.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
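The sketch below illustrates how this Python API is typically driven, assuming a recent `tensorrt_llm` release that exposes the high-level `LLM` entry point; the model name and prompts are illustrative placeholders, not part of this repository.

```python
# Minimal sketch of the high-level TensorRT-LLM Python API (LLM API).
# Assumes `tensorrt_llm` is installed on a machine with a supported NVIDIA GPU;
# the checkpoint name and prompts are illustrative placeholders.
from tensorrt_llm import LLM, SamplingParams

def main():
    prompts = ["Hello, my name is", "The capital of France is"]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # Constructing the LLM object builds (or loads) the optimized TensorRT
    # engine for the named checkpoint behind the scenes.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Run inference on the built engine and print the generated continuations.
    for output in llm.generate(prompts, sampling_params):
        print(f"{output.prompt!r} -> {output.outputs[0].text!r}")

if __name__ == "__main__":
    main()
```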
| Name |
|---|
| _cpp_gen |
| _downloads/ea8faa5e98124e92f96b66dc586fb429 |
| _sources |
| _static |
| advanced |
| architecture |
| blogs |
| installation |
| performance |
| python-api |
| reference |
| .nojekyll |
| executor.html |
| genindex.html |
| index.html |
| kv_cache_reuse.html |
| objects.inv |
| overview.html |
| quick-start-guide.html |
| release-notes.html |
| search.html |
| searchindex.js |
| speculative_decoding.html |