Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-14 06:27:45 +08:00)
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
_cpp_gen/
_modules/
_sources/
_static/
blogs/
python-api/
.nojekyll
2023-05-17-how-to-add-a-new-model.html
2023-05-19-how-to-debug.html
architecture.html
batch_manager.html
build_from_source.html
genindex.html
gpt_attention.html
gpt_runtime.html
graph-rewriting.html
index.html
inference_request.html
lora.html
memory.html
new_workflow.html
objects.inv
perf_best_practices.html
performance_analysis.html
performance.html
precision.html
py-modindex.html
search.html
searchindex.js