mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-14 06:27:45 +08:00
* add best perf practice on DSR1 Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> * add ds-r1 min latency tech blog Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> * rm redundant doc Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> * refine table content Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> * refine table content Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> * relative path for images Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> * refine precommit Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> * pr4280 is merged Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> --------- Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
155 lines
2.9 KiB
ReStructuredText
.. TensorRT-LLM documentation master file, created by
   sphinx-quickstart on Wed Sep 20 08:35:21 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to TensorRT-LLM's Documentation!
========================================

.. toctree::
   :maxdepth: 2
   :caption: Getting Started
   :name: Getting Started

   overview.md
   quick-start-guide.md
   key-features.md
   torch.md
   release-notes.md


.. toctree::
   :maxdepth: 2
   :caption: Installation
   :name: Installation

   .. installation/overview.md

   installation/linux.md
   installation/build-from-source-linux.md
   installation/grace-hopper.md


.. toctree::
   :maxdepth: 2
   :caption: LLM API
   :hidden:
   :glob:

   llm-api/*


.. toctree::
   :maxdepth: 2
   :caption: Examples
   :hidden:

   examples/index.rst
   examples/customization.md
   examples/llm_api_examples
   examples/trtllm_serve_examples


.. toctree::
   :maxdepth: 2
   :caption: Model Definition API
   :hidden:

   python-api/tensorrt_llm.layers.rst
   python-api/tensorrt_llm.functional.rst
   python-api/tensorrt_llm.models.rst
   python-api/tensorrt_llm.plugin.rst
   python-api/tensorrt_llm.quantization.rst
   python-api/tensorrt_llm.runtime.rst


.. toctree::
   :maxdepth: 2
   :caption: C++ API
   :hidden:

   _cpp_gen/executor.rst
   _cpp_gen/runtime.rst


.. toctree::
   :maxdepth: 2
   :caption: Command-Line Reference
   :hidden:

   commands/trtllm-build
   commands/trtllm-serve


.. toctree::
   :maxdepth: 2
   :caption: Architecture
   :name: Architecture

   architecture/overview.md
   architecture/core-concepts.md
   architecture/checkpoint.md
   architecture/workflow.md
   architecture/add-model.md

.. toctree::
   :maxdepth: 2
   :caption: Advanced
   :name: Advanced

   advanced/gpt-attention.md
   advanced/gpt-runtime.md
   advanced/executor.md
   advanced/graph-rewriting.md
   advanced/inference-request.md
   advanced/lora.md
   advanced/expert-parallelism.md
   advanced/kv-cache-reuse.md
   advanced/speculative-decoding.md
   advanced/disaggregated-service.md

.. toctree::
   :maxdepth: 2
   :caption: Performance
   :name: Performance

   performance/perf-overview.md
   Benchmarking <performance/perf-benchmarking.md>
   performance/performance-tuning-guide/index
   performance/perf-analysis.md


.. toctree::
   :maxdepth: 2
   :caption: Reference
   :name: Reference

   reference/troubleshooting.md
   reference/support-matrix.md

   .. reference/upgrading.md

   reference/precision.md
   reference/memory.md


.. toctree::
   :maxdepth: 2
   :caption: Blogs
   :hidden:

   blogs/H100vsA100.md
   blogs/H200launch.md
   blogs/Falcon180B-H200.md
   blogs/quantization-in-TRT-LLM.md
   blogs/XQA-kernel.md
   blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`