.. TensorRT-LLM documentation master file, created by
   sphinx-quickstart on Wed Sep 20 08:35:21 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to TensorRT-LLM's Documentation!
========================================

.. toctree::
   :maxdepth: 2
   :caption: Getting Started
   :name: Getting Started

   overview.md
   quick-start-guide.md
   key-features.md
   torch.md
   release-notes.md

.. toctree::
   :maxdepth: 2
   :caption: Installation
   :name: Installation

   .. installation/overview.md

   installation/linux.md
   installation/build-from-source-linux.md
   installation/grace-hopper.md

.. toctree::
   :maxdepth: 2
   :caption: LLM API
   :hidden:
   :glob:

   llm-api/*

.. toctree::
   :maxdepth: 2
   :caption: Examples
   :hidden:

   examples/index.rst
   examples/customization.md
   examples/llm_api_examples
   examples/trtllm_serve_examples

.. toctree::
   :maxdepth: 2
   :caption: Model Definition API
   :hidden:

   python-api/tensorrt_llm.layers.rst
   python-api/tensorrt_llm.functional.rst
   python-api/tensorrt_llm.models.rst
   python-api/tensorrt_llm.plugin.rst
   python-api/tensorrt_llm.quantization.rst
   python-api/tensorrt_llm.runtime.rst

.. toctree::
   :maxdepth: 2
   :caption: C++ API
   :hidden:

   _cpp_gen/executor.rst
   _cpp_gen/runtime.rst

.. toctree::
   :maxdepth: 2
   :caption: Command-Line Reference
   :hidden:

   commands/trtllm-build
   commands/trtllm-serve

.. toctree::
   :maxdepth: 2
   :caption: Architecture
   :name: Architecture

   architecture/overview.md
   architecture/core-concepts.md
   architecture/checkpoint.md
   architecture/workflow.md
   architecture/add-model.md

.. toctree::
   :maxdepth: 2
   :caption: Advanced
   :name: Advanced

   advanced/gpt-attention.md
   advanced/gpt-runtime.md
   advanced/executor.md
   advanced/graph-rewriting.md
   advanced/inference-request.md
   advanced/lora.md
   advanced/expert-parallelism.md
   advanced/kv-cache-management.md
   advanced/kv-cache-reuse.md
   advanced/speculative-decoding.md
   advanced/disaggregated-service.md

.. toctree::
   :maxdepth: 2
   :caption: Performance
   :name: Performance

   performance/perf-overview.md
   Benchmarking <performance/perf-benchmarking.md>
   performance/performance-tuning-guide/index
   performance/perf-analysis.md

.. toctree::
   :maxdepth: 2
   :caption: Reference
   :name: Reference

   reference/troubleshooting.md
   reference/support-matrix.md

   .. reference/upgrading.md

   reference/precision.md
   reference/memory.md

.. toctree::
   :maxdepth: 2
   :caption: Blogs
   :hidden:

   blogs/H100vsA100.md
   blogs/H200launch.md
   blogs/Falcon180B-H200.md
   blogs/quantization-in-TRT-LLM.md
   blogs/XQA-kernel.md
   blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md
   blogs/tech_blog/blog2_DeepSeek_R1_MTP_Implementation_and_Optimization.md

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`