.. TensorRT LLM documentation master file, created by
   sphinx-quickstart on Wed Sep 20 08:35:21 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to TensorRT LLM's Documentation!
========================================

.. toctree::
   :maxdepth: 2
   :caption: Getting Started
   :name: Getting Started

   overview.md
   quick-start-guide.md
   installation/index.rst

.. toctree::
   :maxdepth: 2
   :caption: Deployment Guide
   :name: Deployment Guide

   examples/llm_api_examples.rst
   examples/trtllm_serve_examples
   examples/dynamo_k8s_example.rst
   deployment-guide/index.rst

.. toctree::
   :maxdepth: 2
   :caption: Models
   :name: Models

   models/supported-models.md
   models/adding-new-model.md

.. toctree::
   :maxdepth: 2
   :caption: CLI Reference
   :name: CLI Reference

   commands/trtllm-bench
   commands/trtllm-eval
   commands/trtllm-serve/index

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   llm-api/index.md
   llm-api/reference.rst

.. toctree::
   :maxdepth: 2
   :caption: Features

   features/feature-combination-matrix.md
   features/attention.md
   features/disagg-serving.md
   features/kvcache.md
   features/long-sequence.md
   features/lora.md
   features/multi-modality.md
   features/overlap-scheduler.md
   features/paged-attention-ifb-scheduler.md
   features/parallel-strategy.md
   features/quantization.md
   features/sampling.md
   features/additional-outputs.md
   features/speculative-decoding.md
   features/checkpoint-loading.md
   features/auto_deploy/auto-deploy.md
   features/ray-orchestrator.md
   features/torch_compile_and_piecewise_cuda_graph.md

.. toctree::
   :maxdepth: 2
   :caption: Developer Guide

   developer-guide/overview.md
   developer-guide/perf-analysis.md
   developer-guide/perf-benchmarking.md
   developer-guide/ci-overview.md
   developer-guide/dev-containers.md
   developer-guide/api-change.md
   developer-guide/kv-transfer.md

.. toctree::
   :maxdepth: 2
   :caption: Blogs
   :glob:

   blogs/tech_blog/*
   blogs/Best_perf_practice_on_DeepSeek-R1_in_TensorRT-LLM.md
   blogs/H200launch.md
   blogs/XQA-kernel.md
   blogs/H100vsA100.md

.. toctree::
   :maxdepth: 2
   :caption: Quick Links

   Releases <https://github.com/NVIDIA/TensorRT-LLM/releases>
   GitHub Code <https://github.com/NVIDIA/TensorRT-LLM>
   Roadmap <https://github.com/NVIDIA/TensorRT-LLM/issues?q=is%3Aissue%20state%3Aopen%20label%3Aroadmap>

.. toctree::
   :maxdepth: 2
   :caption: Use TensorRT Engine
   :hidden:

   legacy/tensorrt_quickstart.md

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`