.. TensorRT-LLM documentation master file, created by
   sphinx-quickstart on Wed Sep 20 08:35:21 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to TensorRT-LLM's Documentation!
========================================

.. toctree::
   :maxdepth: 2
   :caption: Getting Started
   :name: Getting Started

   overview.md
   quick-start-guide.md
   installation/index.rst

.. toctree::
   :maxdepth: 2
   :caption: Deployment Guide
   :name: Deployment Guide

   examples/llm_api_examples.rst
   examples/trtllm_serve_examples
   examples/dynamo_k8s_example.rst
   deployment-guide/index.rst

.. toctree::
   :maxdepth: 2
   :caption: Models
   :name: Models

   models/supported-models.md
   models/adding-new-model.md

.. toctree::
   :maxdepth: 2
   :caption: CLI Reference
   :name: CLI Reference

   commands/trtllm-bench
   commands/trtllm-eval
   commands/trtllm-serve/index

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   llm-api/index.md
   llm-api/reference.rst

.. toctree::
   :maxdepth: 2
   :caption: Features

   features/feature-combination-matrix.md
   features/attention.md
   features/disagg-serving.md
   features/kvcache.md
   features/long-sequence.md
   features/lora.md
   features/multi-modality.md
   features/overlap-scheduler.md
   features/paged-attention-ifb-scheduler.md
   features/parallel-strategy.md
   features/quantization.md
   features/sampling.md
   features/speculative-decoding.md
   features/checkpoint-loading.md
   features/auto_deploy/auto-deploy.md

.. toctree::
   :maxdepth: 2
   :caption: Developer Guide

   architecture/overview.md
   developer-guide/perf-analysis.md
   developer-guide/perf-benchmarking.md
   developer-guide/ci-overview.md
   developer-guide/dev-containers.md

.. .. toctree::
..    :maxdepth: 2
..    :caption: Architecture
..    :name: Architecture

..    architecture/overview.md
..    architecture/core-concepts.md
..    architecture/checkpoint.md
..    architecture/workflow.md
..    architecture/add-model.md

.. .. toctree::
..    :maxdepth: 2
..    :caption: Advanced
..    :name: Advanced

..    advanced/gpt-attention.md
..    advanced/gpt-runtime.md
..    advanced/executor.md
..    advanced/graph-rewriting.md
..    advanced/inference-request.md
..    advanced/lora.md
..    advanced/expert-parallelism.md
..    advanced/kv-cache-management.md
..    advanced/kv-cache-reuse.md
..    advanced/speculative-decoding.md
..    advanced/disaggregated-service.md

.. .. toctree::
..    :maxdepth: 2
..    :caption: Performance
..    :name: Performance

..    performance/perf-overview.md
..    Benchmarking
..    performance/performance-tuning-guide/index
..    performance/perf-analysis.md

.. .. toctree::
..    :maxdepth: 2
..    :caption: Reference
..    :name: Reference

..    reference/troubleshooting.md
..    reference/support-matrix.md
..    .. reference/upgrading.md
..    reference/precision.md
..    reference/memory.md

.. toctree::
   :maxdepth: 2
   :caption: Blogs
   :glob:

   blogs/tech_blog/*
   blogs/Best_perf_practice_on_DeepSeek-R1_in_TensorRT-LLM.md
   blogs/H200launch.md
   blogs/XQA-kernel.md
   blogs/H100vsA100.md

.. toctree::
   :maxdepth: 2
   :caption: Quick Links

   Releases
   Github Code
   Roadmap

.. toctree::
   :maxdepth: 2
   :caption: Use TensorRT Engine
   :hidden:

   legacy/tensorrt_quickstart.md

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`