.. TensorRT-LLM documentation master file, created by
   sphinx-quickstart on Wed Sep 20 08:35:21 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to TensorRT-LLM's documentation!
========================================

.. toctree::
   :maxdepth: 1
   :caption: Contents:

   architecture.md
   gpt_runtime.md
   batch_manager.md
   inference_request.md
   gpt_attention.md
   precision.md
   build_from_source.md
   performance.md
   2023-05-19-how-to-debug.md
   2023-05-17-how-to-add-a-new-model.md
   graph-rewriting.md
   memory.md
   new_workflow.md
   lora.md
   perf_best_practices.md
   performance_analysis.md


Python API
----------

- :doc:`tensorrt_llm.layers <python-api/tensorrt_llm.layers>`
- :doc:`tensorrt_llm.functional <python-api/tensorrt_llm.functional>`
- :doc:`tensorrt_llm.models <python-api/tensorrt_llm.models>`
- :doc:`tensorrt_llm.plugin <python-api/tensorrt_llm.plugin>`
- :doc:`tensorrt_llm.quantization <python-api/tensorrt_llm.quantization>`
- :doc:`tensorrt_llm.runtime <python-api/tensorrt_llm.runtime>`

.. toctree::
   :maxdepth: 2
   :caption: Python API
   :hidden:

   python-api/tensorrt_llm.layers
   python-api/tensorrt_llm.functional
   python-api/tensorrt_llm.models
   python-api/tensorrt_llm.plugin
   python-api/tensorrt_llm.quantization
   python-api/tensorrt_llm.runtime


C++ API
-------

- :doc:`cpp/runtime <_cpp_gen/runtime>`

.. toctree::
   :maxdepth: 2
   :caption: C++ API
   :hidden:

   _cpp_gen/runtime


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


Blogs
-----

.. toctree::
   :maxdepth: 2
   :caption: Blogs
   :hidden:

   blogs/H100vsA100.md
   blogs/H200launch.md
   blogs/Falcon180B-H200.md
   blogs/quantization-in-TRT-LLM.md