Performance Tuning Guide
========================

.. include:: introduction.md
   :parser: myst_parser.sphinx_

.. toctree::
   :maxdepth: 1

   benchmarking-default-performance
   useful-build-time-flags
   tuning-max-batch-size-and-max-num-tokens
   deciding-model-sharding-strategy
   fp8-quantization
   useful-runtime-flags