TensorRT-LLMs/docs/source/deployment-guide/note_sections.rst
Venky fd1270b9ab
[TRTC-43] [feat] Add config db and docs (#9420)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-12-12 04:00:03 +08:00


..
   Reusable note sections for deployment guides.
   Include specific notes using:

   .. include:: note_sections.rst
      :start-after: .. start-note-<name>
      :end-before: .. end-note-<name>

.. start-note-traffic-patterns

.. note::

   **Traffic Patterns**: The ISL (Input Sequence Length) and OSL (Output Sequence Length)
   values in each configuration represent the **maximum supported values** for that config.
   Requests exceeding these limits may result in errors.

   To handle requests with input sequences **longer than the configured ISL**, add the following
   to your config file:

   .. code-block:: yaml

      enable_chunked_prefill: true

   This enables chunked prefill, which processes long input sequences in chunks rather than
   requiring them to fit within a single prefill operation. Note that enabling chunked prefill
   does **not** guarantee optimal performance, because these configs are tuned for the specified ISL/OSL.

.. end-note-traffic-patterns

.. start-note-quick-start-isl-osl

.. note::

   The configs here are specifically optimized for a target ISL/OSL (Input/Output Sequence Length) of 1024/1024. If your traffic pattern is different, refer to the :ref:`Comprehensive Configuration Database` section below which covers a larger set of traffic patterns and performance profiles.

.. end-note-quick-start-isl-osl