# Recommended LLM API Configuration Settings
This directory contains recommended LLM API performance settings for popular models. They can be used out of the box with `trtllm-serve` via the `--config` CLI flag, or adjusted to fit your specific use case.
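As an illustrative sketch (the exact option names and values vary per model and are assumptions here, not copied from a specific file in this directory), a recommended-settings file is a YAML document of LLM API options, for example:

```yaml
# Hypothetical excerpt of a recommended-settings config.
# Field names follow the LLM API option structure; values are illustrative.
cuda_graph_config:
  enable_padding: true
kv_cache_config:
  free_gpu_memory_fraction: 0.9
```

Such a file would then be passed to the server at launch, e.g. `trtllm-serve <model> --config <settings>.yaml`, where `<model>` and `<settings>.yaml` are placeholders for your model and the chosen config file.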
For model-specific deployment guides, please refer to the official documentation.