[TRTLLM-9092][doc] link to modelopt checkpoints in quick start guide (#9571)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
This commit is contained in:
parent 0406949f32
commit d4f68195c3
@@ -31,6 +31,8 @@ Ensure your GPU supports FP8 quantization before running the following:
trtllm-serve "nvidia/Qwen3-8B-FP8"
```
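Once the server is up, it can be exercised with a plain HTTP request. A minimal sketch, assuming the default OpenAI-compatible endpoint at `localhost:8000`:

```bash
# Send a chat completion request to the OpenAI-compatible API exposed by
# trtllm-serve (address assumed to be the default, localhost:8000).
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/Qwen3-8B-FP8",
        "messages": [{"role": "user", "content": "Where is New York City?"}],
        "max_tokens": 32
      }'
```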
For more options, browse the full [collection of generative models](https://huggingface.co/collections/nvidia/inference-optimized-checkpoints-with-model-optimizer) that have been quantized and optimized for inference with the TensorRT Model Optimizer.
```{note}
If you are running `trtllm-serve` inside a Docker container, you have two options for sending API requests:
1. Expose a port (e.g., 8000) to allow external access to the server from outside the container.
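For the port-exposure option, the container must publish the server's port when it is started. A minimal sketch, assuming the server's default port of 8000, assuming `trtllm-serve` accepts `--host`/`--port` options, and using `<tensorrt_llm_image>` as a placeholder for your TensorRT-LLM container image:

```bash
# Publish container port 8000 on the host (host port:container port) and bind
# the server to 0.0.0.0 so it is reachable through the published port.
# <tensorrt_llm_image> is a placeholder; substitute your actual image and tag.
docker run --rm -it --gpus all -p 8000:8000 <tensorrt_llm_image> \
  trtllm-serve "nvidia/Qwen3-8B-FP8" --host 0.0.0.0 --port 8000

# From the host, requests then target localhost:8000 as in the example above
# (assuming the standard OpenAI-compatible model listing endpoint).
curl -s http://localhost:8000/v1/models
```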