[TRTLLM-9092][doc] link to modelopt checkpoints in quick start guide (#9571)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Authored by QI JUN on 2025-12-01 10:51:31 +08:00, committed by Mike Iovine
parent 0406949f32
commit d4f68195c3


@@ -31,6 +31,8 @@ Ensure your GPU supports FP8 quantization before running the following:
trtllm-serve "nvidia/Qwen3-8B-FP8"
```
For more options, browse the full [collection of generative models](https://huggingface.co/collections/nvidia/inference-optimized-checkpoints-with-model-optimizer) that have been quantized and optimized for inference with the TensorRT Model Optimizer.
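Once the server is up, you can send it a request. The snippet below is a minimal sketch, assuming the server is reachable on the default port 8000 and exposes the OpenAI-compatible `/v1/chat/completions` route; adjust the host, port, and model name to match your deployment.
```bash
# Minimal example request (assumes the server listens on localhost:8000;
# adjust the host and port for your setup)
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/Qwen3-8B-FP8",
        "messages": [{"role": "user", "content": "Where is New York City?"}],
        "max_tokens": 32
      }'
```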
```{note}
If you are running `trtllm-serve` inside a Docker container, you have two options for sending API requests:
1. Expose a port (e.g., 8000) to allow external access to the server from outside the container.