[TRTLLM-9092][doc] link to modelopt checkpoints in quick start guide (#9571)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
This commit is contained in:
parent 0406949f32
commit d4f68195c3
@@ -31,6 +31,8 @@ Ensure your GPU supports FP8 quantization before running the following:
trtllm-serve "nvidia/Qwen3-8B-FP8"
```
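Once the server is up, it can be exercised with a plain HTTP request. A minimal sketch, assuming the default OpenAI-compatible endpoint at `localhost:8000`:

```bash
# Send a chat completion request to the OpenAI-compatible API exposed by
# trtllm-serve (address assumed to be the default, localhost:8000).
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/Qwen3-8B-FP8",
        "messages": [{"role": "user", "content": "Where is New York City?"}],
        "max_tokens": 32
      }'
```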
For more options, browse the full [collection of generative models](https://huggingface.co/collections/nvidia/inference-optimized-checkpoints-with-model-optimizer) that have been quantized and optimized for inference with the TensorRT Model Optimizer.
```{note}
If you are running `trtllm-serve` inside a Docker container, you have two options for sending API requests:
1. Expose a port (e.g., 8000) to allow external access to the server from outside the container.
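For the port-exposure option, the container must publish the server's port when it is started. A minimal sketch, assuming the server's default port of 8000, assuming `trtllm-serve` accepts `--host`/`--port` options, and using `<tensorrt_llm_image>` as a placeholder for your TensorRT-LLM container image:

```bash
# Publish container port 8000 on the host (host port:container port) and bind
# the server to 0.0.0.0 so it is reachable through the published port.
# <tensorrt_llm_image> is a placeholder; substitute your actual image and tag.
docker run --rm -it --gpus all -p 8000:8000 <tensorrt_llm_image> \
  trtllm-serve "nvidia/Qwen3-8B-FP8" --host 0.0.0.0 --port 8000

# From the host, requests then target localhost:8000 as in the example above
# (assuming the standard OpenAI-compatible model listing endpoint).
curl -s http://localhost:8000/v1/models
```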