mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-14 06:27:45 +08:00
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Dynamo K8s Example
==================
1. Install Dynamo Cloud

Please follow `this guide <https://docs.nvidia.com/dynamo/latest/guides/dynamo_deploy/dynamo_cloud.html>`_
to install Dynamo Cloud on your Kubernetes cluster.

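Before moving on to the deployment step, it can help to confirm that the Dynamo Cloud operator is up. A minimal sanity check might look like the following; note that the ``dynamo-cloud`` namespace is an assumption here, and the guide above may use a different one:

```shell
# Hypothetical namespace -- substitute whatever namespace the Dynamo Cloud
# install guide had you use.
kubectl get pods -n dynamo-cloud

# The operator registers Dynamo custom resource definitions; they should
# appear in the cluster-wide CRD list.
kubectl get crds | grep -i dynamo
```

The operator pods should report ``Running`` before you proceed to step 2.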
2. Deploy the TRT-LLM Deployment

Dynamo uses custom resource definitions (CRDs) to manage the lifecycle of
deployments. You can use the `DynamoDeploymentGraph YAML <https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm/deploy>`_
files to create aggregated and disaggregated TRT-LLM deployments.

Please see `Deploying Dynamo Inference Graphs to Kubernetes using the Dynamo
Cloud Platform <https://docs.nvidia.com/dynamo/latest/guides/dynamo_deploy/operator_deployment.html>`_
for more details.
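The deployment step above can be sketched as follows. This is illustrative only: the manifest file name and namespace are assumptions, so check the linked ``deploy`` directory for the actual file names in your checkout:

```shell
# Fetch the example DynamoDeploymentGraph manifests from the Dynamo repo.
git clone https://github.com/ai-dynamo/dynamo.git
cd dynamo/components/backends/trtllm/deploy

# Apply one of the example manifests (aggregated or disaggregated).
# "agg.yaml" and the "dynamo-cloud" namespace are placeholders, not
# guaranteed names from the guide.
kubectl apply -f agg.yaml -n dynamo-cloud

# Watch the pods that the Dynamo operator creates for the deployment.
kubectl get pods -n dynamo-cloud -w
```

Because the deployment is a custom resource, deleting it with ``kubectl delete -f`` tears down all of the pods the operator created for it.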