# Multi-node inference with Ray orchestrator

TensorRT-LLM supports a prototype [Ray orchestrator](../README.md) as an alternative to MPI. The following example shows how to start a Ray cluster for multi-node inference.

## Quick Start

**Prerequisite:** a container image with TensorRT-LLM preinstalled (or suitable for installing it). The examples use Slurm and [Enroot](https://github.com/NVIDIA/enroot). If you use a different setup, adapt the following scripts and commands to your multi-node environment.

1. Allocate nodes and open a shell on the head node:

   ```shell
   # e.g., 2 nodes with 8 GPUs per node
   >> salloc -t 240 -N 2 -p interactive
   >> srun --pty -p interactive bash
   ```

2. Once on the head node, launch a multi-node Ray cluster:

   ```shell
   # Remember to set the CONTAINER and MOUNTS environment variables (or the corresponding variables inside the script) to your paths.
   # You can add the TensorRT-LLM installation command to this script if it is not preinstalled in your container.
   >> bash -e run_cluster.sh
   ```

3. Enter the head container and run your TensorRT-LLM driver script (a sketch of such a script is shown at the end of this page).

   Note that this step requires TensorRT-LLM to be installed in the containers on all nodes. If it isn't, install it manually inside each node's container.

   ```shell
   # On the head node
   >> sacct  # grab the Slurm step ID of the step with Job Name "ray-head"
   >> srun --jobid=<step_id> --overlap --pty bash
   >> enroot list -f  # get the container process id
   >> enroot exec <pid> bash

   # You can edit this script to use a model and parallel settings suited to multi-node inference (e.g., TP8 or TP4PP4).
   >> python examples/ray_orchestrator/llm_inference_async_ray.py
   ```

## Disclaimer

The code is a prototype and subject to change. Currently, there are no guarantees regarding functionality, performance, or stability.
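
## Example driver script (sketch)

For reference, here is a minimal sketch of what an asynchronous driver script in the spirit of `examples/ray_orchestrator/llm_inference_async_ray.py` might look like, built on the TensorRT-LLM `LLM` API. The model name, the parallel sizes, and in particular the `orchestrator_type="ray"` argument are assumptions made for illustration; check the shipped example script for the actual arguments the prototype uses.

```python
# Hypothetical sketch of a multi-node driver script; not the shipped
# llm_inference_async_ray.py. The orchestrator_type argument is an assumption.
import asyncio

from tensorrt_llm import LLM, SamplingParams


async def main():
    # Spread the model across the allocation, e.g. TP8 x PP2 on 2 nodes x 8 GPUs.
    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # any supported HF model ID or local path
        tensor_parallel_size=8,
        pipeline_parallel_size=2,
        orchestrator_type="ray",  # assumption: selects the Ray orchestrator instead of MPI
    )

    sampling_params = SamplingParams(max_tokens=64, temperature=0.8)
    prompts = ["Hello, my name is", "The capital of France is"]

    # Submit requests asynchronously and await their results.
    async def run_one(prompt: str):
        output = await llm.generate_async(prompt, sampling_params)
        print(f"{prompt!r} -> {output.outputs[0].text!r}")

    await asyncio.gather(*(run_one(p) for p in prompts))


if __name__ == "__main__":
    asyncio.run(main())
```

Run it from inside the head container (step 3 above); with TP8 x PP2 it expects all 16 GPUs of the two allocated nodes to be reachable from the Ray cluster.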