From c23369248581e56a955566801b7166302b0e12ad Mon Sep 17 00:00:00 2001
From: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Date: Tue, 10 Feb 2026 15:31:45 +0800
Subject: [PATCH] [None][doc] add multiple-instances section in disaggregated
 serving doc (#11412)

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
---
 docs/source/features/disagg-serving.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/source/features/disagg-serving.md b/docs/source/features/disagg-serving.md
index 88feb11b08..6e600793fb 100644
--- a/docs/source/features/disagg-serving.md
+++ b/docs/source/features/disagg-serving.md
@@ -10,6 +10,7 @@
 - [Usage](#Usage)
   - [Dynamo](#Dynamo)
   - [trtllm-serve](#trtllm-serve)
+  - [Multiple Instances](#multiple-instances)
 - [Environment Variables](#Environment-Variables)
 - [Troubleshooting and FAQ](#Troubleshooting-and-FAQ)
 
@@ -215,6 +216,23 @@ curl http://localhost:8000/v1/completions \
 
 Please refer to [Disaggregated Inference Benchmark Scripts](../../../examples/disaggregated/slurm).
 
+### Multiple Instances
+
+To increase maximum concurrency without adding GPU nodes, you can deploy multiple disaggregated server instances on different nodes, with every instance managing the same set of context and generation servers. This helps when a single disaggregated server becomes a performance bottleneck or runs out of ephemeral ports.
+
+Example (two-node deployment):
+
+- **Node A**
+  - Context servers: `node-a:8001`
+  - Generation servers: `node-b:8002`
+  - Disaggregated orchestrator endpoint: `node-a:8000`
+- **Node B**
+  - Context servers: `node-a:8001`
+  - Generation servers: `node-b:8002`
+  - Disaggregated orchestrator endpoint: `node-b:8000`
+- **Client entrypoint**
+  - Send requests directly to `node-a:8000` or `node-b:8000`, or use a load balancer that forwards to both
+
 ## Environment Variables
 
 TRT-LLM uses some environment variables to control the behavior of disaggregated service.