Mirror of https://github.com/vllm-project/vllm.git, synced 2026-06-06 00:16:14 +00:00
[NixlConnector] Initiate deprecation cycle for kv_both role (#43874)
Signed-off-by: NickLucche <nlucches@redhat.com>
@@ -128,7 +128,7 @@ The lease mechanism is controlled through `kv_connector_extra_config` in `--kv-t
 vllm serve <MODEL> \
     --kv-transfer-config '{
         "kv_connector": "NixlConnector",
-        "kv_role": "kv_both",
+        "kv_role": "kv_producer",
         "kv_connector_extra_config": {"kv_lease_duration": 60}
     }'
 ```

@@ -50,7 +50,7 @@ To select a different backend, set `kv_connector_extra_config.backends` in `--kv
 vllm serve <MODEL> \
     --kv-transfer-config '{
         "kv_connector":"NixlConnector",
-        "kv_role":"kv_both",
+        "kv_role":"kv_producer",
         "kv_connector_extra_config":{"backends":["LIBFABRIC"]}
     }'
 ```
@@ -60,7 +60,7 @@ You can also pass JSON keys individually using dotted arguments, and you can app
 ```bash
 vllm serve <MODEL> \
     --kv-transfer-config.kv_connector NixlConnector \
-    --kv-transfer-config.kv_role kv_both \
+    --kv-transfer-config.kv_role kv_producer \
     --kv-transfer-config.kv_connector_extra_config.backends+ LIBFABRIC
 ```

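The dotted-argument form in the hunk above carries the same keys as the JSON form in the hunk before it. The sketch below illustrates that correspondence only. `expand_dotted` is a hypothetical helper written for this note, not vLLM's argument parser, and it assumes that a trailing `+` on a key means "append this value to a list", as the `backends+` example suggests.

```python
import json


def expand_dotted(pairs):
    """Fold (dotted_key, value) pairs into a nested dict.

    Assumption for illustration: a trailing '+' on the key appends the
    value to a list instead of overwriting the key.
    """
    result = {}
    for dotted_key, value in pairs:
        append = dotted_key.endswith("+")
        *parents, leaf = dotted_key.rstrip("+").split(".")
        node = result
        for part in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {})
        if append:
            node.setdefault(leaf, []).append(value)
        else:
            node[leaf] = value
    return result


# Keys as they appear after the `--kv-transfer-config.` prefix.
config = expand_dotted([
    ("kv_connector", "NixlConnector"),
    ("kv_role", "kv_producer"),
    ("kv_connector_extra_config.backends+", "LIBFABRIC"),
])
print(json.dumps(config))
```

Under these assumptions the resulting dict has the same content as the JSON passed to `--kv-transfer-config` in the backend-selection example.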
@@ -81,7 +81,7 @@ VLLM_NIXL_SIDE_CHANNEL_PORT=5600 \
 vllm serve Qwen/Qwen3-0.6B \
     --port 8100 \
     --enforce-eager \
-    --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_load_failure_policy":"fail"}'
+    --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_producer","kv_load_failure_policy":"fail"}'
 ```

 ### Consumer (Decoder) Configuration
@@ -96,7 +96,7 @@ VLLM_NIXL_SIDE_CHANNEL_PORT=5601 \
 vllm serve Qwen/Qwen3-0.6B \
     --port 8200 \
     --enforce-eager \
-    --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_load_failure_policy":"fail"}'
+    --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_consumer","kv_load_failure_policy":"fail"}'
 ```

 ### Proxy Server
@@ -212,10 +212,21 @@ sequenceDiagram
 Enable bidirectional KV transfer by setting `bidirectional_kv_xfer` in `kv_connector_extra_config` on **both** P and D instances:

 ```bash
+# Prefill instance
 vllm serve <MODEL> \
     --kv-transfer-config '{
         "kv_connector": "NixlConnector",
-        "kv_role": "kv_both",
+        "kv_role": "kv_producer",
+        "kv_connector_extra_config": {
+            "bidirectional_kv_xfer": true
+        }
+    }'
+
+# Decode instance
+vllm serve <MODEL> \
+    --kv-transfer-config '{
+        "kv_connector": "NixlConnector",
+        "kv_role": "kv_consumer",
         "kv_connector_extra_config": {
             "bidirectional_kv_xfer": true
         }
@@ -359,11 +370,10 @@ For multi-host DP deployment, only need to provide the host/port of the head ins

 - **kv_producer**: For prefiller instances that generate KV caches
 - **kv_consumer**: For decoder instances that consume KV caches from prefiller
-- **kv_both**: Enables symmetric functionality where the connector can act as both producer and consumer. This provides flexibility for experimental setups and scenarios where the role distinction is not predetermined.
+- **kv_both** (deprecated): Previously used as a catch-all when the role was not predetermined. This value is now deprecated for NixlConnector and will be removed in a future release.

-!!! tip
-    NixlConnector currently does not distinguish `kv_role`; the actual prefiller/decoder roles are determined by the upper-level proxy (e.g., `toy_proxy_server.py` using `--prefiller-hosts` and `--decoder-hosts`).
-    Therefore, `kv_role` in `--kv-transfer-config` is effectively a placeholder and does not affect NixlConnector's behavior.
+!!! warning
+    `kv_role="kv_both"` is deprecated for NixlConnector. Please set `kv_role="kv_producer"` for prefill instances and `kv_role="kv_consumer"` for decode instances. See [#33702](https://github.com/vllm-project/vllm/issues/33702) for details.

 ### KV Load Failure Policy

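The deprecation warning in the last hunk describes a mechanical migration: `kv_both` becomes `kv_producer` on prefill instances and `kv_consumer` on decode instances. The following is a minimal sketch of that rewrite for deployments that generate `--kv-transfer-config` JSON from scripts. `migrate_kv_role` and the `"prefill"`/`"decode"` labels are hypothetical names chosen for illustration and are not part of vLLM.

```python
import json

DEPRECATED_ROLE = "kv_both"

# Mapping taken from the deprecation warning: prefill -> producer, decode -> consumer.
ROLE_FOR_INSTANCE = {"prefill": "kv_producer", "decode": "kv_consumer"}


def migrate_kv_role(config: dict, instance_type: str) -> dict:
    """Return a copy of a kv-transfer config with `kv_both` replaced.

    `instance_type` is a hypothetical label ("prefill" or "decode") that the
    caller supplies; configs that already use an explicit role are unchanged.
    """
    if instance_type not in ROLE_FOR_INSTANCE:
        raise ValueError(f"unknown instance type: {instance_type!r}")
    migrated = dict(config)  # shallow copy; only the top-level key changes
    if migrated.get("kv_role") == DEPRECATED_ROLE:
        migrated["kv_role"] = ROLE_FOR_INSTANCE[instance_type]
    return migrated


old = {
    "kv_connector": "NixlConnector",
    "kv_role": "kv_both",
    "kv_load_failure_policy": "fail",
}
# Compact separators produce a string suitable for --kv-transfer-config '...'.
print(json.dumps(migrate_kv_role(old, "prefill"), separators=(",", ":")))
print(json.dumps(migrate_kv_role(old, "decode"), separators=(",", ":")))
```

The helper returns a copy rather than mutating its input, so one legacy config can be reused to derive both the prefill and the decode variant.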