[Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096)

Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
2026-06-06 00:16:14 +00:00 · 2026-06-01 12:37:30 -07:00
parent 182c67daf1
commit 266b9d9c64
12 changed files with 1423 additions and 75 deletions
@@ -84,7 +84,10 @@ Both the trainer (`NCCLTrainerSendWeightsArgs`) and inference side (`NCCLWeightT

 ## Receiving Weights (Inference Side)

-The inference side triggers weight reception using the four-phase protocol — `init_weight_transfer_engine`, `start_weight_update`, `update_weights`, `finish_weight_update`. The init phase is shown [above](#initialization). The remaining three steps are:
+The inference side triggers weight reception using the four-phase protocol:
+`init_weight_transfer_engine`, `start_weight_update`, `update_weights`,
+`finish_weight_update`. The init phase is shown [above](#initialization). The
+remaining three steps are:

 ```python
 from vllm.distributed.weight_transfer.base import WeightTransferUpdateRequest
@@ -108,12 +111,24 @@ llm.update_weights(
 llm.finish_weight_update()
 ```

-The `names`, `dtype_names`, and `shapes` lists describe each parameter. These must match the order in which the trainer iterates over its parameters.
+The `names`, `dtype_names`, and `shapes` lists describe each parameter. These
+must match the order in which the trainer iterates over its parameters.
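
As a minimal sketch of the ordering requirement, the three lists can be built in a single pass over the trainer's parameters so that entry `i` of each list describes the same tensor. The `trainer_params` mapping, its parameter names, and the `describe_params` helper are hypothetical stand-ins and not part of vLLM; the dtype strings are placeholders.

```python
from collections import OrderedDict

# Hypothetical stand-in for a trainer's parameters, in iteration order:
# name -> (dtype name, shape). Not taken from a real model.
trainer_params = OrderedDict(
    [
        ("embed.weight", ("bfloat16", (8, 4))),
        ("layers.0.proj.weight", ("bfloat16", (4, 4))),
        ("layers.0.proj.bias", ("float32", (4,))),
    ]
)


def describe_params(params):
    """Build parallel metadata lists in the same order the params are iterated."""
    names, dtype_names, shapes = [], [], []
    for name, (dtype_name, shape) in params.items():
        names.append(name)
        dtype_names.append(dtype_name)
        shapes.append(list(shape))
    return names, dtype_names, shapes


names, dtype_names, shapes = describe_params(trainer_params)
```

Deriving all three lists from one loop, and sending tensors with that same loop, is one way to keep the metadata and the transfer order from drifting apart.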

-`start_weight_update` must be called before `update_weights`, and `finish_weight_update` must be called after all weight chunks have been transferred. The `is_checkpoint_format` flag controls whether layerwise reload processing is applied (`True` for checkpoint-format weights, `False` for pre-processed kernel-format weights).
+`start_weight_update` must be called before `update_weights`, and
+`finish_weight_update` must be called after all weight chunks have been
+transferred. The `is_checkpoint_format` flag controls whether layerwise reload
+processing is applied (`True` for checkpoint-format weights, `False` for
+pre-processed kernel-format weights).
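
The call ordering described above can be sketched as a small context manager. This is an illustration only: `weight_update` and `FakeEngine` are hypothetical names, and the sketch assumes an object exposing the `start_weight_update`, `update_weights`, and `finish_weight_update` methods named in this section. It deliberately skips `finish_weight_update` when the body raises, since the correct recovery behavior is not specified here.

```python
from contextlib import contextmanager


@contextmanager
def weight_update(engine, is_checkpoint_format):
    """Hypothetical helper: bracket update_weights calls with start/finish."""
    engine.start_weight_update(is_checkpoint_format=is_checkpoint_format)
    yield engine
    # Reached only if the body completed without raising.
    engine.finish_weight_update()


class FakeEngine:
    """Stand-in that records calls; it is not the real vLLM engine."""

    def __init__(self):
        self.calls = []

    def start_weight_update(self, is_checkpoint_format):
        self.calls.append(("start", is_checkpoint_format))

    def update_weights(self, request):
        self.calls.append(("update", request))

    def finish_weight_update(self):
        self.calls.append(("finish",))


engine = FakeEngine()
with weight_update(engine, is_checkpoint_format=True) as e:
    e.update_weights("chunk-0")  # placeholder for a real update request
    e.update_weights("chunk-1")
```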
+
+Sparse NCCL patches are sent with `update_kind="sparse_flat"` inside
+`update_info`. Wrap them in
+`start_weight_update(is_checkpoint_format=False)`, because sparse patches apply
+directly to runtime/kernel-format parameters. The current sparse MVP requires
+`TP=1` and `PP=1`.
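
To illustrate the idea behind a flat sparse patch, the sketch below represents a parameter as a flattened list and updates only the positions that changed. It is a conceptual example in plain Python: the helper names are hypothetical, and it does not reflect the actual `sparse_flat` payload layout or the NCCL transfer, neither of which is shown in this section.

```python
def make_sparse_patch(old, new):
    """Return (indices, values) for the flat positions where new differs from old."""
    indices = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    values = [new[i] for i in indices]
    return indices, values


def apply_sparse_patch(param, indices, values):
    """Overwrite param in place at the given flat indices."""
    for i, v in zip(indices, values):
        param[i] = v


# A 2x3 parameter flattened in row-major order (illustrative values).
old = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
new = [0.0, 1.5, 2.0, 3.0, 4.0, -5.0]

# The inference side starts from the old weights and receives only the patch.
inference_copy = list(old)
indices, values = make_sparse_patch(old, new)
apply_sparse_patch(inference_copy, indices, values)
```

After the patch is applied, `inference_copy` matches what a dense update to `new` would have produced, which is the kind of dense-vs-sparse equivalence the sparse example script is described as checking.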

 ## Examples

 - [RLHF with NCCL weight syncing (offline, Ray)](../../../examples/rl/rlhf_nccl.py) - Trainer on one GPU, 2x tensor-parallel vLLM engine on two others, with packed NCCL weight broadcast
+- [RLHF with sparse NCCL weight syncing (offline, Ray)](../../../examples/rl/rlhf_sparse_nccl.py) - Dense-vs-sparse equivalence demo with a real model on a 2-GPU trainer/inference setup; sparse patches use `start_weight_update(is_checkpoint_format=False)` and currently require `TP=1` and `PP=1`
 - [RLHF with async weight syncing (offline, Ray)](../../../examples/rl/rlhf_async_new_apis.py) - Async generation with mid-flight pause, weight sync, resume, and validation against a fresh model
 - [RLHF with NCCL weight syncing (online serving, HTTP)](../../../examples/rl/rlhf_http_nccl.py) - Weight transfer with a running vLLM HTTP server using HTTP control plane and NCCL data plane