mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-02-17 00:04:57 +08:00
* Why?

Previously, the mrope tensors' IPC handles would just be forwarded from encode -> prefill -> decode workers. While this is fine for the prefill worker, it is not for the decode worker, since by the time it tries to rebuild those tensors, they could have been garbage collected due to their refcounts reaching zero in the producer (encode) worker.

This could lead to nasty runtime errors when running E/P/D disaggregated serving.

* What?

This commit fixes this by having the prefill worker take ownership of those reconstructed tensors, and stand up new copies for the decode worker.

Closes: NvBug 5848756

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
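
The lifetime problem described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses plain Python objects instead of CUDA IPC handles, and `Worker`, `export`, `release`, `rebuild`, `take_ownership`, and `HandleExpired` are hypothetical names invented for illustration, not TensorRT-LLM APIs.

```python
import itertools


class HandleExpired(RuntimeError):
    """Raised when a handle points at a buffer its owner already freed."""


class Worker:
    """Toy worker (hypothetical): owns buffers and hands out handles to them."""

    registry = {}  # worker name -> Worker, stands in for cross-process lookup

    def __init__(self, name):
        self.name = name
        self._owned = {}
        self._ids = itertools.count()
        Worker.registry[name] = self

    def export(self, values):
        """Keep `values` alive in this worker and return a handle to it."""
        handle = (self.name, next(self._ids))
        self._owned[handle] = values
        return handle

    def release(self, handle):
        """Drop the owner's last reference; the handle becomes stale."""
        del self._owned[handle]

    @staticmethod
    def rebuild(handle):
        """Resolve a handle; fails if the owner has already released it."""
        owner = Worker.registry[handle[0]]
        try:
            return owner._owned[handle]
        except KeyError:
            raise HandleExpired(f"stale handle {handle}") from None

    def take_ownership(self, handle):
        """Rebuild while the handle is still valid, copy, and re-export."""
        return self.export(list(Worker.rebuild(handle)))


encode, prefill = Worker("encode"), Worker("prefill")

h_enc = encode.export([0, 1, 2])       # stand-in for an mrope tensor
h_pre = prefill.take_ownership(h_enc)  # prefill stands up its own copy
encode.release(h_enc)                  # producer refcount reaches zero

decoded = Worker.rebuild(h_pre)        # decode uses the prefill-owned handle

try:
    Worker.rebuild(h_enc)              # forwarding the encode handle would fail
    stale_handle_usable = True
except HandleExpired:
    stale_handle_usable = False
```

In this model the decode side only ever resolves a handle whose owner (the prefill worker) is still holding the data, which mirrors the ownership transfer the commit describes.
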
| Name | | |
|---|---|---|
| rpc | ||
| __init__.py | ||
| base_worker.py | ||
| executor.py | ||
| ipc.py | ||
| postproc_worker.py | ||
| proxy.py | ||
| ray_executor.py | ||
| ray_gpu_worker.py | ||
| request.py | ||
| result.py | ||
| rpc_proxy_mixin.py | ||
| rpc_proxy.py | ||
| rpc_worker_mixin.py | ||
| rpc_worker.py | ||
| utils.py | ||
| worker.py | ||