TensorRT-LLMs/tensorrt_llm/_torch
William Zhang ffc0f54959
[https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217)
* Why?

Previously, the mrope tensors' IPC handles would just be forwarded from
encode -> prefill -> decode workers. While this is fine for the
prefill worker, it is not for the decode worker, since by the time it
tries to rebuild those tensors, they could have been garbage collected
due to their refcounts reaching zero in the producer (encode) worker.

This could lead to runtime errors in the decode worker when running
E/P/D (encode/prefill/decode) disaggregated serving.

* What?

This commit fixes the issue by having the prefill worker take ownership
of the reconstructed tensors and stand up new copies for the decode
worker.
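
The lifetime problem described above can be illustrated with a toy model. This is a minimal sketch and not the TensorRT-LLM implementation: plain Python objects stand in for CUDA IPC tensors, and every name in it (`Worker`, `share`, `rebuild`, `prefill_forward`, `prefill_retake`) is hypothetical. It only shows why a forwarded handle can dangle once the producer frees its buffer, while a handle to a copy owned by the prefill worker stays valid.

```python
# Toy model only: plain Python objects stand in for CUDA IPC tensors.
# All class and function names are hypothetical, not TensorRT-LLM APIs.

class Worker:
    """A process that owns buffers and hands out handles to them."""

    def __init__(self, name):
        self.name = name
        self._owned = {}   # buffer id -> payload owned by this worker
        self._next_id = 0

    def share(self, payload):
        """Store a private copy of payload and return a handle to it."""
        buf_id = self._next_id
        self._owned[buf_id] = bytes(payload)
        self._next_id += 1
        return (self, buf_id)

    def release_all(self):
        """Simulate refcounts reaching zero: the owner frees its buffers."""
        self._owned.clear()


def rebuild(handle):
    """Rebuild a buffer from a handle; fails if the owner already freed it."""
    owner, buf_id = handle
    if buf_id not in owner._owned:
        raise RuntimeError(f"buffer {buf_id} already freed by {owner.name}")
    return owner._owned[buf_id]


def prefill_forward(handle):
    """Old behavior: pass the encode worker's handle straight through."""
    return handle


def prefill_retake(prefill, handle):
    """Fixed behavior: rebuild, copy, and share a handle owned by prefill."""
    return prefill.share(rebuild(handle))


encode, prefill = Worker("encode"), Worker("prefill")
mrope = encode.share(b"mrope position ids")

forwarded = prefill_forward(mrope)        # still points at encode's buffer
retaken = prefill_retake(prefill, mrope)  # points at prefill's own copy

encode.release_all()  # the encode worker drops its references

try:
    rebuild(forwarded)
    forwarded_ok = True
except RuntimeError:
    forwarded_ok = False  # the decode worker would hit this failure

decoded = rebuild(retaken)  # succeeds: prefill still owns the copy
```

In this sketch the copy costs one extra buffer in the prefill worker, which is the price paid for decoupling the decode worker from the encode worker's lifetime.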

Closes: NvBug 5848756

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-06 22:37:42 -05:00
attention_backend [https://nvbugs/5624818][fix] Work around accuracy issue by enforcing paged_context_fmha on Hopper for fmha_v2 (#11192) 2026-02-04 19:21:50 +08:00
auto_deploy [None][feat] AutoDeploy: add triton backend for causal conv (#11124) 2026-02-05 21:33:00 -08:00
compilation [TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) 2026-02-06 09:49:30 +08:00
configs [TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405) 2025-10-24 13:40:41 -04:00
cuda_tile_kernels [None][feat] Integrate cuda.tile RMS norm kernels (#9725) 2026-02-02 19:44:27 +08:00
custom_ops [TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) 2026-02-06 09:49:30 +08:00
cute_dsl_kernels [TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) 2026-02-06 09:49:30 +08:00
debug Add debug hook to support dump tensor data and add new debug functions easily (#5182) 2025-06-24 17:45:28 +08:00
disaggregation [TRTLLM-9527][feat] Modularization of the transceiver for KV manager v2 (step 4) (#11225) 2026-02-06 07:15:18 -05:00
distributed [TRTLLM-10264][feat] Support attention DP + Helix CP (#10477) 2026-01-29 02:57:13 -05:00
models [TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) 2026-02-06 09:49:30 +08:00
modules [TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) 2026-02-06 09:49:30 +08:00
peft [https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279) 2026-01-22 14:01:18 +01:00
pyexecutor [https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217) 2026-02-06 22:37:42 -05:00
shared_tensor [1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396) 2025-07-10 05:12:53 +09:00
speculative [https://nvbugs/5756028][fix] Fix VSWA initialization with spec-dec and boundary condition in context input preparation (#10798) 2026-02-06 14:28:47 +08:00
__init__.py [TRTLLM-9212][chore] move MoeLoadBalancerConfig to llm_args.py (#9002) 2025-11-13 10:47:35 +08:00
async_llm.py [TRTLLM-9736][feat] AsyncLLM and verl integ (#9353) 2025-12-11 09:33:25 -08:00
autotuner.py [TRTLLM-10264][feat] Support attention DP + Helix CP (#10477) 2026-01-29 02:57:13 -05:00
cublaslt_utils.py [https://nvbugs/5451205][feat] Add cuBLASLt NVFP4 GEMM backend support (#7943) 2025-10-23 15:55:10 +08:00
cuda_tile_utils.py [None][feat] Integrate cuda.tile RMS norm kernels (#9725) 2026-02-02 19:44:27 +08:00
cute_dsl_utils.py [None][chore] polish error message in cute_dsl_utils.py (#7852) 2025-09-19 12:05:11 +08:00
device_mesh.py [TRTLLM-9465][fix] Swap TP-CP grouping order (#10350) 2026-01-05 20:08:03 +08:00
expert_statistic.py [TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587) 2025-11-17 18:07:13 +01:00
flashinfer_utils.py [TRTLLM-9578][feat] make PDL enabled by default (#9695) 2025-12-25 07:15:24 -05:00
hostfunc.py [TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948) 2025-09-03 15:16:11 -07:00
llm.py [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) 2025-06-20 03:01:10 +08:00
memory_buffer_utils.py [https://nvbugs/5811697][fix] Fix buffer reuse. (#10716) 2026-01-25 18:12:21 +08:00
metadata.py [None][feat] Use Separate QKV Input Layout for Context MLA (#6538) 2025-08-19 22:04:48 +08:00
model_config.py [TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) 2026-02-06 09:49:30 +08:00
utils.py [TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) 2026-02-06 09:49:30 +08:00
virtual_memory.py [TRTLLM-9736][feat] AsyncLLM and verl integ (#9353) 2025-12-11 09:33:25 -08:00