TensorRT-LLMs/tensorrt_llm
William Zhang ffc0f54959
[https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217)
* Why?

Previously, the mrope tensors' IPC handles were simply forwarded from
the encode -> prefill -> decode workers. While this is fine for the
prefill worker, it is not for the decode worker: by the time it
tries to rebuild those tensors, they may already have been garbage
collected, their refcounts having reached zero in the producer
(encode) worker.

This could lead to nasty runtime errors when running E/P/D
disaggregated serving.

* What?

This commit fixes the issue by having the prefill worker take
ownership of the reconstructed tensors and stand up fresh copies for
the decode worker.
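The ownership problem described above can be sketched in plain Python. This is a hypothetical analogy, not the actual patch: the `Tensor` class and the handle registry are stand-ins, and a `weakref` plays the role of an IPC handle, which likewise only resolves while the producer still keeps the underlying allocation alive.

```python
import weakref

# Hypothetical stand-in for a device tensor.
class Tensor:
    def __init__(self, data):
        self.data = list(data)

# Encode worker produces the mrope tensor and exports a "handle"
# (a weakref here, analogous to a CUDA IPC handle: it only resolves
# while the producer keeps the allocation alive).
producer_tensor = Tensor([1, 2, 3])
handle = weakref.ref(producer_tensor)  # forwarded encode -> prefill -> decode

# Prefill worker: rebuild from the handle, then *copy* to take ownership.
rebuilt = handle()
owned_copy = Tensor(rebuilt.data)  # the fix: a copy the prefill worker owns

# Producer drops its reference (refcount hits zero in the encode worker).
del producer_tensor, rebuilt

# The forwarded handle is now dangling: rebuilding in the decode worker
# would fail, but the prefill worker's owned copy remains valid and can
# be re-exported downstream.
assert handle() is None
assert owned_copy.data == [1, 2, 3]
```

The same reasoning applies with real CUDA IPC handles: rebuilding a tensor from a handle does not extend the producer's allocation lifetime, so any consumer that may outlive the producer's reference must materialize its own copy.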

Closes: NvBug 5848756

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-06 22:37:42 -05:00
_tensorrt_engine [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) 2025-06-20 03:01:10 +08:00
_torch [https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217) 2026-02-06 22:37:42 -05:00
bench [None][chore] Print correct backend name in benchmark report (#10597) 2026-01-12 14:46:00 -05:00
commands [None][feat] Add gRPC server for high-performance external router integration (#11037) 2026-01-30 07:48:27 +08:00
evaluate [None][feat] Support to export data in trtllm-eval (#10075) 2026-01-15 23:27:08 +08:00
executor [https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217) 2026-02-06 22:37:42 -05:00
grpc [#11037][fix] Fix proto-to-SamplingParams conversion bugs and add gRPC tests (#11292) 2026-02-05 05:00:29 -05:00
inputs [TRTLLM-9522][feat] support image_embeds in OpenAI API (#9715) 2026-01-14 10:31:03 +01:00
layers [None][fix] [Gemma3] Fix RoPE for local attention for Gemma3 (#9961) 2025-12-27 11:50:59 -08:00
llmapi [TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) 2026-02-06 09:49:30 +08:00
metrics [None][feat] Add trtllm_ prefix for exposed metrics (#8845) 2025-11-06 15:27:18 +08:00
models [TRTLLM-9465][fix] Swap TP-CP grouping order (#10350) 2026-01-05 20:08:03 +08:00
plugin [https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499) 2026-01-13 17:16:22 +08:00
quantization [None][chore] docs: clarify LoRA is not supported with --use_fp8_rowwise in Fp8RowwiseAttention (see #2603) (#10320) 2026-01-19 04:38:00 -05:00
runtime [None][feat] Enhance support for complex models (#11254) 2026-02-05 17:28:26 +08:00
scaffolding [None][feat] Deep Research Implemented with Scaffolding (#8452) 2025-11-06 10:33:28 +08:00
serve [None][fix] make health_generate work with beam search (#11097) 2026-02-04 09:46:19 +01:00
tokenizer [https://nvbugs/5684820][fix] fix the detokenizer issue for DeepSeek-v3.2 (#10106) 2025-12-22 10:56:33 +08:00
tools [None][feat] Add performance alignment to layer-wise benchmarks (#11018) 2026-01-29 14:01:51 +08:00
__init__.py [https://nvbugs/5761391][fix] Include triton-kernels as a packaged dependency (#10471) 2026-01-28 19:56:32 -08:00
_common.py [None][feat] Hang detection for executor loop and worker. (#10480) 2026-01-13 02:34:32 -05:00
_dlpack_utils.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
_ipc_utils.py [None][refactor] Unify the usage of MPIDist and TorchDist. (#10380) 2026-01-14 14:05:47 +08:00
_mnnvl_utils.py [https://nvbugs/5791900][fix] Fix HelixCpMnnvlMemory init with PP (#10533) 2026-01-13 15:48:42 -05:00
_ray_utils.py [TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302) 2025-11-04 10:19:24 -08:00
_utils.py [TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659) 2026-02-02 14:29:02 +08:00
builder.py [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330) 2025-10-28 09:17:26 -07:00
disaggregated_params.py [TRTLLM-8921][feat] implement gen-first disagg_service (#11020) 2026-02-03 15:46:11 -05:00
functional.py [#8921][feat] Added symmetric memory AllReduce strategy (#8919) 2025-12-08 13:12:56 -08:00
graph_rewriting.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
logger.py [None][chore] Mass integration of release/1.0 - 3rd (#7519) 2025-09-08 14:03:04 +08:00
lora_helper.py [TRTLLM-8682][chore] Remove auto_parallel module (#8329) 2025-10-22 20:53:08 -04:00
lora_manager.py [https://nvbugs/5510879][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 (#8063) 2025-10-12 12:29:52 -07:00
mapping.py [None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905) 2026-01-15 07:29:15 +08:00
math_utils.py perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318) 2025-06-26 14:03:56 +08:00
module.py [None][chore] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851) 2025-09-25 21:02:35 +08:00
network.py [TRTLLM-8682][chore] Remove auto_parallel module (#8329) 2025-10-22 20:53:08 -04:00
parameter.py fix:https://nvbugs/5234033 enable starcoder trt-flow with transforme… (#3909) 2025-05-15 11:16:45 +08:00
profiler.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
prompt_adapter_manager.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
python_plugin.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
ray_stub.py [TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test (#8175) 2025-10-14 23:46:30 +08:00
sampling_params.py [TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675) 2026-01-16 10:52:41 -08:00
scheduling_params.py [None][feat] Add support of scheduling attention dp request (#6246) 2025-08-01 20:38:01 -04:00
serialization.py [https://nvbugs/5775021] [fix] Replace pickle.load with restricted Unpickler (#10622) 2026-01-21 11:42:54 +08:00
top_model_mixin.py [TRTLLM-8683][chore] Migrate PluginConfig to Pydantic (#8277) 2025-10-17 16:13:22 -04:00
version.py [None][chore] bump version to 1.3.0rc3 (#11238) 2026-02-04 09:30:45 +08:00