TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-17 08:15:10 +08:00


William Zhang ffc0f54959 [https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217)

* Why?

Previously, the mrope tensors' IPC handles were simply forwarded from the encode -> prefill -> decode workers. This is fine for the prefill worker, but not for the decode worker. By the time the decode worker tries to rebuild those tensors, they may already have been garbage collected because their refcounts reached zero in the producer (encode) worker. This could lead to nasty runtime errors when running E/P/D disaggregated serving.

* What?

This commit fixes this by having the prefill worker take ownership of the reconstructed tensors and stand up new copies for the decode worker.

Closes: NvBug 5848756

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

2026-02-06 22:37:42 -05:00
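The ownership problem described in this commit can be illustrated with a toy, pure-Python model. This is a minimal sketch and not TensorRT-LLM code. `Buffer`, `Worker`, `export`, `rebuild`, `is_alive`, `forward_handle` and `retake_ownership` are all hypothetical names. A weak-reference registry stands in for real IPC handles. The model shows two things:

- A handle forwarded unchanged from the encode worker dangles once the encode worker drops its last reference.
- Having the prefill worker re-export its own copy keeps the handle that the decode worker receives valid.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for a tensor (e.g. an mrope tensor)."""

    def __init__(self, data):
        self.data = list(data)  # private copy of the payload


class Worker:
    """Hypothetical worker that exports buffers through string "handles".

    The shared registry holds only weak references, so a handle stays valid
    only while its owning worker keeps a strong reference. This mimics an IPC
    handle whose memory the producer may free. It relies on CPython freeing
    an object as soon as its refcount reaches zero.
    """

    _registry = weakref.WeakValueDictionary()  # handle -> Buffer (weak refs)
    _next_id = 0

    def __init__(self, name):
        self.name = name
        self.owned = {}  # handle -> Buffer (strong refs keep buffers alive)

    def export(self, buf):
        Worker._next_id += 1
        handle = f"{self.name}:{Worker._next_id}"
        self.owned[handle] = buf
        Worker._registry[handle] = buf
        return handle

    def release(self, handle):
        # Dropping the owner's strong reference lets the buffer be freed.
        del self.owned[handle]

    @staticmethod
    def is_alive(handle):
        return handle in Worker._registry

    @staticmethod
    def rebuild(handle):
        buf = Worker._registry.get(handle)
        if buf is None:
            raise RuntimeError(f"handle {handle!r} refers to freed memory")
        return buf


def forward_handle(encode):
    """Old behaviour: the encode worker's handle is forwarded unchanged."""
    handle = encode.export(Buffer([1, 2, 3]))
    Worker.rebuild(handle)  # prefill rebuilds it while encode still owns it
    encode.release(handle)  # encode's last reference goes away
    return handle           # decode receives a dangling handle


def retake_ownership(encode, prefill):
    """Fixed behaviour: prefill copies the data and exports its own handle."""
    handle = encode.export(Buffer([1, 2, 3]))
    copy = Buffer(Worker.rebuild(handle).data)  # prefill's own copy
    new_handle = prefill.export(copy)           # prefill now owns the copy
    encode.release(handle)                      # original may now be freed
    return new_handle                           # decode gets a live handle


encode, prefill = Worker("encode"), Worker("prefill")
stale = forward_handle(encode)
fresh = retake_ownership(encode, prefill)
print(Worker.is_alive(stale), Worker.is_alive(fresh))  # False True
```

The copy costs extra memory. In exchange, the lifetime of the data that the decode worker sees no longer depends on the encode worker.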
test_external_embedding.py	[None][fix] InputProcessor config naming convention fix (#8705)	2025-11-03 22:29:21 -08:00
test_find_num_image_tokens.py	[None][fix] InputProcessor config naming convention fix (#8705)	2025-11-03 22:29:21 -08:00
test_fuse_input_embeds.py	[TRTLLM-7440][fix] Split `fused_input_embed` to separate out host sync (#7280)	2025-09-06 23:11:39 -04:00
test_mm_encoder_standalone.py	[https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217)	2026-02-06 22:37:42 -05:00
test_multimodal_runtime.py	[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843)	2025-09-14 20:10:10 -07:00
test_share_multiparams.py	[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250)	2025-09-22 03:40:02 -07:00