TensorRT-LLM/tensorrt_llm/_torch/pyexecutor
William Zhang ffc0f54959
[https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217)
* Why?

Previously, the mrope tensors' IPC handles were simply forwarded from
the encode worker through the prefill worker to the decode worker.
While this is fine for the prefill worker, it is not for the decode
worker: by the time it tries to rebuild those tensors, they may already
have been garbage collected, their refcounts having reached zero in the
producer (encode) worker.

This could lead to hard-to-diagnose runtime errors when running E/P/D
disaggregated serving.

* What?

This commit fixes this by having the prefill worker take ownership of
the reconstructed tensors and stand up fresh copies for the decode
worker.
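
The hazard and the fix can be sketched in plain Python, with an invented
`FakeIpcTensor` class standing in for a tensor rebuilt from a CUDA IPC
handle (all names here are hypothetical, not the actual TensorRT-LLM API):

```python
class FakeIpcTensor:
    """Stands in for a tensor rebuilt from a CUDA IPC handle.

    The underlying storage is owned by the *producer* process; once the
    producer drops its last reference, rebuilding from the handle fails.
    """

    def __init__(self, storage_registry, key):
        self._registry = storage_registry  # simulates producer-owned memory
        self._key = key

    def data(self):
        if self._key not in self._registry:
            raise RuntimeError("producer freed the storage; handle is stale")
        return self._registry[self._key]

    def clone(self):
        """Copy the data into storage the *current* worker owns."""
        return list(self.data())


# "Encode" worker allocates the mrope storage and exports a handle.
producer_storage = {"mrope": [1, 2, 3]}
handle = FakeIpcTensor(producer_storage, "mrope")

# Buggy path: prefill forwards the handle untouched; the decode worker
# only dereferences it later, after the producer may have freed it.
forwarded = handle

# Fixed path: prefill takes ownership by cloning while the handle is
# still valid, then ships its own copy onward to the decode worker.
owned_copy = handle.clone()

# Producer drops the storage (refcount reaches zero in the encode worker).
del producer_storage["mrope"]

try:
    forwarded.data()
    stale_handle_survived = True
except RuntimeError:
    stale_handle_survived = False

print(stale_handle_survived, owned_copy)
```

The key point is the timing: the clone must happen in the prefill worker
while the producer still holds the storage alive, which is exactly the
window the forwarding-only path gave up.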

Closes: NvBug 5848756

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-06 22:37:42 -05:00
__init__.py Update TensorRT-LLM (#2755) 2025-02-11 03:01:00 +00:00
_util.py [https://nvbugs/5756028][fix] Fix VSWA initialization with spec-dec and boundary condition in context input preparation (#10798) 2026-02-06 14:28:47 +08:00
config_utils.py [None][feat] Eagle: MLA Based Eagle (#9677) 2026-01-02 13:45:07 -05:00
cuda_graph_runner.py [TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659) 2026-02-02 14:29:02 +08:00
executor_request_queue.py [TRTLLM-10666][chore] Refactor request fetching logic for better separation of concerns (#10988) 2026-02-02 10:36:08 +08:00
finish_reason.py [TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler (#5328) 2025-06-25 17:41:36 +02:00
grammar_matcher.py [TRTLLM-8763][chore] Deprecate pybind based GuidedDecodingConfig usage in torch backend (#8717) 2025-10-29 20:37:14 +08:00
guided_decoder.py [None][fix] Always reset drafting states for GuidedDecoder (#10899) 2026-02-02 16:26:46 +08:00
handle_additional_outputs.py [TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587) 2025-11-17 18:07:13 +01:00
handle_logits.py [TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587) 2025-11-17 18:07:13 +01:00
hang_detector.py [None][feat] Hang detection for executor loop and worker. (#10480) 2026-01-13 02:34:32 -05:00
kv_cache_connector.py [None][feat] Add priority-based KV cache offload filtering support (#10751) 2026-02-05 05:22:56 -05:00
kv_cache_transceiver.py [TRTLLM-8921][feat] implement gen-first disagg_service (#11020) 2026-02-03 15:46:11 -05:00
layerwise_nvtx_marker.py Update TensorRT-LLM (#2849) 2025-03-04 18:44:00 +08:00
llm_request.py [TRTLLM-10030][chore] promote SampleState to TypeVar + typing fixes (#11281) 2026-02-05 16:33:22 +01:00
make_decoding_batch_input_output.py [None][refactor] decoding inputs, part 2 (#5799) 2025-11-18 14:38:51 +01:00
mamba_cache_manager.py [#10013][feat] AutoDeploy: native cache manager integration (#10635) 2026-01-27 11:23:22 -05:00
model_engine.py [https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217) 2026-02-06 22:37:42 -05:00
model_loader.py [TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130) 2026-02-06 09:49:30 +08:00
py_executor_creator.py [TRTLLM-10752][chore] set default val of max_num_tokens_in_buffer as max_seq_len or max_input_len (#11082) 2026-02-05 14:54:00 -05:00
py_executor.py [https://nvbugs/5863392][fix] fix partial reuse disabled for disagg (#11247) 2026-02-06 14:23:51 -05:00
request_utils.py [TRTLLM-10666][chore] Refactor request fetching logic for better separation of concerns (#10988) 2026-02-02 10:36:08 +08:00
resource_manager.py [https://nvbugs/5863392][fix] fix partial reuse disabled for disagg (#11247) 2026-02-06 14:23:51 -05:00
sampler.py [TRTLLM-10030][chore] promote SampleState to TypeVar + typing fixes (#11281) 2026-02-05 16:33:22 +01:00
sampling_utils_flashinfer.py [TRTLLM-10030][perf] beam search (remove GPU sync + fix batching + refactor) (#11276) 2026-02-05 15:33:51 +01:00
sampling_utils.py [TRTLLM-10030][perf] beam search (remove GPU sync + fix batching + refactor) (#11276) 2026-02-05 15:33:51 +01:00
scheduler.py [TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659) 2026-02-02 14:29:02 +08:00
seq_slot_manager.py [https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537) 2025-08-15 09:52:06 -07:00