TensorRT-LLM/tensorrt_llm/_torch/pyexecutor
William Zhang a6a88985cf
[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758)
* Why?

Certain VLMs, such as the Qwen family, need more than just the multimodal
embeddings in the language model: they also need MRoPE position IDs and
deltas. Prior to this commit, only the embeddings could be communicated
from the encoder worker to the prefill worker.

* What?

This commit extends `DisaggregatedParams` to include the MRoPE
information. It also adjusts the pieces of code required to
communicate it between the E, P, and D workers.

Closes TRTLLM-9409.
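The shape of the change can be sketched as a params object that carries MRoPE tensors alongside the multimodal embeddings. This is a minimal illustrative sketch only; the field names, types, and defaults below are assumptions, not the actual TensorRT-LLM `DisaggregatedParams` API.

```python
# Hypothetical sketch of DisaggregatedParams extended with MRoPE fields.
# All names here are illustrative assumptions, not the real API.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DisaggregatedParams:
    # Existing fields (assumed): which phase this request belongs to, plus
    # handles to multimodal embeddings produced by the encoder worker.
    request_type: str  # e.g. "context_only" or "generation_only"
    multimodal_embedding_handles: Optional[List[bytes]] = None
    # New fields (this commit's theme): MRoPE position IDs and the delta
    # that the prefill/decode workers need to continue RoPE indexing.
    mrope_position_ids_handle: Optional[bytes] = None
    mrope_position_deltas: Optional[int] = None
```

With such a structure, the encoder worker would populate the MRoPE fields next to the embedding handles, and the prefill and decode workers would read them back instead of recomputing MRoPE state locally.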

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-22 06:32:49 -05:00
__init__.py Update TensorRT-LLM (#2755) 2025-02-11 03:01:00 +00:00
_util.py [https://nvbugs/5508301][feat] Move D->H copies to a worker thread whe… (#8463) 2025-12-09 18:51:31 -05:00
config_utils.py [None][feat] Support Mistral Large3 LLM part (#9820) 2025-12-13 11:44:27 +08:00
cuda_graph_runner.py [TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689) 2025-12-15 20:05:20 -08:00
executor_request_queue.py [TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757) 2025-12-12 22:29:05 +08:00
finish_reason.py [TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler (#5328) 2025-06-25 17:41:36 +02:00
grammar_matcher.py [TRTLLM-8763][chore] Deprecate pybind based GuidedDecodingConfig usage in torch backend (#8717) 2025-10-29 20:37:14 +08:00
guided_decoder.py [None][feat] Graceful Error Handling for Guided Decoder (#9078) 2025-12-13 19:57:59 +08:00
handle_additional_outputs.py [TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587) 2025-11-17 18:07:13 +01:00
handle_logits.py [TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587) 2025-11-17 18:07:13 +01:00
kv_cache_connector.py [None][feat] Support KV Connector with Disagg Prefill Worker (#8246) 2025-10-24 11:09:06 -07:00
kv_cache_transceiver.py [None][feat] Support Mooncake transfer engine as a cache transceiver backend (#8309) 2025-12-19 10:09:51 +08:00
layerwise_nvtx_marker.py Update TensorRT-LLM (#2849) 2025-03-04 18:44:00 +08:00
llm_request.py [TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758) 2025-12-22 06:32:49 -05:00
make_decoding_batch_input_output.py [None][refactor] decoding inputs, part 2 (#5799) 2025-11-18 14:38:51 +01:00
mamba_cache_manager.py [https://nvbugs/5537996][fix] Let KV cache manager block initialization be aware whether it is doing a dry run or not (#9093) 2025-11-25 17:27:11 +08:00
model_engine.py [TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758) 2025-12-22 06:32:49 -05:00
model_loader.py [TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature (#9682) 2025-12-06 02:24:51 -08:00
py_executor_creator.py [https://nvbugs/5652552][fix] cherry-pick add printing for llm args (#9206) 2025-12-16 13:33:20 -05:00
py_executor.py [TRTLLM-7736][feat] Incrementally update the inputs of target and draft models (#9708) 2025-12-19 15:11:25 +08:00
resource_manager.py [None][feat] Cudagraph updates for helix parallelism (#10141) 2025-12-21 15:21:52 -05:00
sampler.py [TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758) 2025-12-22 06:32:49 -05:00
sampling_utils_flashinfer.py [TRTLLM-6756][feat] Update BeamSearch for TorchSampler (#9660) 2025-12-09 10:44:01 +01:00
sampling_utils.py [TRTLLM-6756][feat] Update BeamSearch for TorchSampler (#9660) 2025-12-09 10:44:01 +01:00
scheduler.py [https://nvbugs/5677746][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659) 2025-12-08 18:43:52 -08:00
seq_slot_manager.py [https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537) 2025-08-15 09:52:06 -07:00