Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-13 22:18:36 +08:00)
* Why? Certain VLMs, such as the Qwen family, need more than just the multimodal embeddings in the language model; they also need MRoPE position IDs and deltas. Prior to this commit, only the embeddings could be communicated from the encoder worker to the prefill worker.
* What? This commit extends the `DisaggregatedParams` to include the MRoPE information. It also adjusts the pieces of code required to communicate that information between the E, P, and D workers.

Closes TRTLLM-9409.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
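As a rough illustration of the change described above, the sketch below shows how a disaggregated-params object might carry MRoPE position IDs and deltas alongside the multimodal embeddings when passed from the encoder (E) worker to the prefill (P) and decode (D) workers. This is a minimal sketch only: the class and field names used here (`DisaggregatedParamsSketch`, `multimodal_embedding_handles`, `mrope_position_ids`, `mrope_position_deltas`, `attach_mrope`) are assumptions for illustration, not the actual TensorRT-LLM definitions.

```python
# Illustrative sketch only: names below are assumptions, not the exact
# TensorRT-LLM API touched by this commit.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DisaggregatedParamsSketch:
    """Hypothetical shape of the params handed between E, P, and D workers
    in a disaggregated serving setup."""

    # Which phase this request is in, e.g. "context_only" or "generation_only".
    request_type: Optional[str] = None

    # Opaque state the context (prefill) worker produces for the decode worker.
    opaque_state: Optional[bytes] = None

    # Pre-commit behavior: only the multimodal embeddings travel E -> P.
    multimodal_embedding_handles: Optional[List[bytes]] = None

    # Post-commit additions (names assumed): the MRoPE data that Qwen-style
    # VLMs need inside the language model in addition to the embeddings.
    mrope_position_ids: Optional[List[int]] = None
    mrope_position_deltas: Optional[List[int]] = None


def attach_mrope(params: DisaggregatedParamsSketch,
                 position_ids: List[int],
                 deltas: List[int]) -> DisaggregatedParamsSketch:
    """Toy helper showing where an encoder worker would populate the new
    fields before shipping the params to the prefill worker."""
    params.mrope_position_ids = position_ids
    params.mrope_position_deltas = deltas
    return params


if __name__ == "__main__":
    p = DisaggregatedParamsSketch(request_type="context_only")
    p = attach_mrope(p, position_ids=[0, 1, 2, 3], deltas=[0])
    print(p)
```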
* `__init__.py`
* `build_cache.py`
* `disagg_utils.py`
* `kv_cache_type.py`
* `llm_args.py`
* `llm_utils.py`
* `llm.py`
* `mgmn_leader_node.py`
* `mgmn_worker_node.py`
* `mm_encoder.py`
* `mpi_session.py`
* `reasoning_parser.py`
* `rlhf_utils.py`
* `tokenizer.py`
* `tracer.py`
* `tracing.py`
* `trtllm-llmapi-launch`
* `utils.py`