Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-02-16 15:55:08 +08:00
* Why? Prior to this commit, only a single multimodal input was supported for E/P/D disaggregated serving.
* What? This commit does a minor refactor of the multimodal embedding handles that cross process boundaries so that multiple multimodal inputs can be handled. Existing unit tests are updated accordingly. The `RequestOutput` has its `mm_embedding_handle` replaced in favor of `disaggregated_params`, addressing a previous TODO.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
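The field rename described above changes how downstream code reads multimodal embedding handles: instead of a dedicated `RequestOutput.mm_embedding_handle`, the handles now travel inside `disaggregated_params` across the encoder/prefill/decode process boundary. The snippet below is a minimal sketch of what that flow could look like; the attribute name `multimodal_embedding_handles`, the `disaggregated_params=` keyword to `generate()`, and the two-stage wiring are assumptions for illustration, not the confirmed API.

```python
# Hypothetical sketch of the post-refactor E/P/D flow. Assumptions (not
# confirmed by this page): the attribute name `multimodal_embedding_handles`
# on the disaggregated params object and the `disaggregated_params=` keyword
# to generate() are illustrative only.

from tensorrt_llm import LLM, SamplingParams


def encode_then_generate(encoder_llm: LLM, gen_llm: LLM, prompt: str) -> str:
    # 1) Encoder/context stage: run the request and capture the disaggregated
    #    parameters from the resulting RequestOutput (this replaces the old
    #    per-request mm_embedding_handle field).
    ctx_output = encoder_llm.generate([prompt], SamplingParams(max_tokens=1))[0]
    disagg_params = ctx_output.disaggregated_params

    # 2) The multimodal embedding handles (one per multimodal input) are now
    #    carried inside disaggregated_params; the attribute name here is an
    #    assumed placeholder.
    handles = getattr(disagg_params, "multimodal_embedding_handles", None)
    print(f"embedding handles crossing the process boundary: {handles}")

    # 3) Generation stage: hand the same disaggregated_params to the
    #    generation worker so it can resolve the embeddings for all inputs.
    gen_output = gen_llm.generate(
        [prompt],
        SamplingParams(max_tokens=64),
        disaggregated_params=disagg_params,  # assumed keyword argument
    )[0]
    return gen_output.outputs[0].text
```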
Files in this directory:

- __init__.py
- build_cache.py
- disagg_utils.py
- kv_cache_type.py
- llm_args.py
- llm_utils.py
- llm.py
- mgmn_leader_node.py
- mgmn_worker_node.py
- mm_encoder.py
- mpi_session.py
- reasoning_parser.py
- rlhf_utils.py
- tokenizer.py
- tracer.py
- tracing.py
- trtllm-llmapi-launch
- utils.py