TensorRT-LLMs/tensorrt_llm/llmapi
William Zhang ca9537e17c
[TRTLLM-10858][feat] Multi-image support for EPD disagg (#11264)
* Why?

Prior to this commit, we only supported a single multimodal input for
E/P/D disaggregated serving.

* What?

This commit makes a minor refactor to the multimodal embedding handles
that cross process boundaries, so that multiple multimodal inputs are supported.
Existing unit tests are updated to cover this.

The `mm_embedding_handle` field of `RequestOutput` is replaced by
`disaggregated_params`, addressing a previous TODO.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-11 20:50:00 -08:00
__init__.py [TRTLLM-8921][feat] implement gen-first disagg_service (#11020) 2026-02-03 15:46:11 -05:00
build_cache.py [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330) 2025-10-28 09:17:26 -07:00
disagg_utils.py [None][feat] Fully non-blocking pipeline parallelism executor loop. (#10349) 2026-02-10 15:43:28 +08:00
kv_cache_type.py [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330) 2025-10-28 09:17:26 -07:00
llm_args.py [None][chore] Introducing an abstract WaitingQueue interface to decouple the request scheduling logic from specific queue implementations (#11330) 2026-02-12 09:18:24 +08:00
llm_utils.py [TRTLLM-9771][feat] Allow overriding quantization configs (#11062) 2026-01-31 10:48:51 -05:00
llm.py [TRTLLM-10858][feat] Multi-image support for EPD disagg (#11264) 2026-02-11 20:50:00 -08:00
mgmn_leader_node.py [https://nvbugs/5783876][fix] fix hmac launch (#10434) 2026-01-22 23:20:53 +08:00
mgmn_worker_node.py Update TensorRT-LLM (#2333) 2024-10-15 15:28:40 +08:00
mm_encoder.py [TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758) 2025-12-22 06:32:49 -05:00
mpi_session.py [None][feat] Fully non-blocking pipeline parallelism executor loop. (#10349) 2026-02-10 15:43:28 +08:00
reasoning_parser.py [None][feat] Update reasoning parser for nano-v3 (#9944) 2025-12-15 05:39:37 -08:00
rlhf_utils.py [TRTLLM-9771][feat] Support partial update weight for fp8 (#10456) 2026-01-22 14:46:05 +08:00
tokenizer.py [TRTLLM-9654][feat] Support DeepSeek-V32 chat template (#9814) 2025-12-19 17:05:38 +08:00
tracer.py Update TensorRT-LLM (#2413) 2024-11-05 16:27:06 +08:00
tracing.py [None][feat] Add opentelemetry tracing (#5897) 2025-10-27 18:51:07 +08:00
trtllm-llmapi-launch [https://nvbugs/5569754][fix] trtllm-llmapi-launch port conflict (#8582) 2025-11-20 12:43:13 -05:00
utils.py [https://nvbugs/5680911][fix] Remove @cache decorator to enhance CI stability for unit tests using single process mode (#10730) 2026-02-02 16:26:46 +08:00