Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-02-16 15:55:08 +08:00
* Why? Prior to this commit, only a single multimodal input was supported for E/P/D disaggregated serving.
* What? This commit does a minor refactor of the multimodal embedding handles that cross process boundaries so that multiple multimodal inputs can be handled. Existing unit tests are updated accordingly. The `RequestOutput` has its `mm_embedding_handle` replaced in favor of `disaggregated_params`, addressing a previous TODO.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
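The field rename described above changes how downstream code reads multimodal embedding handles: instead of a dedicated `RequestOutput.mm_embedding_handle`, the handles now travel inside `disaggregated_params` across the encoder/prefill/decode process boundary. The snippet below is a minimal sketch of what that flow could look like; the attribute name `multimodal_embedding_handles`, the `disaggregated_params=` keyword to `generate()`, and the two-stage wiring are assumptions for illustration, not the confirmed API.

```python
# Hypothetical sketch of the post-refactor E/P/D flow. Assumptions (not
# confirmed by this page): the attribute name `multimodal_embedding_handles`
# on the disaggregated params object and the `disaggregated_params=` keyword
# to generate() are illustrative only.

from tensorrt_llm import LLM, SamplingParams


def encode_then_generate(encoder_llm: LLM, gen_llm: LLM, prompt: str) -> str:
    # 1) Encoder/context stage: run the request and capture the disaggregated
    #    parameters from the resulting RequestOutput (this replaces the old
    #    per-request mm_embedding_handle field).
    ctx_output = encoder_llm.generate([prompt], SamplingParams(max_tokens=1))[0]
    disagg_params = ctx_output.disaggregated_params

    # 2) The multimodal embedding handles (one per multimodal input) are now
    #    carried inside disaggregated_params; the attribute name here is an
    #    assumed placeholder.
    handles = getattr(disagg_params, "multimodal_embedding_handles", None)
    print(f"embedding handles crossing the process boundary: {handles}")

    # 3) Generation stage: hand the same disaggregated_params to the
    #    generation worker so it can resolve the embeddings for all inputs.
    gen_output = gen_llm.generate(
        [prompt],
        SamplingParams(max_tokens=64),
        disaggregated_params=disagg_params,  # assumed keyword argument
    )[0]
    return gen_output.outputs[0].text
```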
Files in this directory:

- __init__.py
- build_cache.py
- disagg_utils.py
- kv_cache_type.py
- llm_args.py
- llm_utils.py
- llm.py
- mgmn_leader_node.py
- mgmn_worker_node.py
- mm_encoder.py
- mpi_session.py
- reasoning_parser.py
- rlhf_utils.py
- tokenizer.py
- tracer.py
- tracing.py
- trtllm-llmapi-launch
- utils.py