TensorRT-LLM/tensorrt_llm/_torch/pyexecutor
Yukun He 9e7182b603
[TRTLLM-9615][feat] Implement a distributed tuning system (#9621)
Four distinct strategies are implemented to accommodate different distributed tuning scenarios: BROADCAST, INDEPENDENT, MERGE, and PARALLEL.

* Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback. This conservative approach prevents unexpected behavior in standard use cases.
* Only operations with significant tuning time overhead have been assigned the PARALLEL strategy, which lets ranks in the same tensor parallelism (TP) group tune different tactics concurrently and then share the results. This targeted approach balances performance gains with stability.
* Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy. This restriction exists because the synchronization mechanism is optimized only for leaf operations and does not yet handle nested hierarchies.
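The strategy split above can be sketched as follows. This is a minimal illustration, not the actual TensorRT-LLM implementation: the enum name, the `local_work` helper, and the round-robin sharding rule for PARALLEL are all assumptions made for the example; in practice the results would be exchanged via collectives after local tuning.

```python
from enum import Enum, auto

class DistributedTuningStrategy(Enum):
    """Hypothetical names mirroring the four strategies in the commit."""
    BROADCAST = auto()    # rank 0 tunes; the result is broadcast to all ranks
    INDEPENDENT = auto()  # every rank tunes on its own (the default fallback)
    MERGE = auto()        # every rank tunes; profiles are merged afterwards
    PARALLEL = auto()     # tactic space is sharded across ranks, then shared

def local_work(strategy, candidates, rank, world_size):
    """Return the tactic candidates this rank should measure locally."""
    if strategy is DistributedTuningStrategy.BROADCAST:
        # Only rank 0 does the measurement; others wait for the broadcast.
        return candidates if rank == 0 else []
    if strategy is DistributedTuningStrategy.PARALLEL:
        # Round-robin shard: each rank measures a disjoint slice, cutting
        # per-rank tuning time roughly by world_size for expensive ops.
        return candidates[rank::world_size]
    # INDEPENDENT and MERGE: every rank measures the full tactic space.
    return list(candidates)
```

For example, with 8 candidate tactics and 4 ranks, PARALLEL gives rank 1 only candidates 1 and 5, while INDEPENDENT gives every rank all 8 — which is why PARALLEL is reserved for ops where tuning time dominates.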

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-15 21:08:53 +08:00
__init__.py Update TensorRT-LLM (#2755) 2025-02-11 03:01:00 +00:00
_util.py [https://nvbugs/5508301][feat] Move D->H copies to a worker thread whe… (#8463) 2025-12-09 18:51:31 -05:00
config_utils.py [None][feat] Support Mistral Large3 LLM part (#9820) 2025-12-13 11:44:27 +08:00
cuda_graph_runner.py [TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524) 2025-12-15 12:42:25 +08:00
executor_request_queue.py [TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757) 2025-12-12 22:29:05 +08:00
finish_reason.py [TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler (#5328) 2025-06-25 17:41:36 +02:00
grammar_matcher.py [TRTLLM-8763][chore] Deprecate pybind based GuidedDecodingConfig usage in torch backend (#8717) 2025-10-29 20:37:14 +08:00
guided_decoder.py [None][feat] Graceful Error Handling for Guided Decoder (#9078) 2025-12-13 19:57:59 +08:00
handle_additional_outputs.py [TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587) 2025-11-17 18:07:13 +01:00
handle_logits.py [TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587) 2025-11-17 18:07:13 +01:00
kv_cache_connector.py [None][feat] Support KV Connector with Disagg Prefill Worker (#8246) 2025-10-24 11:09:06 -07:00
kv_cache_transceiver.py [None][feat] Have ability to cancel disagg request if KV cache resource are exhausted (#9155) 2025-11-18 20:59:17 -05:00
layerwise_nvtx_marker.py Update TensorRT-LLM (#2849) 2025-03-04 18:44:00 +08:00
llm_request.py [TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757) 2025-12-12 22:29:05 +08:00
make_decoding_batch_input_output.py [None][refactor] decoding inputs, part 2 (#5799) 2025-11-18 14:38:51 +01:00
mamba_cache_manager.py [https://nvbugs/5537996][fix] Let KV cache manager block initialization be aware whether it is doing a dry run or not (#9093) 2025-11-25 17:27:11 +08:00
model_engine.py [TRTLLM-9615][feat] Implement a distributed tuning system (#9621) 2025-12-15 21:08:53 +08:00
model_loader.py [TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature (#9682) 2025-12-06 02:24:51 -08:00
py_executor_creator.py [None][feat] Implement sampling on 1-model EAGLE3 (#9885) 2025-12-13 07:38:22 -08:00
py_executor.py [None][feat] Graceful Error Handling for Guided Decoder (#9078) 2025-12-13 19:57:59 +08:00
resource_manager.py [TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757) 2025-12-12 22:29:05 +08:00
sampler.py [TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935) 2025-12-15 13:28:54 +08:00
sampling_utils_flashinfer.py [TRTLLM-6756][feat] Update BeamSearch for TorchSampler (#9660) 2025-12-09 10:44:01 +01:00
sampling_utils.py [TRTLLM-6756][feat] Update BeamSearch for TorchSampler (#9660) 2025-12-09 10:44:01 +01:00
scheduler.py [https://nvbugs/5677746][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659) 2025-12-08 18:43:52 -08:00
seq_slot_manager.py [https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537) 2025-08-15 09:52:06 -07:00