TensorRT-LLM/tensorrt_llm
Yukun He 9e7182b603
[TRTLLM-9615][feat] Implement a distributed tuning system (#9621)
Four distinct strategies are implemented to accommodate different distributed tuning scenarios: BROADCAST, INDEPENDENT, MERGE, and PARALLEL (a rough sketch follows the notes below).

* Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback. This conservative approach prevents unexpected behavior in standard use cases.
* Only operations with significant tuning-time overhead have been assigned the PARALLEL strategy, which splits the tactic search for an operation so that ranks in the same tensor-parallel (TP) group tune concurrently. This targeted approach balances performance gains with stability.
* Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy. This restriction exists because the synchronization mechanism is optimized only for leaf operations and doesn't yet handle nested hierarchies.
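
For intuition, here is a minimal Python sketch of how the four strategies could partition tuning work across ranks. Only the four strategy names come from this commit; everything else (`DistributedTuningStrategy`, `effective_strategy`, `tactics_for_rank`, the round-robin split) is hypothetical and not the actual TensorRT-LLM autotuner API.

```python
# A minimal sketch, assuming hypothetical names: only the four strategy
# constants come from the commit; DistributedTuningStrategy, tactics_for_rank()
# and effective_strategy() are illustrative, not the real autotuner API.
from enum import Enum, auto


class DistributedTuningStrategy(Enum):
    BROADCAST = auto()    # rank 0 tunes; the result is broadcast to all ranks
    INDEPENDENT = auto()  # every rank tunes on its own (the default fallback)
    MERGE = auto()        # every rank tunes; results are merged afterwards
    PARALLEL = auto()     # the tactic space is partitioned across ranks


def effective_strategy(requested: DistributedTuningStrategy,
                       is_nested: bool) -> DistributedTuningStrategy:
    """Nested tuning hierarchies fall back to INDEPENDENT, since the
    synchronization step is only implemented for leaf operations."""
    if is_nested:
        return DistributedTuningStrategy.INDEPENDENT
    return requested


def tactics_for_rank(tactics: list, rank: int, world_size: int,
                     strategy: DistributedTuningStrategy) -> list:
    """Return the subset of tactics this rank should profile."""
    if strategy is DistributedTuningStrategy.PARALLEL:
        # Round-robin partition: each rank profiles a disjoint slice, and a
        # later merge step would all-gather timings so every rank commits
        # to the globally best tactic.
        return tactics[rank::world_size]
    if strategy is DistributedTuningStrategy.BROADCAST:
        # Only rank 0 profiles; the other ranks wait for its result.
        return tactics if rank == 0 else []
    # INDEPENDENT and MERGE: every rank profiles the full tactic space.
    return tactics


if __name__ == "__main__":
    tactics = [f"tactic_{i}" for i in range(8)]
    for rank in range(4):
        mine = tactics_for_rank(tactics, rank, 4,
                                DistributedTuningStrategy.PARALLEL)
        print(f"rank {rank} profiles {mine}")
```

Under PARALLEL, the merge step that reconciles per-rank timings is exactly the synchronization mechanism mentioned above, which is why nested runners such as NVFP4GemmUnifiedRunner are pinned to INDEPENDENT for now.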

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-15 21:08:53 +08:00
_tensorrt_engine [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) 2025-06-20 03:01:10 +08:00
_torch [TRTLLM-9615][feat] Implement a distributed tuning system (#9621) 2025-12-15 21:08:53 +08:00
bench [TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench (#9250) 2025-12-08 10:37:40 -08:00
commands [https://nvbugs/5727517][fix] Preserve ip:port for disagg (#9859) 2025-12-12 09:45:34 +08:00
evaluate [https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881) 2025-12-13 08:35:31 +08:00
executor [None][doc] update readme for rpc (#9972) 2025-12-15 10:16:50 +08:00
inputs [None][feat] Support Mistral Large3 LLM part (#9820) 2025-12-13 11:44:27 +08:00
layers [#9236][feature] Make sharing of activation_type across SW layers more robust (#9238) 2025-11-20 16:06:58 +08:00
llmapi [TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524) 2025-12-15 12:42:25 +08:00
metrics [None][feat] Add trtllm_ prefix for exposed metrics (#8845) 2025-11-06 15:27:18 +08:00
models [#2730][fix] Fix circular import bug in medusa/weight.py (#9866) 2025-12-11 13:51:08 +08:00
plugin [None][chore] Revert "[None][fix] change allreduce workspace dtype to torch.int64 t… (#9538) 2025-11-28 16:45:23 +08:00
quantization [OMNIML-2932] [feat] nvfp4 awq support (#8698) 2025-12-03 19:47:13 +02:00
runtime [#6425][fix] address CUDA stream sync issue in ModelRunnerCPP (#6426) 2025-12-12 13:33:22 +08:00
scaffolding [None][feat] Deep Research Implemented with Scaffolding (#8452) 2025-11-06 10:33:28 +08:00
serve [TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism (#9720) 2025-12-12 22:29:41 -08:00
tools [None][feat] Add weights initialization and context phase parser to layer-wise benchmarks (#9667) 2025-12-04 13:41:15 +08:00
__init__.py [TRTLLM-9736][feat] AsyncLLM and verl integ (#9353) 2025-12-11 09:33:25 -08:00
_common.py [None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851) 2025-09-25 21:02:35 +08:00
_dlpack_utils.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
_ipc_utils.py [None][chore] Modify python ipc_util to align with C++ path (#9894) 2025-12-12 15:55:22 +08:00
_mnnvl_utils.py [https://nvbugs/5477730][fix] Fix the alltoall case when tp_size larger than ep_size (#7331) 2025-09-04 08:10:03 -04:00
_ray_utils.py [TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302) 2025-11-04 10:19:24 -08:00
_utils.py [TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604) 2025-12-15 08:42:30 +08:00
builder.py [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330) 2025-10-28 09:17:26 -07:00
disaggregated_params.py [TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) (#7577) 2025-09-22 19:07:18 -07:00
functional.py [#8921][feat] Added symetric memory AllReduce strategy (#8919) 2025-12-08 13:12:56 -08:00
graph_rewriting.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
logger.py [None][chore] Mass integration of release/1.0 - 3rd (#7519) 2025-09-08 14:03:04 +08:00
lora_helper.py [TRTLLM-8682][chore] Remove auto_parallel module (#8329) 2025-10-22 20:53:08 -04:00
lora_manager.py [https://nvbugs/5510879][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 (#8063) 2025-10-12 12:29:52 -07:00
mapping.py [None][feat] Async pp send for PPCommTorch. (#9976) 2025-12-15 14:03:46 +08:00
math_utils.py perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318) 2025-06-26 14:03:56 +08:00
module.py [None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851) 2025-09-25 21:02:35 +08:00
network.py [TRTLLM-8682][chore] Remove auto_parallel module (#8329) 2025-10-22 20:53:08 -04:00
parameter.py fix:https://nvbugs/5234033 enable starcoder trt-flow with transforme… (#3909) 2025-05-15 11:16:45 +08:00
profiler.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
prompt_adapter_manager.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
python_plugin.py linting(python): Enable ruff on more files (wave 1/N) (#5140) 2025-06-14 19:19:34 +08:00
ray_stub.py [TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test (#8175) 2025-10-14 23:46:30 +08:00
sampling_params.py [None] [feat] add eos_token_id in generation_config to sampling params (#9514) 2025-12-12 00:52:03 +08:00
scheduling_params.py [None][feat] Add support of scheduling attention dp request (#6246) 2025-08-01 20:38:01 -04:00
serialization.py [TRTLLM-8682][chore] Remove auto_parallel module (#8329) 2025-10-22 20:53:08 -04:00
top_model_mixin.py [TRTLLM-8683][chore] Migrate PluginConfig to Pydantic (#8277) 2025-10-17 16:13:22 -04:00
version.py [None][chore] bump version to 1.2.0rc6 (#9874) 2025-12-10 04:53:26 -08:00