TensorRT-LLM/tests/unittest/_torch
Yukun He 9e7182b603
[TRTLLM-9615][feat] Implement a distributed tuning system (#9621)
Four distinct strategies are implemented to accommodate different distributed tuning scenarios: BROADCAST, INDEPENDENT, MERGE, and PARALLEL.
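
For illustration only, a minimal Python sketch of how the four strategies could be represented. The class name `DistributedTuningStrategy` and the per-member comments are assumptions based on the strategy names above, not the actual TensorRT-LLM API.

```python
from enum import Enum, auto

class DistributedTuningStrategy(Enum):
    """Hypothetical enum mirroring the four strategies named in this PR."""
    BROADCAST = auto()    # assumed: one rank tunes, then broadcasts the chosen tactic
    INDEPENDENT = auto()  # assumed: each rank tunes on its own, no cross-rank sync
    MERGE = auto()        # assumed: ranks tune locally, then merge their results
    PARALLEL = auto()     # assumed: the tactic search is split across ranks and run concurrently
```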

* Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback. This conservative approach prevents unexpected behavior in standard use cases.
* Only operations with significant tuning-time overhead have been assigned the PARALLEL strategy, which allows tactics for the same operation to be tuned concurrently across different tensor parallelism (TP) ranks. This targeted approach balances performance gains with stability.
* Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy. This restriction exists because the synchronization mechanism is optimized only for leaf operations and doesn't yet handle nested hierarchies; see the selection sketch after this list.
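
A hedged sketch of how a per-operation strategy could be resolved under the rules above, using the hypothetical `DistributedTuningStrategy` enum from the earlier sketch. The function name, parameters, and the idea of passing op sets explicitly are illustrative assumptions, not the PR's implementation.

```python
def resolve_strategy(op_name: str,
                     distributed_tuning_enabled: bool = False,
                     parallel_ops: frozenset = frozenset(),
                     nested_ops: frozenset = frozenset()) -> DistributedTuningStrategy:
    """Hypothetical per-op strategy selection following the rules described above."""
    # Distributed tuning is disabled by default; fall back to INDEPENDENT.
    if not distributed_tuning_enabled:
        return DistributedTuningStrategy.INDEPENDENT
    # Ops with nested tuning structures (e.g. NVFP4GemmUnifiedRunner) support only
    # INDEPENDENT, since synchronization is handled for leaf operations only.
    if op_name in nested_ops:
        return DistributedTuningStrategy.INDEPENDENT
    # Only ops with significant tuning-time overhead are assigned PARALLEL.
    if op_name in parallel_ops:
        return DistributedTuningStrategy.PARALLEL
    return DistributedTuningStrategy.INDEPENDENT
```

With the defaults, every operation resolves to INDEPENDENT, which matches the conservative default behavior described in the first bullet.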

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-15 21:08:53 +08:00
attention [TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524) 2025-12-15 12:42:25 +08:00
auto_deploy [TRTLLM-9136][feat] 2D parallel EP TP support (#9459) 2025-12-15 09:52:29 +01:00
compilation [TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804) 2025-05-09 11:04:01 +08:00
debugger Fix: fix nvbug 5356427 (#5464) 2025-06-25 22:24:26 +08:00
executor [TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757) 2025-12-12 22:29:05 +08:00
misc [TRTLLM-9615][feat] Implement a distributed tuning system (#9621) 2025-12-15 21:08:53 +08:00
modeling [TRTLLM-7967][chore] Add more tests (#9415) 2025-12-08 11:57:32 -08:00
models/checkpoints/hf [TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (#9583) 2025-12-05 16:07:20 +01:00
modules [TRTLLM-8961][feat] ConfigurableMoE support DeepGemm (#9858) 2025-12-15 10:47:15 +08:00
multi_gpu [https://nvbugs/5597647][ci] Unwaive fixed tests. (#9812) 2025-12-12 02:29:30 +08:00
multi_gpu_modeling [https://nvbugs/5515753][ci] Add NCCL_DEBUG=INFO flag to collect more info with CI failure. (#8440) 2025-11-20 12:43:13 -05:00
multimodal [TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604) 2025-12-15 08:42:30 +08:00
ray_orchestrator [TRTLLM-9738][chore] Guard accuracy with nccl allreduce strategy (#9793) 2025-12-13 01:02:11 -08:00
sampler [TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935) 2025-12-15 13:28:54 +08:00
speculative [None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) (#8810) 2025-12-09 11:06:31 -05:00
thop [None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) 2025-12-14 10:47:24 +08:00
helpers.py [#8733][feat] Add Llama4 MoE handling to AutoDeploy (#9556) 2025-12-04 08:03:33 +02:00
pattern_watcher.py [TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804) 2025-05-09 11:04:01 +08:00
test_connector.py [None][feat] KV Cache Connector API (#7228) 2025-08-28 23:09:27 -04:00