TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 03:01:50 +08:00


Yukun He 9e7182b603 [TRTLLM-9615][feat] Implement a distributed tuning system (#9621)	2025-12-15 21:08:53 +08:00

Four distinct strategies are implemented to accommodate different distributed tuning scenarios, including BROADCAST, INDEPENDENT, MERGE, PARALLEL.

* Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback. This conservative approach prevents unexpected behavior in standard use cases.
* Only operations with significant tuning time overhead have been assigned the PARALLEL strategy, which allows the same tensor parallelism (TP) rank to tune tactics concurrently across different ranks. This targeted approach balances performance gains with stability.
* Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy. This restriction exists because the synchronization mechanism is optimized only for leaf operations and doesn't yet handle nested hierarchies.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
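
The fallback rules in the commit message above can be sketched roughly as follows. This is only an illustration, not the code from the commit: the `TuningStrategy` enum, the `resolve_strategy` function, and its `distributed_enabled` and `is_nested` parameters are hypothetical names chosen for the example, and the sketch deliberately does not model what BROADCAST, MERGE, or PARALLEL do across ranks, since the commit message does not spell that out.

```python
from enum import Enum, auto


class TuningStrategy(Enum):
    """Hypothetical enum holding the four strategy names from the commit message."""
    BROADCAST = auto()
    INDEPENDENT = auto()
    MERGE = auto()
    PARALLEL = auto()


def resolve_strategy(requested, distributed_enabled=False, is_nested=False):
    """Pick the strategy an operation would effectively use (illustrative only).

    requested           -- strategy assigned to the operation
    distributed_enabled -- assumed global switch; off by default, as in the commit message
    is_nested           -- assumed flag for operations with nested tuning structures
    """
    # Distributed tuning is disabled by default: fall back to INDEPENDENT.
    if not distributed_enabled:
        return TuningStrategy.INDEPENDENT
    # Nested tuning structures currently support only INDEPENDENT.
    if is_nested:
        return TuningStrategy.INDEPENDENT
    # Otherwise a leaf operation keeps the strategy it was assigned.
    return requested


if __name__ == "__main__":
    print(resolve_strategy(TuningStrategy.PARALLEL))
    print(resolve_strategy(TuningStrategy.PARALLEL, distributed_enabled=True))
    print(resolve_strategy(TuningStrategy.MERGE, distributed_enabled=True, is_nested=True))
```

Under these assumptions, an operation assigned PARALLEL only uses it once distributed tuning is explicitly enabled, and a nested operation resolves to INDEPENDENT regardless of what was requested.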
_torch	[TRTLLM-9615][feat] Implement a distributed tuning system (#9621)	2025-12-15 21:08:53 +08:00
api_stability	[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353)	2025-12-11 09:33:25 -08:00
bindings	[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)	2025-10-28 09:17:26 -07:00
disaggregated	[TRTLLM-8920][feat] decouple disagg service from fastapi (#8714)	2025-12-05 10:44:16 +08:00
executor	[https://nvbugs/5720482][fix] Fix test rpc streaming (#9902)	2025-12-13 01:14:43 -08:00
llmapi	[TRTLLM-9860][doc] Add docs and examples for Responses API (#9946)	2025-12-14 21:46:13 -08:00
others	[None][chore] Add unittest for otlp tracing (#8716)	2025-12-09 18:34:08 -08:00
scaffolding	[None][feat] Refactor scaffolding streaming feature and fix openai wo… (#8622)	2025-10-30 16:02:40 +08:00
tools	[TRTC-43] [feat] Add config db and docs (#9420)	2025-12-12 04:00:03 +08:00
trt	[TRTLLM-8682][chore] Remove auto_parallel module (#8329)	2025-10-22 20:53:08 -04:00
utils	[TRTLLM-8376][feat] top-p optimization (removes redundant softmax) (#9411)	2025-11-25 18:46:48 +01:00
conftest.py	[None][feat] Add Qwen3-Next to layer-wise benchmarks (#9065)	2025-11-14 10:03:00 +08:00
dump_checkpoint_stats.py	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
gc_utils.py	[nvbug 5273941] fix: broken cyclic reference detect (#5417)	2025-07-01 20:12:55 +08:00
profile_utils.py	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
pytest.ini	[https://nvbugs/5608461][fix] exclude InductorSubproc from thread leak check (#8704)	2025-10-30 15:35:15 +08:00
test_model_runner_cpp.py	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
test_pip_install.py	[https://nvbugs/5616189][fix] Make more cases use local cached models (#8935)	2025-11-11 03:14:05 -08:00