Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-02-11 05:23:38 +08:00
Fix AutoTuner warmup request generation.

* The current warmup phase creates a single request, which is insufficient to cover max_num_tokens. Revise the warmup phase to create a batch of requests that covers max_num_tokens, eliminating potential fallback cases.

Refactor the AutoTuner API and reduce host overhead.

* Refine the (min, opt, max) values of the optimization profile setup for get_valid_tactics so that the canImplement definition is correct.
* Refine the cache key assembly process to reduce host overhead and simplify the API.
* Fix lru_cache usage to reduce host overhead.
* Move tuning config initialization into a one-time object in the tunable runner to reduce host overhead.

Improve tuning config readability.

* Use a dataclass to define the tuning config.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
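The warmup and tuning-config changes above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern the commit describes, not the actual TensorRT-LLM AutoTuner API: the names `TuningConfig` and `build_warmup_batch`, and the idea that a single request is capped at `max_seq_len` tokens, are assumptions made for the example.

```python
from dataclasses import dataclass


# Hypothetical tuning config as a frozen dataclass, created once per
# tunable runner (mirroring the commit's "one-time object" refactor).
@dataclass(frozen=True)
class TuningConfig:
    max_num_tokens: int  # total token budget warmup must cover
    max_seq_len: int     # assumed per-request token cap (illustrative)


def build_warmup_batch(cfg: TuningConfig) -> list[int]:
    """Split max_num_tokens across a batch of warmup requests.

    A single warmup request cannot exceed max_seq_len tokens, so one
    request may leave part of the max_num_tokens budget unexercised
    and trigger fallback paths at runtime. Emitting a batch of
    requests covers the full budget, as the commit message describes.
    """
    batch, remaining = [], cfg.max_num_tokens
    while remaining > 0:
        n = min(cfg.max_seq_len, remaining)
        batch.append(n)
        remaining -= n
    return batch


cfg = TuningConfig(max_num_tokens=8192, max_seq_len=3000)
print(build_warmup_batch(cfg))  # [3000, 3000, 2192] -- sums to 8192
```

The frozen dataclass also makes the config hashable, which is what allows it to participate cleanly in `functools.lru_cache`-based cache-key lookups of the kind the commit optimizes.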
Directory listing:

- _torch/
- api_stability/
- bindings/
- disaggregated/
- llmapi/
- others/
- scaffolding/
- tools/
- trt/
- utils/
- conftest.py
- dump_checkpoint_stats.py
- profile_utils.py
- pytest.ini
- test_model_runner_cpp.py
- test_pip_install.py