TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-19 09:15:24 +08:00

chenfeiz0326 7f5716ef83 Cherry-pick trtllm-gen from feat/llama4 to main (#4086)		2025-05-08 14:13:01 -07:00

* feat: TRT-LLM Gen FP8 MoE Llama4
  Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
* feat: TRT-LLM Gen llama4 MoE Top1 routing
  Signed-off-by: Jiqun Tu <jtu@nvidia.com>
* feat: add per tensor FP8 TRT-LLM Gen GEMMs
  Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
* Update
  Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Update
  Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Add license for cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/gemmCubins
  Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Add guard for routingIndicesClusterKernel
  Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Guard sm90+ for routingkernels
  Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Guard sm90+ for routingkernels
  Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

---------

Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
Signed-off-by: Jiqun Tu <jtu@nvidia.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Nikita Korobov <nkorobov@nvidia.com>
Co-authored-by: Jiqun Tu <jtu@nvidia.com>
_torch	Cherry-pick trtllm-gen from feat/llama4 to main (#4086)	2025-05-08 14:13:01 -07:00
api_stability	chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732)	2025-05-07 13:20:25 +08:00
bindings	chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732)	2025-05-07 13:20:25 +08:00
disaggregated	feat: Disaggregated router class (#3584)	2025-04-19 00:34:12 +08:00
llmapi	feat: support multi lora adapters and TP (#3885)	2025-05-08 23:45:45 +08:00
others	test: reorganize tests folder hierarchy (#2996)	2025-03-27 12:07:53 +08:00
scaffolding	[TRTLLM-4638][feat] add best of n support with reward model in scaffolding (#3807)	2025-04-28 17:15:33 +08:00
tools	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
trt	Unify two versions of AllReduce custom op (#3032)	2025-04-22 21:58:42 +08:00
utils	refactor: Move ModelSpec to core library (#3980)	2025-05-04 01:39:09 +08:00
conftest.py	Add thread leak check and fix thread/memory leak issues. (#3270)	2025-04-08 19:03:18 +08:00
dump_checkpoint_stats.py	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
profile_utils.py	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
pytest.ini	move the reset models into `examples/models/core` directory (#3555)	2025-04-19 20:48:59 -07:00
test_model_runner_cpp.py	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
test_pip_install.py	relax the limitation of setuptools (#2992)	2025-03-24 13:36:10 +08:00