TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Jinyang Yuan 5339d367ce [perf] Reduce the workspace size of FP4 activation scales for MoE (#4303). Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>		2025-05-30 09:03:52 +08:00
CMakeLists.txt	Update TensorRT-LLM (#524)	2023-12-01 22:27:51 +08:00
mixtureOfExpertsPlugin.cpp	[perf] Reduce the workspace size of FP4 activation scales for MoE (#4303)	2025-05-30 09:03:52 +08:00
mixtureOfExpertsPlugin.h	feat: support add internal cutlass kernels as subproject (#3658)	2025-05-06 11:35:07 +08:00