Micro Benchmarks

This folder contains benchmarks for specific components of TRT-LLM, built on google-benchmark.

Building

To build, add the --micro_benchmark flag to build_wheel.py, or pass -DBUILD_MICRO_BENCHMARKS=ON to CMake.
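
For example, a minimal sketch of both routes, assuming the standard repository layout (scripts/build_wheel.py at the repo root and the C++ tree under cpp/); a real build will typically need additional options such as CUDA architectures or TensorRT paths:

# Option 1: via the wheel build script
python3 scripts/build_wheel.py --micro_benchmark

# Option 2: add the flag to an existing CMake configure step
cmake -S cpp -B cpp/build -DBUILD_MICRO_BENCHMARKS=ON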

Benchmark Documentation

Mixture Of Experts Backend Benchmark

Caution

Disclaimer: this benchmark is intended to help developers evaluate the impact of new optimisations. It does not meet the same quality standards as other parts of TRT-LLM, so please use it with caution.

Target: mixtureOfExpertsBackendBenchmark

This benchmark covers the backend used by the MixtureOfExperts plugin. It allows you to benchmark different MOE configurations without building a TRT engine.

Usage:

./mixtureOfExpertsBackendBenchmark

# or

./mixtureOfExpertsBackendBenchmark --input_file <JSON benchmark definition>

For more information, see:

./mixtureOfExpertsBackendBenchmark --help
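
Because this is a google-benchmark binary, the standard google-benchmark flags also work. A sketch (the output file name is only illustrative, and a narrower regex selects a subset of benchmarks):

# Run benchmarks whose names match a regex and write the results to a JSON file
./mixtureOfExpertsBackendBenchmark --benchmark_filter=.* --benchmark_out=moe_results.json --benchmark_out_format=json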

gen-moe-benchmark-file.py is a helper script that generates workload files for the MOE benchmark. This is useful for sharing or comparing configurations, such as when generating a reproduction case for a performance bug.
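
A sketch of the intended round trip, assuming the generator exposes its options through a standard --help output (the exact options and the JSON schema are not reproduced here, and the file name is illustrative):

# Inspect the generator's options, produce a workload file, then replay it
python3 gen-moe-benchmark-file.py --help
python3 gen-moe-benchmark-file.py ...                              # options elided; produces a JSON workload definition
./mixtureOfExpertsBackendBenchmark --input_file moe_workload.json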