TensorRT-LLMs/cpp/micro_benchmarks
cheshirekow 1379cfac3a
[TRTLLM-9197][infra] Move thirdparty stuff to it's own listfile (#8986)
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-20 16:44:23 -08:00
..
CMakeLists.txt [TRTLLM-9197][infra] Move thirdparty stuff to it's own listfile (#8986) 2025-11-20 16:44:23 -08:00
gen-moe-benchmark-file.py feat: Add support for benchmarking individual gemms in MOE benchmark (#6080) 2025-07-18 09:00:12 +12:00
mixtureOfExpertsBackendBenchmarkFixture.h [None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP (#8501) 2025-10-27 10:18:19 +08:00
mixtureOfExpertsBackendBenchmarkLauncher.cu [#9236][feature] Make sharing of activation_type across SW layers more robust (#9238) 2025-11-20 16:06:58 +08:00
README.md feat: Add support for benchmarking individual gemms in MOE benchmark (#6080) 2025-07-18 09:00:12 +12:00

Micro Benchmarks

This folder contains benchmarks for specific components in TRT-LLM, using google-benchmark

Building

To build add the --micro_benchmark flag to build_wheel.py or pass -DBUILD_MICRO_BENCHMARKS=ON to cmake

Benchmark Documentations

Mixture Of Experts Backend Benchmark

Caution

Disclaimer this benchmark is intended for developers to help evaluating the impact of new optimisations. This benchmark does not meet the same quality standards as other parts of TRT-LLM. Please use with caution

Target mixtureOfExpertsBackendBenchmark

This benchmark covers the backend used by the MixtureOfExperts plugin. It allows you to benchmark different MOE configurations without building a TRT engine.

Usage:

./mixtureOfExpertsBackendBenchmark

# or

./mixtureOfExpertsBackendBenchmark --input_file <JSON benchmark definition>

For more information see:

./mixtureOfExpertsBackendBenchmark --help

The gen-moe-workload-file.py is a helper script that can generate workload files for MOE benchmarks. This is useful for sharing or comparing configurations, such as when generating a reproduction case for a performance bug