mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

feat: Add support for benchmarking individual gemms in MOE benchmark (#6080 )

Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>

2025-07-18 09:00:12 +12:00

1.2 KiB

Raw Permalink Blame History

Micro Benchmarks

This folder contains benchmarks for specific components in TRT-LLM, using google-benchmark

Building

To build add the --micro_benchmark flag to build_wheel.py or pass -DBUILD_MICRO_BENCHMARKS=ON to cmake

Benchmark Documentations

Mixture Of Experts Backend Benchmark

Caution

Disclaimer this benchmark is intended for developers to help evaluating the impact of new optimisations. This benchmark does not meet the same quality standards as other parts of TRT-LLM. Please use with caution

Target mixtureOfExpertsBackendBenchmark

This benchmark covers the backend used by the MixtureOfExperts plugin. It allows you to benchmark different MOE configurations without building a TRT engine.

Usage:

./mixtureOfExpertsBackendBenchmark

# or

./mixtureOfExpertsBackendBenchmark --input_file <JSON benchmark definition>

For more information see:

./mixtureOfExpertsBackendBenchmark --help

The gen-moe-workload-file.py is a helper script that can generate workload files for MOE benchmarks. This is useful for sharing or comparing configurations, such as when generating a reproduction case for a performance bug

1.2 KiB Raw Permalink Blame History

Micro Benchmarks

Building

Benchmark Documentations

Mixture Of Experts Backend Benchmark

1.2 KiB

Raw Permalink Blame History