# Micro Benchmarks
This folder contains benchmarks for specific components in TRT-LLM,
using [google-benchmark](https://github.com/google/benchmark/tree/main).
## Building
To build, add the `--micro_benchmark` flag to `build_wheel.py`, or pass `-DBUILD_MICRO_BENCHMARKS=ON` to CMake.
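For example (a sketch; the `scripts/build_wheel.py` location follows the usual TRT-LLM layout and is an assumption, so adjust to your checkout):
```bash
# Option 1: via the wheel build script (path assumed)
python3 scripts/build_wheel.py --micro_benchmark

# Option 2: enable the flag when configuring the C++ build directly
cmake -DBUILD_MICRO_BENCHMARKS=ON ..
```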
## Benchmark Documentation
### Mixture Of Experts Backend Benchmark
> [!CAUTION]
> This benchmark is intended to help developers evaluate the impact of new optimisations. It does not meet the same quality standards as other parts of TRT-LLM, so please use it with caution.

Target: `mixtureOfExpertsBackendBenchmark`
This benchmark covers the backend used by the `MixtureOfExperts` plugin. It allows you to benchmark different MOE
configurations without building a TRT engine.
Usage:
```bash
./mixtureOfExpertsBackendBenchmark
# or
./mixtureOfExpertsBackendBenchmark --input_file <JSON benchmark definition>
```
For more information see:
```bash
./mixtureOfExpertsBackendBenchmark --help
```
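Because the benchmark is built on google-benchmark, the library's standard command-line flags should also be usable (an assumption; `--help` is authoritative). For example:
```bash
# Run only benchmarks whose name matches a regex and write JSON results.
# These are standard google-benchmark flags; their pass-through here is assumed.
./mixtureOfExpertsBackendBenchmark --benchmark_filter='.*gemm.*' \
    --benchmark_out=results.json --benchmark_out_format=json
```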
`gen-moe-workload-file.py` is a helper script that generates workload files for MOE benchmarks. This is useful
for sharing or comparing configurations, for example when creating a reproduction case for a performance bug.
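A typical flow might look like the following sketch (the script's options and output behaviour are assumptions; run it with `--help` for the actual interface):
```bash
# Hypothetical flow: generate a workload definition, then replay it.
# Writing to stdout and the file name are assumptions for illustration.
python3 gen-moe-workload-file.py > moe_workload.json
./mixtureOfExpertsBackendBenchmark --input_file moe_workload.json
```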