# Micro Benchmarks
This folder contains benchmarks for specific components in TRT-LLM, using [google-benchmark](https://github.com/google/benchmark/tree/main).
## Building
To build, add the `--micro_benchmark` flag to `build_wheel.py`, or pass `-DBUILD_MICRO_BENCHMARKS=ON` directly to CMake.
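For example (a sketch: the `scripts/` path and build directory below are assumptions and may differ in your checkout):
```bash
# Build the micro benchmarks through the wheel build script:
python3 scripts/build_wheel.py --micro_benchmark

# Or configure CMake directly and build the benchmark target:
cmake -B cpp/build -S cpp -DBUILD_MICRO_BENCHMARKS=ON
cmake --build cpp/build --target mixtureOfExpertsBackendBenchmark
```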
## Benchmark Documentation
### Mixture Of Experts Backend Benchmark
Target: `mixtureOfExpertsBackendBenchmark`
This benchmark covers the backend used by the `MixtureOfExperts` plugin. It allows you to benchmark different MOE
configurations without building a TRT engine.
Usage:
```bash
./mixtureOfExpertsBackendBenchmark
# or
./mixtureOfExpertsBackendBenchmark --input_file <JSON benchmark definition>
```
For more information, see:
```bash
./mixtureOfExpertsBackendBenchmark --help
```
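Since the benchmark is built on google-benchmark, the library's standard command-line flags should also work (a sketch, assuming the binary forwards them to google-benchmark's initializer; the filter regex below is illustrative):
```bash
# Run only the benchmarks whose names match a regular expression:
./mixtureOfExpertsBackendBenchmark --benchmark_filter='MoE.*'

# Record results to JSON for sharing or later comparison:
./mixtureOfExpertsBackendBenchmark --benchmark_out=results.json --benchmark_out_format=json
```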
`gen-moe-workload-file.py` is a helper script that generates workload files for MOE benchmarks. This is useful for sharing or comparing configurations, for example when producing a reproduction case for a performance bug. A sketch of that workflow follows.
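The script's arguments are not documented here, so the invocation below is an assumption (check the script's own `--help`); only the `--input_file` flag is taken from the usage above:
```bash
# Generate a JSON workload definition (script arguments are an assumption):
python3 gen-moe-workload-file.py > moe_workload.json

# Replay the saved workload with the benchmark binary:
./mixtureOfExpertsBackendBenchmark --input_file moe_workload.json
```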