
Micro Benchmarks

This folder contains benchmarks for specific components of TRT-LLM, built on Google Benchmark.

Building

To build the benchmarks, either add the --micro_benchmark flag to build_wheel.py or pass -DBUILD_MICRO_BENCHMARKS=ON to cmake.

Benchmark Documentation

Mixture Of Experts Backend Benchmark

Target: mixtureOfExpertsBackendBenchmark

This benchmark covers the backend used by the MixtureOfExperts plugin. It allows you to benchmark different MOE configurations without building a TRT engine.

Usage:

./mixtureOfExpertsBackendBenchmark

# or

./mixtureOfExpertsBackendBenchmark --input_file <JSON benchmark definition>

For more information see:

./mixtureOfExpertsBackendBenchmark --help

gen-moe-benchmark-file.py is a helper script that generates workload files for MOE benchmarks. This is useful for sharing or comparing configurations, for example when creating a reproduction case for a performance bug.
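
When comparing several workload files, it can help to drive the benchmark from a small wrapper. The following is a minimal sketch, not part of this folder: it assumes the benchmark binary sits in the current directory, and the helper names build_command and run_workloads are hypothetical. It uses only the --input_file option shown in the usage above.

```python
import subprocess
from pathlib import Path

# Assumption: the benchmark binary sits in the current directory.
BENCHMARK_BINARY = "./mixtureOfExpertsBackendBenchmark"


def build_command(input_file=None, binary=BENCHMARK_BINARY):
    """Return the argument list for one benchmark run.

    With no input file the benchmark is launched without arguments;
    otherwise the workload file is passed through --input_file.
    """
    command = [binary]
    if input_file is not None:
        command += ["--input_file", str(input_file)]
    return command


def run_workloads(input_files, binary=BENCHMARK_BINARY):
    """Run the benchmark once per workload file, stopping on the first failure."""
    for input_file in input_files:
        # Fail early with a clear message instead of launching the binary.
        if not Path(input_file).is_file():
            raise FileNotFoundError(f"workload file not found: {input_file}")
        # check=True raises CalledProcessError if the benchmark exits non-zero.
        subprocess.run(build_command(input_file, binary), check=True)
```

Calling run_workloads(["baseline.json", "candidate.json"]) would run the benchmark once per file; both file names are placeholders for workload files you have generated yourself.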