# Micro Benchmarks
This folder contains benchmarks for specific components of TRT-LLM, built on [google-benchmark](https://github.com/google/benchmark).
## Building
To build, add the `--micro_benchmark` flag to `build_wheel.py`, or pass `-DBUILD_MICRO_BENCHMARKS=ON` to CMake.
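For example, a build might look like the following. This is a minimal sketch: the `scripts/build_wheel.py` path, the source/build directories, and the CMake invocation details are assumptions based on a typical TRT-LLM checkout, not something this README specifies.

```bash
# Option 1: through the wheel build script (flag as documented above).
python3 scripts/build_wheel.py --micro_benchmark

# Option 2: directly through CMake (source and build directories are illustrative).
cmake -B cpp/build -S cpp -DBUILD_MICRO_BENCHMARKS=ON
cmake --build cpp/build --target mixtureOfExpertsBackendBenchmark
```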
## Benchmark Documentation
### Mixture Of Experts Backend Benchmark
Target: `mixtureOfExpertsBackendBenchmark`
This benchmark covers the backend used by the MixtureOfExperts plugin. It allows you to benchmark different MOE
configurations without building a TRT engine.
Usage:
```bash
./mixtureOfExpertsBackendBenchmark
# or
./mixtureOfExpertsBackendBenchmark --input_file <JSON benchmark definition>
```
For more information see:
```bash
./mixtureOfExpertsBackendBenchmark --help
```
`gen-moe-benchmark-file.py` is a helper script that generates workload files for the MOE benchmark. This is useful for sharing and comparing configurations, for example when creating a reproduction case for a performance bug.
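A typical workflow might look like the sketch below. The command-line options of `gen-moe-benchmark-file.py` are not documented here, so redirecting its output to a file and the `moe_workload.json` name are assumptions for illustration; only the `--input_file` flag is documented above.

```bash
# Generate a workload definition (consult the script's --help for its actual options;
# writing the result via stdout redirection is an assumption).
python3 gen-moe-benchmark-file.py > moe_workload.json

# Feed the generated definition back into the benchmark (--input_file is documented above).
./mixtureOfExpertsBackendBenchmark --input_file moe_workload.json
```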