# Micro Benchmarks

This folder contains benchmarks for specific components in TRT-LLM, using google-benchmark.
## Building

To build, add the `--micro_benchmark` flag to `build_wheel.py`, or pass `-DBUILD_MICRO_BENCHMARKS=ON` directly to CMake.
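A minimal sketch of the two build paths, assuming the usual repository layout with `scripts/build_wheel.py` at the top level and the C++ sources under `cpp/`; the real CMake configure step needs additional options (TensorRT and CUDA paths, etc.), so treat this as the shape of the commands only:

```bash
# Option 1: build the wheel with micro benchmarks enabled
python3 scripts/build_wheel.py --micro_benchmark

# Option 2: enable the benchmarks in a manual CMake configure/build
# (the remaining TRT-LLM configure options are omitted here)
cmake -S cpp -B cpp/build -DBUILD_MICRO_BENCHMARKS=ON
cmake --build cpp/build --target mixtureOfExpertsBackendBenchmark
```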
## Benchmark Documentation

### Mixture Of Experts Backend Benchmark

Target: `mixtureOfExpertsBackendBenchmark`

This benchmark covers the backend used by the `MixtureOfExperts` plugin. It allows you to benchmark different MOE
configurations without building a TRT engine.
Usage:

```bash
./mixtureOfExpertsBackendBenchmark
# or
./mixtureOfExpertsBackendBenchmark --benchmark_file <JSON benchmark definition>
```
For more information, see:

```bash
./mixtureOfExpertsBackendBenchmark --help
```
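Because the benchmark binary is built on google-benchmark, the standard google-benchmark command-line flags should also apply. A sketch (the filter regex and output file name are placeholders, and the benchmark names are specific to this binary, so confirm against `--help`):

```bash
# Run a filtered subset of benchmarks and write machine-readable results
# (generic google-benchmark flags; the regex below is a placeholder)
./mixtureOfExpertsBackendBenchmark \
    --benchmark_filter='MixtureOfExperts.*' \
    --benchmark_out=moe_results.json \
    --benchmark_out_format=json
```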
`gen-moe-benchmark-file.py` is a helper script that can generate workload files for the MOE benchmark. This is useful
for sharing or comparing configurations, for example when producing a reproduction case for a performance bug; a sketch of that workflow follows below.
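A minimal end-to-end sketch, assuming the generator script writes its JSON workload definition to stdout (its actual arguments are not documented here, so check `python3 gen-moe-benchmark-file.py --help` first); `moe_workload.json` is a placeholder name:

```bash
# Generate a JSON workload definition (invocation is an assumption; see --help)
python3 gen-moe-benchmark-file.py > moe_workload.json

# Replay the same workload in the benchmark binary
./mixtureOfExpertsBackendBenchmark --benchmark_file moe_workload.json
```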