# TensorRT-LLM Benchmark Test System

Benchmarking scripts for TensorRT-LLM serving performance tests with configuration-driven test cases and CSV report generation.

## Overview

- Run performance benchmarks across multiple model configurations
- Manage test cases through YAML configuration files
- Support selective execution of specific test cases

## Scripts Overview

### 1. `benchmark_config.yaml` - Test Case Configuration

**Purpose**: Defines all benchmark test cases in a structured YAML format.

**Structure**:
```yaml
server_configs:
  - name: "r1_fp4_dep4"
    model_name: "deepseek_r1_0528_fp4"
    tp: 4
    ep: 4
    pp: 1
    attention_backend: "TRTLLM"
    moe_backend: "CUTLASS"
    moe_max_num_tokens: ""
    enable_attention_dp: true
    enable_chunked_prefill: false
    max_num_tokens: 2176
    disable_overlap_scheduler: false
    kv_cache_dtype: "fp8"
    enable_block_reuse: false
    free_gpu_memory_fraction: 0.8
    max_batch_size: 256
    enable_padding: true
    client_configs:
      - name: "con1_iter1_1024_1024"
        concurrency: 1
        iterations: 1
        isl: 1024
        osl: 1024
        random_range_ratio: 0.0
      - name: "con8_iter1_1024_1024"
        concurrency: 8
        iterations: 1
        isl: 1024
        osl: 1024
        random_range_ratio: 0.0

  - name: "r1_fp4_tep4"
    model_name: "deepseek_r1_0528_fp4"
    tp: 4
    ep: 4
    pp: 1
    attention_backend: "TRTLLM"
    moe_backend: "CUTLASS"
    moe_max_num_tokens: ""
    enable_attention_dp: false
    enable_chunked_prefill: false
    max_num_tokens: 2176
    disable_overlap_scheduler: false
    kv_cache_dtype: "fp8"
    enable_block_reuse: false
    free_gpu_memory_fraction: 0.8
    max_batch_size: 256
    enable_padding: true
    client_configs:
      - name: "con1_iter1_1024_1024"
        concurrency: 1
        iterations: 1
        isl: 1024
        osl: 1024
        random_range_ratio: 0.0
      - name: "con8_iter1_1024_1024"
        concurrency: 8
        iterations: 1
        isl: 1024
        osl: 1024
        random_range_ratio: 0.0
```

### 2. `run_benchmark_serve.py` - Main Benchmark Runner

**Purpose**: Executes performance benchmarks based on YAML configuration files.
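
As a rough illustration of how a runner could walk such a configuration, the sketch below expands each server config and its nested client configs into individual test cases. It is not taken from `run_benchmark_serve.py`: the `iter_test_cases` helper is hypothetical, and the config is an inline dictionary mirroring a trimmed version of the YAML structure above (a real script would presumably load the file with a YAML parser instead).

```python
from typing import Iterator, Tuple

# Trimmed, in-memory mirror of the benchmark_config.yaml structure.
# Only a few fields are kept; this is an assumption made for illustration.
CONFIG = {
    "server_configs": [
        {
            "name": "r1_fp4_dep4",
            "tp": 4,
            "ep": 4,
            "enable_attention_dp": True,
            "client_configs": [
                {"name": "con1_iter1_1024_1024", "concurrency": 1, "isl": 1024, "osl": 1024},
                {"name": "con8_iter1_1024_1024", "concurrency": 8, "isl": 1024, "osl": 1024},
            ],
        },
        {
            "name": "r1_fp4_tep4",
            "tp": 4,
            "ep": 4,
            "enable_attention_dp": False,
            "client_configs": [
                {"name": "con1_iter1_1024_1024", "concurrency": 1, "isl": 1024, "osl": 1024},
                {"name": "con8_iter1_1024_1024", "concurrency": 8, "isl": 1024, "osl": 1024},
            ],
        },
    ]
}


def iter_test_cases(config: dict) -> Iterator[Tuple[dict, dict]]:
    """Yield one (server_config, client_config) pair per test case.

    Hypothetical helper: each client config is paired with the server
    config it is nested under.
    """
    for server in config.get("server_configs", []):
        for client in server.get("client_configs", []):
            yield server, client


if __name__ == "__main__":
    for server, client in iter_test_cases(CONFIG):
        print(f"{server['name']}:{client['name']} concurrency={client['concurrency']}")
```

With the two server configs shown, each carrying two client configs, this enumerates four test cases in file order.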

**Usage**:
```bash
python run_benchmark_serve.py --log_folder <log_folder> --config_file <config_file> [--select <select_pattern>] [--timeout 5400]
```

**Arguments**:
- `--log_folder`: Directory to store benchmark logs (required)
- `--config_file`: Path to the YAML configuration file (required)
- `--select`: Comma-separated `server_name:client_name` pairs that pick specific server and client configs, as in the example below (optional, default: all test cases)
- `--timeout`: Timeout for server setup, in seconds (optional, default: 3600)

**Examples**:
```bash
# Run only the selected server:client combinations
python run_benchmark_serve.py --log_folder ./results --config_file benchmark_config.yaml --select "r1_fp4_dep4:con8_iter1_1024_1024,r1_fp4_tep4:con1_iter1_1024_1024"
```

### 3. `parse_benchmark_results.py` - Results Parser

**Purpose**: Parses the benchmark logs and prints the performance results.

**Usage**:
```bash
python parse_benchmark_results.py --log_folder <log_folder>
```

**Arguments**:
- `--log_folder`: Directory containing the benchmark logs (required)

### 4. `benchmark-serve.sh` - SLURM Job Script

**Usage**:
```bash
sbatch benchmark-serve.sh [IMAGE] [bench_dir] [log_folder] [select_pattern]
```

**Parameters**:
- `IMAGE`: Docker image (default: `tensorrt-llm-staging/release:main-x86_64`)
- `bench_dir`: Directory containing the config file and benchmark scripts (default: current directory)
- `log_folder`: Directory for output logs and CSV files (default: current directory)
- `select_pattern`: Select pattern (default: `default`, which runs all test cases)

**Examples**:
```bash
bench_dir="/path/to/benchmark/scripts"
log_folder="/path/to/store/output/files"
sbatch --reservation=RES--COM-3970 --qos=reservation -D ${log_folder} \
    ${bench_dir}/benchmark-serve.sh \
    urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release:main-x86_64 \
    ${bench_dir} ${log_folder} \
    "r1_fp4_dep4:con8_iter1_1024_1024,r1_fp4_tep4:con1_iter1_1024_1024"
```
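
The select pattern passed via `--select` or `select_pattern` appears, judging from the examples, to be a comma-separated list of `server_name:client_name` pairs. The sketch below shows one way such a pattern could be parsed and applied. The `parse_select` and `filter_cases` helpers are hypothetical and not taken from the scripts, and treating an empty string or `default` as "all test cases" is an assumption based on the parameter descriptions above.

```python
from typing import Iterable, List, Set, Tuple

Case = Tuple[str, str]  # (server_config_name, client_config_name)


def parse_select(pattern: str) -> Set[Case]:
    """Parse "server:client,server:client" into a set of (server, client) pairs."""
    pairs: Set[Case] = set()
    for item in pattern.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        server, sep, client = item.partition(":")
        server, client = server.strip(), client.strip()
        if not sep or not server or not client:
            raise ValueError(f"expected 'server:client', got {item!r}")
        pairs.add((server, client))
    return pairs


def filter_cases(cases: Iterable[Case], pattern: str) -> List[Case]:
    """Keep only the selected cases, preserving their original order.

    Assumption: an empty pattern or the literal "default" selects everything.
    """
    cases = list(cases)
    if pattern in ("", "default"):
        return cases
    wanted = parse_select(pattern)
    return [case for case in cases if case in wanted]


if __name__ == "__main__":
    all_cases = [
        ("r1_fp4_dep4", "con1_iter1_1024_1024"),
        ("r1_fp4_dep4", "con8_iter1_1024_1024"),
        ("r1_fp4_tep4", "con1_iter1_1024_1024"),
        ("r1_fp4_tep4", "con8_iter1_1024_1024"),
    ]
    select = "r1_fp4_dep4:con8_iter1_1024_1024,r1_fp4_tep4:con1_iter1_1024_1024"
    for server, client in filter_cases(all_cases, select):
        print(f"{server}:{client}")
```

Raising on malformed entries rather than silently skipping them is a design choice in this sketch; a typo in a long select string would otherwise quietly drop a test case from the run.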