
TensorRT-LLM Benchmark Test System

Benchmarking scripts for TensorRT-LLM serving performance tests with configuration-driven test cases and CSV report generation.

Overview

  • Run performance benchmarks across multiple model configurations
  • Manage test cases through YAML configuration files
  • Support selective execution of specific test cases

Scripts Overview

1. benchmark_config.yaml - Test Case Configuration

Purpose: Defines all benchmark test cases in a structured YAML format.

Structure:

server_configs:
  - name: "r1_fp4_dep4"
    model_name: "deepseek_r1_0528_fp4"
    tp: 4
    ep: 4
    pp: 1
    attention_backend: "TRTLLM"
    moe_backend: "CUTLASS"
    moe_max_num_tokens: ""
    enable_attention_dp: true
    enable_chunked_prefill: false
    max_num_tokens: 2176
    disable_overlap_scheduler: false
    kv_cache_dtype: "fp8"
    enable_block_reuse: false
    free_gpu_memory_fraction: 0.8
    max_batch_size: 256
    enable_padding: true
    client_configs:
      - name: "con1_iter1_1024_1024"
        concurrency: 1
        iterations: 1
        isl: 1024
        osl: 1024
        random_range_ratio: 0.0
      - name: "con8_iter1_1024_1024"
        concurrency: 8
        iterations: 1
        isl: 1024
        osl: 1024
        random_range_ratio: 0.0

  - name: "r1_fp4_tep4"
    model_name: "deepseek_r1_0528_fp4"
    tp: 4
    ep: 4
    pp: 1
    attention_backend: "TRTLLM"
    moe_backend: "CUTLASS"
    moe_max_num_tokens: ""
    enable_attention_dp: false
    enable_chunked_prefill: false
    max_num_tokens: 2176
    disable_overlap_scheduler: false
    kv_cache_dtype: "fp8"
    enable_block_reuse: false
    free_gpu_memory_fraction: 0.8
    max_batch_size: 256
    enable_padding: true
    client_configs:
      - name: "con1_iter1_1024_1024"
        concurrency: 1
        iterations: 1
        isl: 1024
        osl: 1024
        random_range_ratio: 0.0
      - name: "con8_iter1_1024_1024"
        concurrency: 8
        iterations: 1
        isl: 1024
        osl: 1024
        random_range_ratio: 0.0
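The nesting above implies that each test case is one (server config, client config) pair. As a minimal sketch, assuming the YAML has already been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`), the pairs could be enumerated as shown here. The helper name `list_test_cases` is hypothetical and not part of the scripts, and the dict is a trimmed stand-in for the full configuration.

```python
from typing import Iterator

# Trimmed stand-in for the dict that loading the YAML above would produce.
# Only the fields needed for enumeration are kept.
config = {
    "server_configs": [
        {
            "name": "r1_fp4_dep4",
            "enable_attention_dp": True,
            "client_configs": [
                {"name": "con1_iter1_1024_1024", "concurrency": 1},
                {"name": "con8_iter1_1024_1024", "concurrency": 8},
            ],
        },
        {
            "name": "r1_fp4_tep4",
            "enable_attention_dp": False,
            "client_configs": [
                {"name": "con1_iter1_1024_1024", "concurrency": 1},
                {"name": "con8_iter1_1024_1024", "concurrency": 8},
            ],
        },
    ]
}


def list_test_cases(cfg: dict) -> Iterator[str]:
    """Yield one "server:client" identifier per test case (hypothetical helper)."""
    for server in cfg.get("server_configs", []):
        for client in server.get("client_configs", []):
            yield f"{server['name']}:{client['name']}"


cases = list(list_test_cases(config))
for case in cases:
    print(case)
```

The "server:client" identifiers printed here have the same shape as the entries accepted by the --select argument of run_benchmark_serve.py.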

2. run_benchmark_serve.py - Main Benchmark Runner

Purpose: Executes performance benchmarks based on YAML configuration files.

Usage:

python run_benchmark_serve.py --log_folder <log_folder> --config_file <config_file> [--select <select_pattern>] [--timeout 5400]

Arguments:

  • --log_folder: Directory to store benchmark logs (required)
  • --config_file: Path to YAML configuration file (required)
  • --select: Comma-separated list of server_config:client_config name pairs to run (optional, default: all test cases)
  • --timeout: Timeout for server setup, in seconds (optional, default: 3600)
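For illustration, the arguments listed above could be declared with `argparse` roughly as follows. This is a sketch based only on the documented flags and defaults; the actual parser in run_benchmark_serve.py may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Run serving benchmarks from a YAML configuration file."
    )
    parser.add_argument("--log_folder", required=True,
                        help="Directory to store benchmark logs")
    parser.add_argument("--config_file", required=True,
                        help="Path to YAML configuration file")
    # Assumption: an omitted --select means "run all test cases".
    parser.add_argument("--select", default=None,
                        help="Comma-separated server:client pairs to run")
    parser.add_argument("--timeout", type=int, default=3600,
                        help="Timeout for server setup, in seconds")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--log_folder", "./results", "--config_file", "benchmark_config.yaml"]
)
print(args.log_folder, args.config_file, args.select, args.timeout)
```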

Examples:

# Run only the selected server:client test cases
python run_benchmark_serve.py --log_folder ./results --config_file benchmark_config.yaml --select "r1_fp4_dep4:con8_iter1_1024_1024,r1_fp4_tep4:con1_iter1_1024_1024"
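The select pattern in this example reads as a comma-separated list of server:client name pairs. A minimal sketch of how such a string could be split into pairs is shown here; `parse_select` is a hypothetical helper, and the real script may accept other forms (for example a server name alone) that this sketch rejects.

```python
from typing import List, Tuple


def parse_select(pattern: str) -> List[Tuple[str, str]]:
    """Split "s1:c1,s2:c2" into [(s1, c1), (s2, c2)] (hypothetical helper)."""
    pairs = []
    for item in pattern.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and blank entries
        server, sep, client = item.partition(":")
        server, client = server.strip(), client.strip()
        if not sep or not server or not client:
            raise ValueError(f"expected 'server:client', got {item!r}")
        pairs.append((server, client))
    return pairs


selected = parse_select(
    "r1_fp4_dep4:con8_iter1_1024_1024,r1_fp4_tep4:con1_iter1_1024_1024"
)
for server_name, client_name in selected:
    print(server_name, client_name)
```

Validating the pattern up front means a malformed entry fails before any server is launched, rather than after a lengthy setup.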

3. parse_benchmark_results.py - Results Parser

Purpose: Parses the benchmark logs and prints the performance results.

Usage:

python parse_benchmark_results.py --log_folder <log_folder>

Arguments:

  • --log_folder: Directory containing the benchmark logs (required)

4. benchmark-serve.sh - SLURM Job Script

Usage:

sbatch benchmark-serve.sh [IMAGE] [bench_dir] [log_folder] [select_pattern]

Parameters:

  • IMAGE: Docker image (default: tensorrt-llm-staging/release:main-x86_64)
  • bench_dir: Directory containing config file and benchmark scripts (default: current directory)
  • log_folder: Directory for output logs and CSV files (default: current directory)
  • select_pattern: Select pattern (default: "default", which runs all test cases)

Examples:


bench_dir="/path/to/benchmark/scripts"
log_folder="/path/to/store/output/files"
sbatch --reservation=RES--COM-3970 --qos=reservation -D "${log_folder}" "${bench_dir}/benchmark-serve.sh" urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release:main-x86_64 "${bench_dir}" "${log_folder}" "r1_fp4_dep4:con8_iter1_1024_1024,r1_fp4_tep4:con1_iter1_1024_1024"
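When submissions are driven from Python, the same command line can be assembled as an argument list. The sketch below only builds and prints the command and does not submit anything; `build_sbatch_command` is a hypothetical helper, cluster-specific options such as --reservation and --qos are left to the caller, and it assumes the literal string "default" selects all test cases.

```python
import shlex
from typing import List, Sequence


def build_sbatch_command(
    image: str,
    bench_dir: str,
    log_folder: str,
    select_pattern: str = "default",  # assumption: "default" runs all test cases
    extra_sbatch_args: Sequence[str] = (),
) -> List[str]:
    """Assemble the sbatch invocation in the documented positional order."""
    return [
        "sbatch",
        *extra_sbatch_args,          # e.g. site-specific reservation or QoS flags
        "-D", log_folder,            # working directory for the job
        f"{bench_dir}/benchmark-serve.sh",
        image,
        bench_dir,
        log_folder,
        select_pattern,
    ]


cmd = build_sbatch_command(
    image="urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release:main-x86_64",
    bench_dir="/path/to/benchmark/scripts",
    log_folder="/path/to/store/output/files",
    select_pattern="r1_fp4_dep4:con8_iter1_1024_1024",
)
# shlex.join quotes each argument so the printed line is safe to paste into a shell.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would submit the job without going through a shell.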