# TensorRT-LLM Performance Test Flow (Default PyTorch Flow)

## Overview

This document describes the complete TensorRT-LLM performance testing workflow, focusing on the default PyTorch backend flow used for release testing.

## 1. Test Scripts

### Main Test Script

The main script for TensorRT-LLM performance testing is `test_perf.py`, which executes all performance test cases.

### Performance Metrics

For trtllm-bench, the test extracts the following key performance metrics from the logs:

- **BUILD_TIME**: Model build time
- **INFERENCE_TIME**: Inference time
- **TOKEN_THROUGHPUT**: Token throughput
- **SEQ_THROUGHPUT**: Sequence throughput
- **FIRST_TOKEN_TIME**: First token generation time
- **OUTPUT_TOKEN_TIME**: Output token time

## 2. Detailed Test Flow

### 2.1 Dataset Preparation

#### Without LoRA

```python
prepare_data_script = os.path.join(self._llm_root, "benchmarks", "cpp",
                                   "prepare_dataset.py")
data_cmd += [
    "python3", prepare_data_script, "--stdout",
    f"--tokenizer={tokenizer_dir}", f"token-norm-dist",
    f"--num-requests={self._config.num_reqs}",
    f"--input-mean={input_len}", f"--output-mean={output_len}",
    f"--input-stdev={istdev}", f"--output-stdev={ostdev}",
    f" > {dataset_path}"
]
```

#### With LoRA

```python
data_cmd += [
    "python3", prepare_data_script, f"--stdout",
    f"--rand-task-id 0 {nloras-1}",
    f"--tokenizer={tokenizer_dir}", f"--lora-dir={lora_dir}",
    f"token-norm-dist",
    f"--num-requests={self._config.num_reqs}",
    f"--input-mean={input_len}", f"--output-mean={output_len}",
    f"--input-stdev={istdev}", f"--output-stdev={ostdev}",
    f" > {dataset_path}"
]
```

### 2.2 PyTorch Configuration Generation

In `pytorch_model_config.py`, the PyTorch configuration is overridden for certain specific cases and written out as a YAML configuration file.

### 2.3 Calling trtllm-bench for Throughput Testing

#### Basic Command

```python
benchmark_cmd = [
    self._benchmark_script,
    f"--model={model_name}",
    f"--model_path={model_dir}",
    "throughput",
    f"--dataset={dataset_path}",
    f"--max_batch_size={self._config.max_batch_size}",
    f"--max_num_tokens={self._config.max_num_tokens}",
    f"--report_json={report_path}",
]
```

#### Backend Selection

```python
if self._config.backend != "pytorch":
    benchmark_cmd += [
        f"--backend=tensorrt",
        f"--engine_dir={engine_dir}"
    ]
else:
    benchmark_cmd += ["--backend=pytorch"]
```

#### Optional Parameter Configuration

```python
if self._config.num_reqs > 0:
    benchmark_cmd += [f"--num_requests={self._config.num_reqs}"]
if self._config.concurrency != -1:
    benchmark_cmd += [f"--concurrency={self._config.concurrency}"]
if self._config.ep_size != None:
    benchmark_cmd += [f"--ep={self._config.ep_size}"]
if self._config.tp_size > 1:
    benchmark_cmd += [f"--tp={self._config.tp_size}"]
if self._config.pp_size > 1:
    benchmark_cmd += [f"--pp={self._config.pp_size}"]
if self._config.streaming == "streaming":
    benchmark_cmd += [f"--streaming"]
```

#### PyTorch Default Configuration

```python
# Use default YAML configuration
if self._config.backend == "pytorch":
    import yaml
    config = get_model_yaml_config(self._config.to_string(),
                                   lora_dirs=self.lora_dirs)
    print_info(f"pytorch model config: {config}")
    with open('extra-llm-api-config.yml', 'w') as f:
        yaml.dump(config, f, default_flow_style=False)
    benchmark_cmd += [
        f"--extra_llm_api_options=extra-llm-api-config.yml"
    ]
```
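To make the step above concrete, the sketch below shows how such an assembled command list might be joined, executed, and its `--report_json` output read back. This is a simplified illustration, not the actual logic in `test_perf.py`: the `run_benchmark` helper, the example model name and paths, and the assumption that `self._benchmark_script` resolves to the `trtllm-bench` CLI are all illustrative.

```python
import json
import subprocess


def run_benchmark(benchmark_cmd: list[str], report_path: str) -> dict:
    """Illustrative helper: run an assembled trtllm-bench command and load its report.

    The real harness in test_perf.py also parses throughput/latency metrics from the
    benchmark logs; this sketch only shows the overall shape of the step.
    """
    # The harness builds the command as a list of argument strings; join it for the shell.
    cmd = " ".join(benchmark_cmd)
    print(f"Running: {cmd}")
    subprocess.run(cmd, shell=True, check=True)

    # --report_json tells trtllm-bench where to write its summary report.
    with open(report_path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical example values; in the harness these come from the test config.
    report_path = "report.json"
    benchmark_cmd = [
        "trtllm-bench",                                      # assumed value of self._benchmark_script
        "--model=llama_v3.3_70b_instruct_fp8",               # assumed model name
        "--model_path=/models/llama-3.3-70b-instruct-fp8",   # assumed local path
        "throughput",
        "--dataset=dataset.json",
        "--max_batch_size=512",
        "--max_num_tokens=2048",
        f"--report_json={report_path}",
        "--backend=pytorch",
        "--extra_llm_api_options=extra-llm-api-config.yml",
    ]
    report = run_benchmark(benchmark_cmd, report_path)
    print(f"report keys: {sorted(report)}")
```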
## 3. Test Scheduling

### 3.1 Full Test Cycles

1. **llm_perf_full.yml** - Release performance test - [test_lists/qa/llm_perf_full.yml](../../test_lists/qa/llm_perf_full.yml)
2. **llm_perf_cluster.yml** - Cluster performance test (for Blackwell) - [test_lists/qa/llm_perf_cluster.yml](../../test_lists/qa/llm_perf_cluster.yml)
3. **llm_perf_nim.yml** - NIM performance test - [test_lists/qa/llm_perf_nim.yml](../../test_lists/qa/llm_perf_nim.yml)

### 3.2 Sanity Test Cycles

- **llm_perf_sanity.yml** - Release performance sanity test - [test_lists/qa/llm_perf_sanity.yml](../../test_lists/qa/llm_perf_sanity.yml)

## 4. Test Configuration Description

### 4.1 PyTorch Model Configuration

The default PyTorch configuration is defined in [pytorch_model_config.py](pytorch_model_config.py) and can be overridden for specific test patterns. For example:

```python
{
    'patterns': [
        'qwen3_235b_a22b_fp4-bench-pytorch-float4-maxbs:512-maxnt:2048-input_output_len:1000,2000-con:8-ep:8-gpus:8',
    ],
    'config': {
        'enable_attention_dp': False,
        'moe_config': {
            'backend': 'TRTLLM'
        }
    }
}
```

This mechanism lets you customize PyTorch-specific settings for individual model patterns while keeping the base configuration as a fallback.

### 4.2 Test Case Configuration

- Test cases are defined in YAML configuration files
- Support for different models, precisions, batch sizes, etc.
- Support for LoRA and standard model testing

### 4.3 Performance Baseline

- Regressions are compared against previous releases on the internal TRT-Perf dashboard; a minimal sketch for ad-hoc local comparisons appears at the end of this document

### 4.4 Result Analysis

- Generates detailed performance reports
- Supports performance trend analysis
- View performance data and compare different runs on the internal TRT-Perf dashboard

## 5. Runtime Environment Requirements

### 5.1 Dependency Installation

```bash
pip install -r ./TensorRT-LLM/requirements.txt
pip install -r ./TensorRT-LLM/requirements-dev.txt
```

### 5.2 Hardware Requirements

- CUDA-capable GPU
- Sufficient GPU memory for model loading
- B200/GB200 or higher-performance GPUs are recommended for cluster testing

## 6. Reproduce Steps

To reproduce the performance tests locally, follow these steps:

### 6.1 Install Dependencies

```bash
pip install -r requirements-dev.txt
pip install -r requirements.txt
```

### 6.2 Navigate to Test Directory

```bash
cd tests/integration/defs
```

### 6.3 Add Test Case to Test List

```bash
echo "perf/test_perf.py::test_perf[llama_v3.3_70b_instruct_fp8-bench-pytorch-float8-input_output_len:128,128]" >> perf_test.txt
```

### 6.4 Run Performance Test

```bash
pytest -v -s --test-prefix=H100_80GB_HBM3 --test-list=perf_test.txt -R=llama_v3.3_70b_instruct_fp8-bench-pytorch-float8-input_output_len:128,128 --output-dir=./output --perf --perf-log-formats=csv -o junit_logging=out-err
```

### 6.5 Command Parameters Explanation

- `--test-prefix=H100_80GB_HBM3`: Specifies the test environment prefix
- `--test-list`: Points to the test list file containing test cases
- `-R`: Filter for specific test patterns
- `--output-dir=./output`: Specifies the output directory for test results
- `--perf`: Enables performance testing mode
- `--perf-log-formats=csv`: Outputs performance logs in CSV format
- `-o junit_logging=out-err`: Configures JUnit logging output

## 7. Related Documentation

- [Sanity Perf Check Introduction](README.md)
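For ad-hoc local checks outside the internal TRT-Perf dashboard (see Section 4.3), a current run can be compared against a saved baseline report. The sketch below is a minimal illustration assuming both runs were launched with `--report_json`; the field name `token_throughput`, the file names, and the 5% threshold are assumptions for illustration, not part of the actual tooling.

```python
import json

# Assumed file names: reports written by two runs via --report_json.
BASELINE_REPORT = "baseline_report.json"
CURRENT_REPORT = "report.json"

# Assumed regression threshold: flag drops larger than 5%.
REGRESSION_THRESHOLD = 0.05

# Assumed field name; inspect your report JSON for the real key.
THROUGHPUT_KEY = "token_throughput"


def load_throughput(path: str) -> float:
    """Read a throughput value out of a report JSON (key name is an assumption)."""
    with open(path) as f:
        report = json.load(f)
    return float(report[THROUGHPUT_KEY])


def main() -> None:
    baseline = load_throughput(BASELINE_REPORT)
    current = load_throughput(CURRENT_REPORT)
    change = (current - baseline) / baseline
    print(f"baseline: {baseline:.2f} tok/s, current: {current:.2f} tok/s "
          f"({change:+.1%})")
    if change < -REGRESSION_THRESHOLD:
        raise SystemExit("Possible throughput regression detected")


if __name__ == "__main__":
    main()
```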