# TensorRT-LLM Disaggregated Benchmark Framework

A YAML-based testing framework for TensorRT-LLM disaggregated serving performance and accuracy benchmarks.

## Overview

This framework provides a simple, maintainable approach to benchmark testing using YAML configuration files. Each test configuration is defined in a separate YAML file, with automatic test discovery and execution through pytest.

## Key Features

- **YAML Configuration**: Each test has its own independent YAML configuration file
- **Automatic Test Discovery**: Tests are automatically discovered from the config directory structure
- **Default Metrics**: Built-in default metrics configuration for common test scenarios
- **GPU Filtering**: Automatically filters tests based on hardware compatibility
- **Flexible Override**: Override default configurations as needed for special cases
- **Test Categories**: Support for both performance (perf) and accuracy tests
- **Multiple Test Types**: Support for disagg (disaggregated) and wideep architectures

## Directory Structure

```
test_configs/
├── disagg/          # Disaggregated serving tests
│   ├── perf/        # Performance tests
│   └── accuracy/    # Accuracy tests (optional)
└── wideep/          # Wide-deep tests
    ├── perf/
    └── accuracy/
```

## YAML Configuration

### Minimal Configuration Example

```yaml
metadata:
  model_name: "deepseek-r1-fp4"
  precision: "fp4"
  supported_gpus: ["GB200"]

slurm:
  partition: "<partition>"
  account: "<account>"
  job_time: "02:00:00"

benchmark:
  mode: "e2e"
  streaming: true
  concurrency_list: "1 2 4 8 16 36"
  input_length: 1024
  output_length: 1024

hardware:
  gpus_per_node: 4
  num_ctx_servers: 1
  num_gen_servers: 4

environment:
  container_mount: "<container_mount>"
  container_image: "<container_image>"
  model_path: "<model_path>"

worker_config:
  gen:
    tensor_parallel_size: 8
    moe_expert_parallel_size: 8
    max_batch_size: 32
    max_num_tokens: 32
    max_seq_len: 2251
    # ... other gen worker configs

  ctx:
    tensor_parallel_size: 4
    moe_expert_parallel_size: 4
    max_batch_size: 4
    max_num_tokens: 4608
    max_seq_len: 2251
    # ... other ctx worker configs
```


### Custom Metrics (Optional)

Most tests use default metrics. To customize:

```yaml
benchmark:
  metrics:
    log_file: "custom_benchmark.log"
    extractor_pattern: 'Custom Pattern:\s+([0-9.]+)'
    metric_names: ["CUSTOM_METRIC"]
```

Note the single quotes around `extractor_pattern`: in a double-quoted YAML scalar, `\s` is an invalid escape sequence and strict parsers will reject it.
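
For reference, `extractor_pattern` is a regular expression whose capture groups map onto `metric_names` in order. A minimal sketch of how such a pattern could be applied to a benchmark log (the function below is illustrative, not the framework's actual API):

```python
import re

# Mirrors the custom metrics block above; names here are illustrative only.
PATTERN = re.compile(r"Custom Pattern:\s+([0-9.]+)")
METRIC_NAMES = ["CUSTOM_METRIC"]

def extract_metrics(log_path: str) -> dict[str, list[float]]:
    """Collect every value the pattern captures, keyed by metric name."""
    results: dict[str, list[float]] = {name: [] for name in METRIC_NAMES}
    with open(log_path) as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                # One capture group per metric name, in order.
                for name, value in zip(METRIC_NAMES, match.groups()):
                    results[name].append(float(value))
    return results
```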

## GPU Support

Currently supports **OCI GB200** only. The framework is designed to support additional GPU types in the future.

All configurations must specify:

```yaml
metadata:
  supported_gpus: ["GB200"]
```

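Filtering works by comparing each config's `supported_gpus` list against the current `GPU_TYPE` environment variable and skipping incompatible tests. A hedged sketch of that check (the helper name is hypothetical; the framework's real filtering hook may differ):

```python
import os

import pytest

def skip_if_gpu_unsupported(config: dict) -> None:
    """Skip a test whose config does not list the current GPU type.

    Hypothetical helper: shown only to illustrate the filtering rule.
    """
    gpu_type = os.environ.get("GPU_TYPE", "GB200")  # default per the docs
    supported = config.get("metadata", {}).get("supported_gpus", [])
    if gpu_type not in supported:
        pytest.skip(f"{gpu_type} not in supported_gpus {supported}")
```
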
## Configuration Validation

The framework validates configurations before execution (a sketch of these checks follows the list):

1. **gen_max_tokens**: Must equal `gen_max_batch_size * (mtp_size + 1)` when MTP is enabled
2. **streaming**: Must be `true`
3. **max_seq_len**: For both ctx and gen workers, must be greater than `input_length + output_length`

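A minimal sketch of those three rules in Python, assuming the parsed YAML is available as a plain dict (function and key names are assumptions, not the framework's actual API):

```python
# Illustrative only: validates a parsed config dict against the three rules.
def validate_config(cfg: dict) -> None:
    bench = cfg["benchmark"]
    gen = cfg["worker_config"]["gen"]
    ctx = cfg["worker_config"]["ctx"]

    # Rule 1: with MTP enabled, each sequence in a gen batch needs room for
    # one token per speculative layer plus the regular token.
    mtp_size = gen.get("speculative_config", {}).get("mtp_size", 0)  # key name assumed
    if mtp_size > 0:
        expected = gen["max_batch_size"] * (mtp_size + 1)
        if gen["max_num_tokens"] != expected:
            raise ValueError(f"gen max_num_tokens must be {expected}")

    # Rule 2: streaming is mandatory.
    if bench["streaming"] is not True:
        raise ValueError("streaming must be true")

    # Rule 3: both workers must fit the full request.
    total = bench["input_length"] + bench["output_length"]
    for role, worker in (("ctx", ctx), ("gen", gen)):
        if worker["max_seq_len"] <= total:
            raise ValueError(f"{role} max_seq_len must exceed {total}")
```
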
## Running Tests

### Run all tests

```bash
poetry run pytest --disagg test_disagg.py -s -vv
```

### Run from test list

```bash
poetry run pytest --disagg test_disagg.py -s -vv --disagg-test-list=./testlist/disagg.txt
```

### Run specific tests

```bash
# Run only performance tests
poetry run pytest --disagg test_disagg.py -s -vv -m perf

# Run only accuracy tests
poetry run pytest --disagg test_disagg.py -s -vv -m accuracy

# Run a specific test by ID
poetry run pytest --disagg test_disagg.py -s -vv -k "deepseek-r1-fp4_1k1k"
```

## Test Naming Convention

Tests are automatically named using the format:

```
{test_type}_{category}_{config_filename}
```

Example: `disagg_perf_deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL`

## File Naming Convention

Configuration files should follow this format:

```
{model}_{benchmark_type}_{config_details}.yaml
```

Examples:

- `deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml`
- `deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml`

Where (the sketch after this list shows one way to parse these fields):

- `1k1k`, `8k1k`: Input/output lengths (1024/1024, 8192/1024)
- `ctx1_gen1`: Context and generation server counts
- `dep32` or `tep8`: Data parallel (dep) or tensor parallel (tep) configuration
- `bs32`: Batch size
- `eplb0`: Expert parallel load balancing slots
- `mtp0`: Multi-token prediction layers
- `ccb-NIXL` or `ccb-UCX`: Communication backend

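A minimal, illustrative parser for this convention (not part of the framework; shown only to make the field layout concrete):

```python
import re

# Matches IDs like "deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL".
NAME_RE = re.compile(
    r"(?P<model>.+)_(?P<isl>\d+k)(?P<osl>\d+k)"
    r"_ctx(?P<ctx>\d+)_gen(?P<gen>\d+)"
    r"_(?P<parallelism>dep|tep)(?P<degree>\d+)"
    r"_bs(?P<bs>\d+)_eplb(?P<eplb>\d+)_mtp(?P<mtp>\d+)"
    r"_ccb-(?P<backend>\w+)"
)

fields = NAME_RE.match(
    "deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL"
).groupdict()
# fields["model"] == "deepseek-r1-fp4", fields["backend"] == "NIXL", ...
```
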
## Key Configuration Fields

### Metadata

- `model_name`: Model identifier
- `precision`: Model precision (fp4, fp8, etc.)
- `supported_gpus`: List of compatible GPU types

### Benchmark

- `mode`: Benchmark mode (e2e, gen_only, ctx_only)
- `streaming`: Enable streaming (must be `true`)
- `input_length`, `output_length`: Sequence lengths
- `concurrency_list`: Concurrency levels to test

### Worker Config

- `tensor_parallel_size`: Tensor parallelism degree
- `moe_expert_parallel_size`: MoE expert parallelism degree
- `max_batch_size`: Maximum batch size
- `max_num_tokens`: Maximum tokens per batch
- `max_seq_len`: Maximum sequence length
- `speculative_config`: Multi-token prediction settings (optional)

## Test Output

Test results are saved to:

- Performance metrics: `{OUTPUT_PATH}/perf_script_test_results.csv`
- Test logs: `{OUTPUT_PATH}/disagg_benchmark_{timestamp}.log`

## Environment Variables

- `GPU_TYPE`: Current GPU type (default: `GB200`)
- `OUTPUT_PATH`: Directory for test results and logs
- `WORK_DIR`: Working directory for benchmark execution
- `DEBUG_MODE`: Enable debug mode (set to `"1"` to skip job submission)
- `DEBUG_JOB_ID`: Job ID to use in debug mode

## Debug Mode

For local testing without SLURM submission:

```bash
export DEBUG_MODE=1
export DEBUG_JOB_ID=12345
poetry run pytest --disagg test_disagg.py -s -vv
```

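In other words, debug mode replaces job submission with a pre-existing job ID whose logs are parsed as usual. A hedged sketch of that contract (names are illustrative; the framework's JobManager may implement this differently):

```python
import os

def submit_or_reuse(submit_fn, config: dict) -> str:
    """Return a SLURM job ID, skipping submission in debug mode.

    Illustrative only: shows the behavior implied by DEBUG_MODE/DEBUG_JOB_ID.
    """
    if os.environ.get("DEBUG_MODE") == "1":
        # Reuse an existing job's output instead of submitting a new run.
        return os.environ["DEBUG_JOB_ID"]
    return submit_fn(config)
```
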
## Architecture

The framework consists of the following components (a sketch of how they compose follows the list):

1. **ConfigLoader**: Scans and loads YAML configurations
2. **ConfigValidator**: Validates configuration correctness
3. **JobManager**: Handles SLURM job submission and monitoring
4. **LogParser**: Extracts metrics from benchmark logs
5. **TestCaseTracker**: Tracks test execution timing
6. **ResultSaver**: Saves results to CSV

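One way these components might fit together for a single test, written with the collaborators passed in so the sketch stays self-contained (method names and signatures are assumptions, not the framework's actual API):

```python
from typing import Any

def run_benchmark_test(
    config_path: str,
    loader: Any,      # ConfigLoader
    validator: Any,   # ConfigValidator
    jobs: Any,        # JobManager
    parser: Any,      # LogParser
    tracker: Any,     # TestCaseTracker
    saver: Any,       # ResultSaver
) -> None:
    """Illustrative composition of the six components; real flow may differ."""
    config = loader.load(config_path)   # 1. read the YAML file
    validator.validate(config)          # 2. fail fast on a bad config
    tracker.start(config_path)          # 5. time the execution
    job = jobs.submit(config)           # 3. sbatch, then poll to completion
    metrics = parser.extract(job)       # 4. regex metrics out of the log
    tracker.stop(config_path)
    saver.save(metrics)                 # 6. append a row to the results CSV
```
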
## Benefits

- **Simple**: YAML-based configuration, no code changes needed
- **Maintainable**: Each test is a separate file
- **Flexible**: Override defaults only when needed
- **Scalable**: Easy to add new tests and models
- **Reliable**: Automatic validation before execution
- **Traceable**: Comprehensive logging and result tracking

## Adding New Tests

1. Create a new YAML file in `test_configs/{test_type}/{category}/`
2. Configure the test parameters
3. Run pytest; the test will be discovered automatically

No code changes required!

---
For detailed configuration options and advanced usage, refer to the inline comments in the YAML configuration files.