TensorRT-LLM Wide-EP Benchmark Scripts

This directory contains scripts for benchmarking TensorRT-LLM wide-ep performance using the SLURM job scheduler.

⚠️ DISCLAIMER

These scripts are currently not QA'ed and are provided for demonstration purposes only.

Please note that:

  • These scripts have not undergone formal quality assurance testing
  • They are intended for demonstration and educational purposes
  • Use at your own risk in production environments
  • Always review and test scripts thoroughly before running in your specific environment

Scripts Overview

Core Scripts

Note that the core implementation of the SLURM scripts is located in examples/disaggregated/slurm.

  1. submit.sh - Main entry point for submitting benchmark jobs
  2. process_gen_iterlog.py - Processes benchmark results and generates reports

Usage

Prerequisites

Before running the scripts, ensure you have:

  • Access to a SLURM cluster
  • Container image with TensorRT-LLM installed
  • Model files accessible on the cluster
  • Required environment variables set
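
A quick preflight check can catch a missing prerequisite before a job is queued. The sketch below is not part of these scripts. It only assumes that the SLURM `sbatch` command should be on the PATH; the environment variable names (`CONTAINER_IMAGE`, `MODEL_DIR`) are hypothetical placeholders, so substitute whatever your setup actually requires.

```python
import os
import shutil


def missing_prerequisites(required_env, env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    # SLURM access: the job submission command should be on PATH.
    if which("sbatch") is None:
        problems.append("sbatch not found on PATH (is SLURM available?)")

    # Environment variables: the names are supplied by the caller, not by the scripts.
    for name in required_env:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")

    return problems


if __name__ == "__main__":
    # Hypothetical variable names, used for illustration only.
    for problem in missing_prerequisites(["CONTAINER_IMAGE", "MODEL_DIR"]):
        print("MISSING:", problem)
```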

Running Benchmarks

# Refer to `examples/disaggregated/slurm/`
# Please find the `disaggr_torch.slurm` script in the `examples/disaggregated/slurm/` directory.
# Make sure that SLURM parameters are correctly set in `disaggr_torch.slurm` before executing this script.
./submit.sh

Post-process the benchmark results with process_gen_iterlog.py, which:

  • Parses iteration logs from workers
  • Calculates throughput metrics
  • Generates CSV reports
  • Supports MTP (Multi-Token Prediction) analysis
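
As a rough illustration of the first three steps listed above, the following sketch parses iteration-log lines, computes an aggregate tokens-per-second figure, and writes a CSV report. The log line format, the field names (`iter`, `elapsed_time`, `num_generated_tokens`), and the helper functions are all assumptions made for this example. They are not taken from process_gen_iterlog.py, so check the script itself for its actual input format and options. MTP analysis is not covered here.

```python
import csv
import re

# Hypothetical log line format, assumed for illustration only:
#   "iter = 12, elapsed_time = 0.025s, num_generated_tokens = 256"
LINE_RE = re.compile(
    r"iter\s*=\s*(?P<iter>\d+),\s*"
    r"elapsed_time\s*=\s*(?P<elapsed>\d+(?:\.\d+)?)s,\s*"
    r"num_generated_tokens\s*=\s*(?P<tokens>\d+)"
)


def parse_iter_log(lines):
    """Extract one record per matching log line; non-matching lines are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        records.append(
            {
                "iter": int(m.group("iter")),
                "elapsed_s": float(m.group("elapsed")),
                "tokens": int(m.group("tokens")),
            }
        )
    return records


def throughput_tokens_per_sec(records):
    """Aggregate throughput: total generated tokens divided by total elapsed time."""
    total_time = sum(r["elapsed_s"] for r in records)
    total_tokens = sum(r["tokens"] for r in records)
    return total_tokens / total_time if total_time > 0 else 0.0


def write_csv(records, out):
    """Write per-iteration rows, including per-iteration throughput, to a file-like object."""
    fields = ["iter", "elapsed_s", "tokens", "tokens_per_s"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for r in records:
        row = dict(r)
        row["tokens_per_s"] = r["tokens"] / r["elapsed_s"] if r["elapsed_s"] > 0 else 0.0
        writer.writerow(row)
```

To use it, pass an open log file to `parse_iter_log` and an output file opened with `newline=""` to `write_csv`.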