# Expert Parallelism Load Balancer (EPLB)

Effective load balancing is crucial when leveraging large-scale expert parallelism. As described in the [DeepSeek-V3 paper](https://arxiv.org/abs/2412.19437), redundant experts can be introduced to rebalance the workload across GPUs. This mechanism is known as the Expert Parallelism Load Balancer ([EPLB](https://github.com/deepseek-ai/EPLB)).

> **Note:** Currently, only the offline EP load balancer is supported.

## Offline EP Load Balancer

### Step 1: Run Inference and Collect Statistics

To generate the necessary statistics for load balancing, run your model on a target dataset (e.g., GSM8K) while counting the routed expert IDs during inference. Once counting is complete, the statistics are saved for further processing.

Set up some environment variables:

```bash
export MODEL_PATH=

# Set the expert statistic data path
export EXPERT_STATISTIC_PATH=./expert_statistic

# Enable counting of routed expert IDs from iteration 100 to iteration 200
export EXPERT_STATISTIC_ITER_RANGE=100-200
```

Prepare a configuration file and run inference on GSM8K:

```bash
cat > ./extra_llm_api_options.yaml <
```

```bash
cat > ./extra_llm_api_options_eplb.yaml <
```
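To illustrate the idea behind redundant experts, the sketch below shows a simple greedy rebalancing scheme: the hottest experts receive extra replicas (each replica assumed to absorb an equal share of that expert's load), and all replicas are then packed onto GPUs, heaviest first, always onto the least-loaded GPU. This is a minimal illustration of the concept only, not DeepSeek's actual EPLB algorithm; the function name and signature are hypothetical.

```python
import heapq

def rebalance(expert_loads, num_gpus, num_redundant):
    """Greedy sketch of expert rebalancing with redundant experts.

    expert_loads: per-expert token counts (e.g., from collected statistics).
    num_redundant: how many extra replicas to distribute among hot experts.
    Returns a list of (gpu_id, gpu_load, assigned_expert_ids).
    """
    # Give each redundant slot to the expert with the highest per-replica load.
    replicas = {e: 1 for e in range(len(expert_loads))}
    for _ in range(num_redundant):
        hottest = max(replicas, key=lambda e: expert_loads[e] / replicas[e])
        replicas[hottest] += 1

    # Expand into individual replica items, each carrying its share of load.
    items = []
    for e, r in replicas.items():
        items += [(expert_loads[e] / r, e)] * r
    items.sort(reverse=True)  # pack heaviest replicas first

    # Min-heap of GPUs keyed by current load; assign each replica greedily.
    heap = [(0.0, g, []) for g in range(num_gpus)]
    heapq.heapify(heap)
    for load, e in items:
        gpu_load, g, assigned = heapq.heappop(heap)
        heapq.heappush(heap, (gpu_load + load, g, assigned + [e]))
    return sorted((g, gpu_load, assigned) for gpu_load, g, assigned in heap)
```

For example, with loads `[100, 10, 10, 10]`, two GPUs, and one redundant slot, expert 0 is replicated on both GPUs so that neither GPU carries its full load alone.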
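To make the `EXPERT_STATISTIC_ITER_RANGE` setting concrete, here is a small sketch of how a runtime might decide whether to count routed expert IDs at a given iteration. The helper name and the boundary semantics (inclusive start, exclusive end) are assumptions for illustration, not taken from the actual implementation.

```python
import os

def should_count(iteration, env=os.environ):
    """Return True if routed-expert counting is active at this iteration.

    Hypothetical helper: parses EXPERT_STATISTIC_ITER_RANGE ("START-END",
    e.g. "100-200"). The inclusive-start/exclusive-end boundary choice is
    an assumption made for this sketch.
    """
    value = env.get("EXPERT_STATISTIC_ITER_RANGE")
    if not value:
        return False  # counting disabled when the variable is unset
    start, end = (int(x) for x in value.split("-"))
    return start <= iteration < end
```

With `EXPERT_STATISTIC_ITER_RANGE=100-200`, counting would be active at iteration 150 but not at iteration 250, and disabled entirely when the variable is unset.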