From 693fb1ccf15b068c1b6d6243b2639cafb21246f9 Mon Sep 17 00:00:00 2001
From: jingyaogong
Date: Tue, 21 Apr 2026 14:34:46 +0800
Subject: [PATCH] [update] readme

---
 README.md    | 9 +++++++++
 README_en.md | 9 +++++++++
 2 files changed, 18 insertions(+)

diff --git a/README.md b/README.md
index 2722d9f..c349534 100644
--- a/README.md
+++ b/README.md
@@ -1203,12 +1203,21 @@ $$
 **Training method**:
 
 ```bash
+# ① Use torch for rollout by default
 # Method 1
 torchrun --nproc_per_node N train_agent.py
 # Method 2
 python train_agent.py
 ```
 
+```bash
+# ② Use sglang for rollout
+# The sglang server must be started first:
+python -m sglang.launch_server --model-path ./minimind-3 --attention-backend triton --host 0.0.0.0 --port 8998
+# Training arguments for reference:
+python train_agent.py --rollout_engine sglang --sglang_base_url http://localhost:8998 --sglang_shared_path ./ckpt_mm --data_path ../dataset/agent_rl_math.jsonl --use_wandb
+```
+
 > By default, the trained model weights are saved every `save_interval steps` as: `agent_*.pth`
 
 ![agent_rl_loss](./images/agent_rl_loss.jpg)
diff --git a/README_en.md b/README_en.md
index 038c2d2..9cae6a6 100644
--- a/README_en.md
+++ b/README_en.md
@@ -1202,12 +1202,21 @@ Here, tool call legality, `gt` hits, format closure, unfinished penalty, and Rew
 **Training method**:
 
 ```bash
+# ① Default: use torch for rollout
 # Method 1
 torchrun --nproc_per_node N train_agent.py
 # Method 2
 python train_agent.py
 ```
 
+```bash
+# ② Use sglang for rollout
+# Start sglang server first:
+python -m sglang.launch_server --model-path ./minimind-3 --attention-backend triton --host 0.0.0.0 --port 8998
+# Training parameters for reference:
+python train_agent.py --rollout_engine sglang --sglang_base_url http://localhost:8998 --sglang_shared_path ./ckpt_mm --data_path ../dataset/agent_rl_math.jsonl --use_wandb
+```
+
 > The trained model weight files are saved by default every `save_interval steps` as: `agent_*.pth`
 
 ![agent_rl_loss](./images/agent_rl_loss.jpg)
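
Note on the sglang rollout path in this patch: the README says the sglang server must be started before `train_agent.py`, but the two commands are shown as separate manual steps. A minimal sketch of how they might be sequenced in one script follows. The helper names `wait_for_port` and `sglang_base_url` are hypothetical (they are not part of minimind or sglang), the check relies on bash's `/dev/tcp` redirection, and an open port only shows that the server is accepting connections, not that the model has finished loading.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sequencing the two commands from the README patch.
# Nothing here is provided by minimind or sglang; names are illustrative only.

# sglang_base_url HOST PORT
# Prints the URL form expected by --sglang_base_url in the README example.
sglang_base_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Retries a TCP connection once per second.
# Returns 0 as soon as a connection succeeds, 1 once the timeout has elapsed.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" waited=0
  # The subshell opens (and, on exit, closes) fd 3 on bash's /dev/tcp pseudo-path.
  while ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example usage, left commented out so sourcing this file has no side effects.
# The flags are copied from the README patch; the 300 s timeout is an assumption.
#   python -m sglang.launch_server --model-path ./minimind-3 \
#     --attention-backend triton --host 0.0.0.0 --port 8998 &
#   wait_for_port localhost 8998 300 && \
#     python train_agent.py --rollout_engine sglang \
#       --sglang_base_url "$(sglang_base_url localhost 8998)" \
#       --sglang_shared_path ./ckpt_mm \
#       --data_path ../dataset/agent_rl_math.jsonl --use_wandb
```

If the port never opens within the timeout, `wait_for_port` returns non-zero and the `&&` chain skips the training command, so the script does not launch training against a missing server.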