[update] readme

jingyaogong
2026-04-21 14:34:46 +08:00
parent 5416a44471
commit 693fb1ccf1
2 changed files with 18 additions and 0 deletions
+9
@@ -1203,12 +1203,21 @@ $$
**Training method**
```bash
# ① Default: use torch for rollout
# Method 1
torchrun --nproc_per_node N train_agent.py
# Method 2
python train_agent.py
```
```bash
# ② Use sglang for rollout
# The sglang server must be started first
python -m sglang.launch_server --model-path ./minimind-3 --attention-backend triton --host 0.0.0.0 --port 8998
# Training parameters for reference:
python train_agent.py --rollout_engine sglang --sglang_base_url http://localhost:8998 --sglang_shared_path ./ckpt_mm --data_path ../dataset/agent_rl_math.jsonl --use_wandb
```
> By default, the trained model weights are saved every `save_interval` steps as `agent_*.pth`
![agent_rl_loss](./images/agent_rl_loss.jpg)
+9
@@ -1202,12 +1202,21 @@ Here, tool call legality, `gt` hits, format closure, unfinished penalty, and Rew
**Training method**:
```bash
# ① Default: use torch for rollout
# Method 1
torchrun --nproc_per_node N train_agent.py
# Method 2
python train_agent.py
```
```bash
# ② Use sglang for rollout
# Start sglang server first:
python -m sglang.launch_server --model-path ./minimind-3 --attention-backend triton --host 0.0.0.0 --port 8998
# Training parameters for reference:
python train_agent.py --rollout_engine sglang --sglang_base_url http://localhost:8998 --sglang_shared_path ./ckpt_mm --data_path ../dataset/agent_rl_math.jsonl --use_wandb
```
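Because the sglang rollout path needs the server to be running before `train_agent.py` starts, it can help to wait until the port accepts connections. The following is a minimal sketch, not part of the repository: `wait_for_port` is a hypothetical helper that uses only the Python standard library. An open port only shows that a process is listening. It does not confirm that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Hypothetical helper: return True once a TCP connection succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable, timed out) on failure
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the commands above, `wait_for_port("localhost", 8998, timeout=300.0)` would block until the port passed to `--port` is reachable or five minutes have elapsed.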
> By default, the trained model weights are saved every `save_interval` steps as `agent_*.pth`
![agent_rl_loss](./images/agent_rl_loss.jpg)