diff --git a/README.md b/README.md
index 93091a6..39b4184 100644
--- a/README.md
+++ b/README.md
@@ -1218,7 +1218,7 @@ python train_agent.py
 
 > MiniMind 在 Agentic RL 训练阶段的优化走势
 
-这里顺带提一下 `rollout_engine`。所谓“训推分离”,就是把 **参数更新** 和 **轨迹展开** 拆开:训练侧负责优化 policy,rollout 侧负责高吞吐采样,对上统一表现为“给我 prompt,我返回 rollout 结果;训练完以后,再把新权重同步回来”。因此训练脚本并不需要关心底层到底是本地 `generate` 还是远端 `inference` 引擎。
+这里顺带提一下 `rollout_engine`。所谓“训推分离”,就是把 **参数更新** 和 **轨迹展开** 拆开:训练侧负责优化 policy,rollout 侧负责高吞吐采样,对上统一表现为“给我 prompt,我返回 rollout 结果;训练完以后,再把新权重同步回来”。因此训练脚本并不需要关心底层到底是本地 `generate` 还是远端 `inference` 引擎。需要说明的是,当前实现仍是**同步**模式(采样完一批再更新),还不是纯 rollout buffer 的异步训练。
 
 ![rl-structure](./images/rl-structure.jpg)
 
diff --git a/README_en.md b/README_en.md
index 3b872b5..e8fffd8 100644
--- a/README_en.md
+++ b/README_en.md
@@ -1217,7 +1217,7 @@ python train_agent.py
 
 > MiniMind optimization trends during the Agentic RL training stage
 
-Here I'll also briefly mention the `rollout_engine`. The so-called "training-inference separation" means decoupling **parameter updates** and **trajectory rollout**: the training side handles policy optimization, while the rollout side handles high-throughput sampling. From the top level, they uniformly present as "give me a prompt, I'll return rollout results; after training is done, sync the new weights back." Therefore, the training script doesn't need to care whether the underlying implementation is local `generate` or a remote `inference` engine.
+Here I'll also briefly mention the `rollout_engine`. The so-called "training-inference separation" means decoupling **parameter updates** and **trajectory rollout**: the training side handles policy optimization, while the rollout side handles high-throughput sampling. From the outside, both present a uniform interface: "give me a prompt, I'll return rollout results; after training is done, sync the new weights back." Therefore, the training script doesn't need to care whether the underlying implementation is local `generate` or a remote `inference` engine. Note that the current implementation is still **synchronous** (sample a batch, then update), not yet asynchronous training with a pure rollout buffer.
 
 ![rl-structure](./images/rl-structure.jpg)
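The synchronous "sample a batch, then update, then sync weights" loop that this diff documents can be sketched as follows. This is a minimal illustration of the pattern, not MiniMind's actual code: `RolloutEngine`, `rollout`, `sync_weights`, and `train_step` are hypothetical names, and the "policy" here is just a dict of weights.

```python
# Minimal sketch of training-inference separation in synchronous mode.
# All names here are illustrative, not the repo's real API.

class RolloutEngine:
    """Rollout side: high-throughput sampling behind a uniform interface.
    The trainer only sees "prompts in -> rollouts out" plus a weight-sync hook,
    regardless of whether sampling is local `generate` or a remote engine."""

    def __init__(self, policy_weights):
        self.weights = dict(policy_weights)

    def rollout(self, prompts):
        # Stub sampler: returns one trajectory record per prompt.
        return [{"prompt": p, "response": f"<gen:{p}>", "reward": 0.0}
                for p in prompts]

    def sync_weights(self, new_weights):
        # After each update, the trainer pushes fresh weights back here.
        self.weights = dict(new_weights)


def train_step(trajectories, weights):
    # Stand-in for a policy-gradient update: bump a step counter.
    return {k: v + 1 for k, v in weights.items()}


weights = {"step": 0}
engine = RolloutEngine(weights)

# Synchronous loop: finish sampling a whole batch before updating.
for batch in [["q1", "q2"], ["q3"]]:
    trajs = engine.rollout(batch)         # 1) sample a full batch of rollouts
    weights = train_step(trajs, weights)  # 2) then update the policy
    engine.sync_weights(weights)          # 3) sync new weights to rollout side

print(weights["step"])  # two updates applied
```

An asynchronous variant with a rollout buffer (which the diff notes is not yet implemented) would instead let sampling and updating overlap, with the trainer consuming from a queue of trajectories.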