[feat] update docs

jingyaogong 2025-10-26 18:59:16 +08:00
parent f44ee7a1b0
commit 6bedefcaca
2 changed files with 43 additions and 21 deletions

(file 1 of 2)

@@ -64,16 +64,16 @@ git clone https://www.modelscope.cn/models/gongjy/MiniMind2.git
### 3. Command-Line Chat
```bash
-# load=0: load PyTorch model, load=1: load transformers model
-python eval_model.py --load 1 --model_mode 2
+# Use transformers format model
+python eval_llm.py --load_from ./MiniMind2
```
-**Model Modes**:
-- `model_mode 0`: Pretrain model (word continuation)
-- `model_mode 1`: SFT Chat model (conversation)
-- `model_mode 2`: RLHF model (refined responses, currently same as SFT for small models)
-- `model_mode 3`: Reasoning model (with thinking chains)
-- `model_mode 4/5`: RLAIF models (PPO/GRPO trained)
+**Weight Options** (`--weight` parameter):
+- `pretrain`: Pretrain model (word continuation)
+- `full_sft`: SFT Chat model (conversation)
+- `dpo`: DPO model (preference optimization)
+- `reason`: Reasoning model (with thinking chains)
+- `ppo_actor`, `grpo`, `spo`: RLAIF models (reinforcement learning trained)
**Example Session**:
```text
@@ -103,7 +103,7 @@ Visit `http://localhost:8501` to use the interactive web interface.
Extend context length beyond training with RoPE extrapolation:
```bash
-python eval_model.py --inference_rope_scaling True
+python eval_llm.py --weight full_sft --inference_rope_scaling
```
This enables the YaRN algorithm to handle sequences longer than the 2K training context, useful for processing documents and long conversations.
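The scaling happens inside the RoPE position embedding. Below is a minimal, illustrative sketch of YaRN-style frequency interpolation, assuming the 2K training context mentioned above; the function name, defaults, and the `scale`/`beta_*` parameters are assumptions for illustration, not MiniMind's actual implementation:

```python
# Minimal sketch of YaRN-style RoPE frequency scaling (illustrative only).
import math
import torch

def yarn_rope_freqs(dim: int, base: float = 10000.0,
                    orig_ctx: int = 2048, scale: float = 4.0,
                    beta_fast: float = 32.0, beta_slow: float = 1.0):
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Wavelength of each frequency band, in token positions
    wavelen = 2 * math.pi / inv_freq
    # Bands with wavelength below `low` rotate many times inside the training
    # window and are kept as trained; bands above `high` are fully interpolated;
    # bands in between are blended by a linear ramp.
    low = orig_ctx / beta_fast
    high = orig_ctx / beta_slow
    ramp = ((wavelen - low) / (high - low)).clamp(0.0, 1.0)
    inv_freq_scaled = inv_freq / scale  # plain position interpolation
    return inv_freq * (1 - ramp) + inv_freq_scaled * ramp
```

The intent: high-frequency bands (which encode local token order) are preserved, while low-frequency bands are interpolated so that positions beyond the 2K window map back into the range seen during training.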
@@ -227,10 +227,10 @@ A: Mount Everest (珠穆朗玛峰) is the world's highest peak, located in the Hima…
**Solution**:
```bash
# Reduce batch size
-python eval_model.py --batch_size 1
+python eval_llm.py --batch_size 1
# Or use CPU (slow but works)
-python eval_model.py --device cpu
+python eval_llm.py --device cpu
```
### Issue: Slow Inference
@@ -244,7 +244,7 @@ python eval_model.py --device cpu
### Issue: Model Responses Are Poor Quality
**Possible Causes**:
-- Using pretrain model (`model_mode 0`) instead of SFT (`model_mode 1`)
+- Using pretrain model (`--weight pretrain`) instead of SFT (`--weight full_sft`)
- Model is undertrained - download the full checkpoint instead
- Input prompt is too short - provide more context

(file 2 of 2)

@@ -54,7 +54,7 @@ cd dataset
├── sft_512.jsonl (7.5GB, standard SFT)
├── sft_1024.jsonl (5.6GB, longer SFT)
├── sft_2048.jsonl (9GB, very long SFT)
-├── dpo.jsonl (909MB, DPO training)
+├── dpo.jsonl ✨ (55MB, DPO training - optimized and simplified)
├── r1_mix_1024.jsonl (340MB, reasoning distillation)
├── rlaif-mini.jsonl (1MB, RLAIF algorithms)
├── lora_identity.jsonl (22.8KB, identity LoRA)
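To sanity-check a download, a quick structural peek at the first record of any of these `.jsonl` files works; the field names vary per file, and the `dpo.jsonl` path below is only an example:

```python
# Print the top-level structure of the first record of a dataset file.
# Field names differ per file; for preference data one would expect paired
# chosen/rejected conversations (an assumption here, not a verified schema).
import json

with open("dataset/dpo.jsonl", "r", encoding="utf-8") as f:
    record = json.loads(f.readline())

print(type(record).__name__,
      list(record.keys()) if isinstance(record, dict) else record)
```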
@@ -112,6 +112,25 @@ All training scripts are in the `./trainer` directory.
cd trainer
```
+!!! info "💡 Checkpoint Resume Training"
+    All training scripts automatically save checkpoints. Simply add the `--from_resume 1` flag to automatically detect, load, and resume training:
+    ```bash
+    python train_pretrain.py --from_resume 1
+    python train_full_sft.py --from_resume 1
+    python train_dpo.py --from_resume 1
+    # ... and all other training scripts
+    ```
+    **Checkpoint Resume Mechanism:**
+    - The training process automatically saves complete checkpoints (model, optimizer, training progress, etc.) to the `./checkpoints/` directory
+    - Checkpoint file naming: `<weight_name>_<dimension>_resume.pth` (e.g., `full_sft_512_resume.pth`)
+    - Supports cross-GPU recovery (automatically adjusts the step counter)
+    - Supports wandb training-log continuity (automatically resumes the same run)
+
+    > Suitable for long training sessions or unstable environments; no need to worry about losing progress to interruptions
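For orientation, here is a hypothetical sketch of what such a resume checkpoint could contain and how the cross-GPU step adjustment might work; all key names and function signatures are assumptions for illustration, not MiniMind's actual code:

```python
# Hypothetical resume-checkpoint layout; keys and signatures are assumptions.
import torch

def save_resume_checkpoint(path, model, optimizer, step, epoch, world_size, run_id=None):
    torch.save({
        "model": model.state_dict(),          # model weights
        "optimizer": optimizer.state_dict(),  # optimizer state (momentum, etc.)
        "step": step,                         # global training step
        "epoch": epoch,
        "world_size": world_size,             # GPU count at save time
        "wandb_run_id": run_id,               # lets wandb resume the same run
    }, path)

def load_resume_checkpoint(path, model, optimizer, world_size):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    # Cross-GPU recovery: if the GPU count changed, rescale the step counter
    # so the same number of processed samples counts as the same progress.
    step = ckpt["step"] * ckpt["world_size"] // world_size
    return step, ckpt["epoch"], ckpt.get("wandb_run_id")
```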
### Stage 1: Pretraining
**Purpose**: Learn foundational knowledge (word continuation)
@@ -234,7 +253,7 @@ python train_dpo.py
torchrun --nproc_per_node 2 train_dpo.py
```
-**Output**: `./out/rlhf_*.pth`
+**Output**: `./out/dpo_*.pth`
**Key Features**:
- Off-policy training (reuse data across epochs)
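For reference, the textbook DPO objective (Rafailov et al., 2023) behind this stage fits in a few lines; this sketch is for orientation only and need not match `train_dpo.py` exactly:

```python
# Standard DPO loss sketch; tensor shapes and naming are illustrative.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each input: per-sequence sum of token log-probs, shape (batch,)."""
    # Implicit rewards: how much the policy prefers each answer over the frozen reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the chosen-vs-rejected margin
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Since the frozen reference model's log-probabilities can be computed once per preference pair, the same data can be replayed across epochs, which is the off-policy property noted above.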
@@ -439,35 +458,38 @@ python train_xxx.py --use_wandb # Automatically uses SwanLab if available
### Evaluate Pretrain Model
```bash
-python eval_model.py --model_mode 0
+python eval_llm.py --weight pretrain
```
### Evaluate Chat Model
```bash
-python eval_model.py --model_mode 1
+python eval_llm.py --weight full_sft
```
### Evaluate with LoRA
```bash
-python eval_model.py --lora_name 'lora_medical' --model_mode 1
+python eval_llm.py --weight dpo --lora_weight lora_medical
```
### Evaluate Reasoning Model
```bash
-python eval_model.py --model_mode 3
+python eval_llm.py --weight reason
```
### Evaluate RLAIF Models
```bash
# PPO model
-python eval_model.py --model_mode 4
+python eval_llm.py --weight ppo_actor
# GRPO model
-python eval_model.py --model_mode 4
+python eval_llm.py --weight grpo
+# SPO model
+python eval_llm.py --weight spo
```
### RoPE Length Extrapolation
@@ -475,7 +497,7 @@ python eval_model.py --model_mode 4
Test with extended context:
```bash
-python eval_model.py --model_mode 1 --inference_rope_scaling True
+python eval_llm.py --weight full_sft --inference_rope_scaling
```
## 📐 Model Architecture