mirror of https://github.com/jingyaogong/minimind.git (synced 2026-05-01 11:48:14 +08:00)

[feat] update docs

parent f44ee7a1b0
commit 6bedefcaca

@@ -64,16 +64,16 @@ git clone https://www.modelscope.cn/models/gongjy/MiniMind2.git

### 3. Command-Line Chat

```bash
-# load=0: load PyTorch model, load=1: load transformers model
-python eval_model.py --load 1 --model_mode 2
+# Use transformers format model
+python eval_llm.py --load_from ./MiniMind2
```

-**Model Modes**:
-- `model_mode 0`: Pretrain model (word continuation)
-- `model_mode 1`: SFT Chat model (conversation)
-- `model_mode 2`: RLHF model (refined responses, currently same as SFT for small models)
-- `model_mode 3`: Reasoning model (with thinking chains)
-- `model_mode 4/5`: RLAIF models (PPO/GRPO trained)
+**Weight Options** (`--weight` parameter):
+- `pretrain`: Pretrain model (word continuation)
+- `full_sft`: SFT Chat model (conversation)
+- `dpo`: DPO model (preference optimization)
+- `reason`: Reasoning model (with thinking chains)
+- `ppo_actor`, `grpo`, `spo`: RLAIF models (reinforcement learning trained)

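For orientation, each `--weight` name resolves to a checkpoint file under `./out/`, following the `<weight_name>_<dimension>.pth` pattern used elsewhere in these docs. A minimal loading sketch in Python (the actual resolution logic inside `eval_llm.py` may differ; `load_weight_state` is a hypothetical helper):

```python
import torch

def load_weight_state(weight: str, hidden_size: int = 512, out_dir: str = "./out"):
    """Map a --weight name (e.g. 'full_sft') to its checkpoint file and load the tensors."""
    ckpt_path = f"{out_dir}/{weight}_{hidden_size}.pth"  # e.g. ./out/full_sft_512.pth
    return torch.load(ckpt_path, map_location="cpu")

state_dict = load_weight_state("full_sft")  # weights for the SFT chat model
```
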
**Example Session**:

```text

@@ -103,7 +103,7 @@ Visit `http://localhost:8501` to use the interactive web interface.

Extend context length beyond training with RoPE extrapolation:

```bash
-python eval_model.py --inference_rope_scaling True
+python eval_llm.py --weight full_sft --inference_rope_scaling
```

This enables the YaRN algorithm to handle sequences longer than the 2K training context, which is useful for processing long documents and conversations.

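To illustrate the underlying idea (a simplification, not the project's actual YaRN implementation, which scales different frequency bands differently): RoPE extrapolation maps positions beyond the trained window back into a range the model has already seen.

```python
import torch

def rope_angles(head_dim: int, max_pos: int, scale: float = 1.0, base: float = 10000.0):
    """RoPE rotation angles; scale > 1 compresses positions into the trained range
    (uniform position interpolation - YaRN refines this per frequency band)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_pos).float() / scale
    angles = torch.outer(positions, inv_freq)  # (max_pos, head_dim // 2)
    return angles.cos(), angles.sin()

# Address a 4K sequence with a model trained on a 2K context
cos, sin = rope_angles(head_dim=64, max_pos=4096, scale=2.0)
```
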
@@ -227,10 +227,10 @@ A: 珠穆朗玛峰(Mount Everest)是世界上最高的山峰,位于喜马

**Solution**:

```bash
# Reduce batch size
-python eval_model.py --batch_size 1
+python eval_llm.py --batch_size 1

# Or use CPU (slow but works)
-python eval_model.py --device cpu
+python eval_llm.py --device cpu
```

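If you drive the model from your own script instead of the CLI, the same fallbacks apply directly. A generic sketch using the transformers-format model from above (assuming it loads with the standard `AutoModelForCausalLM` API; the repo may additionally require `trust_remote_code=True`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fall back to CPU automatically when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("./MiniMind2")
model = AutoModelForCausalLM.from_pretrained("./MiniMind2").to(device).eval()

inputs = tokenizer("Hello", return_tensors="pt").to(device)
with torch.no_grad():  # inference only: skip gradient buffers to save memory
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
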
### Issue: Slow Inference

@@ -244,7 +244,7 @@ python eval_model.py --device cpu

### Issue: Model Responses Are Poor Quality

**Possible Causes**:
-- Using pretrain model (`model_mode 0`) instead of SFT (`model_mode 1`)
+- Using pretrain model (`--weight pretrain`) instead of SFT (`--weight full_sft`)
- Model is undertrained - download the full checkpoint instead
- Input prompt is too short - provide more context

@@ -54,7 +54,7 @@ cd dataset

├── sft_512.jsonl (7.5GB, standard SFT)
├── sft_1024.jsonl (5.6GB, longer SFT)
├── sft_2048.jsonl (9GB, very long SFT)
-├── dpo.jsonl (909MB, DPO training)
+├── dpo.jsonl ✨ (55MB, DPO training - optimized and simplified)
├── r1_mix_1024.jsonl (340MB, reasoning distillation)
├── rlaif-mini.jsonl (1MB, RLAIF algorithms)
├── lora_identity.jsonl (22.8KB, identity LoRA)

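All of these files use the JSON Lines format: one JSON object per line, so even multi-GB files can be streamed record by record. A generic reading sketch (the field names inside each record vary by file and are not shown here):

```python
import json

# Stream a .jsonl dataset one record at a time (memory-friendly for large files).
with open("dataset/sft_512.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        print(sample.keys())  # inspect the first record's schema
        break
```
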
@@ -112,6 +112,25 @@ All training scripts are in the `./trainer` directory.

cd trainer
```

+!!! info "💡 Checkpoint Resume Training"
+
+All training scripts save checkpoints automatically. Simply add the `--from_resume 1` parameter to detect and load the latest checkpoint and resume training:
+
+```bash
+python train_pretrain.py --from_resume 1
+python train_full_sft.py --from_resume 1
+python train_dpo.py --from_resume 1
+# ... and all other training scripts
+```
+
+**Checkpoint Resume Mechanism:**
+
+- Training automatically saves complete checkpoints (model, optimizer, training progress, etc.) to the `./checkpoints/` directory
+- Checkpoint file naming: `<weight_name>_<dimension>_resume.pth` (e.g., `full_sft_512_resume.pth`)
+- Supports resuming on a different GPU count (the step counter is adjusted automatically)
+- Supports continuous wandb training logs (the same run is resumed automatically)
+
+> Suited to long training sessions and unstable environments: no need to worry about losing progress to interruptions

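Mechanically, a resume checkpoint is just a single file bundling everything training needs to pick up where it left off. A generic PyTorch sketch of the idea (the field names are illustrative, not the project's actual checkpoint schema):

```python
import os
import torch

def save_resume_checkpoint(path, model, optimizer, step, epoch):
    # Bundle weights, optimizer state, and progress counters into one file.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
        "epoch": epoch,
    }, path)

def load_resume_checkpoint(path, model, optimizer):
    if not os.path.exists(path):
        return 0, 0  # nothing to resume: start from scratch
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"], ckpt["epoch"]
```
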
### Stage 1: Pretraining

**Purpose**: Learn foundational knowledge (word continuation)

@@ -234,7 +253,7 @@ python train_dpo.py

torchrun --nproc_per_node 2 train_dpo.py
```

-**Output**: `./out/rlhf_*.pth`
+**Output**: `./out/dpo_*.pth`

**Key Features**:
- Off-policy training (reuse data across epochs)

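For intuition, the DPO objective pushes the policy to prefer the chosen response over the rejected one relative to a frozen reference model. A minimal sketch of the loss on precomputed sequence log-probabilities (shapes and names are illustrative, not `train_dpo.py`'s actual code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO: -log(sigmoid(beta * (policy margin - reference margin)))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy batch of 2 preference pairs (log-probs of chosen/rejected responses)
loss = dpo_loss(torch.tensor([-5.0, -6.0]), torch.tensor([-7.0, -6.5]),
                torch.tensor([-5.5, -6.2]), torch.tensor([-6.8, -6.4]))
```
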
@@ -439,35 +458,38 @@ python train_xxx.py --use_wandb # Automatically uses SwanLab if available

### Evaluate Pretrain Model

```bash
-python eval_model.py --model_mode 0
+python eval_llm.py --weight pretrain
```

### Evaluate Chat Model

```bash
-python eval_model.py --model_mode 1
+python eval_llm.py --weight full_sft
```

### Evaluate with LoRA

```bash
-python eval_model.py --lora_name 'lora_medical' --model_mode 1
+python eval_llm.py --weight dpo --lora_weight lora_medical
```

### Evaluate Reasoning Model

```bash
-python eval_model.py --model_mode 3
+python eval_llm.py --weight reason
```

### Evaluate RLAIF Models

```bash
# PPO model
-python eval_model.py --model_mode 4
+python eval_llm.py --weight ppo_actor

# GRPO model
-python eval_model.py --model_mode 4
+python eval_llm.py --weight grpo
+
+# SPO model
+python eval_llm.py --weight spo
```

### RoPE Length Extrapolation

@@ -475,7 +497,7 @@ python eval_model.py --model_mode 4

Test with extended context:

```bash
-python eval_model.py --model_mode 1 --inference_rope_scaling True
+python eval_llm.py --weight full_sft --inference_rope_scaling
```

## 📐 Model Architecture