# Visual Generation Examples

Quick reference for running visual generation models (WAN).

## Prerequisites

```bash
# Install dependencies (from repository root)
pip install -r requirements-dev.txt
pip install git+https://github.com/huggingface/diffusers.git
pip install av
```

## Quick Start

```bash
# Set MODEL_ROOT to your model directory (required for the examples)
export MODEL_ROOT=/llm-models

# Optional: PROJECT_ROOT defaults to the repo root when run from examples/visual_gen

# Run all examples (auto-detects GPUs)
cd examples/visual_gen
./visual_gen_examples.sh
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PROJECT_ROOT` | Auto-detected | Path to the repository root (set it when running from `examples/visual_gen`) |
| `MODEL_ROOT` | `/llm-models` | Path to the model directory |
| `TLLM_LOG_LEVEL` | `INFO` | Logging level |

---

## WAN (Text-to-Video)

### Basic Usage

**Single GPU:**

```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --output_path output.mp4
```

**With TeaCache:**

```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --enable_teacache \
    --output_path output.mp4
```

### Multi-GPU Parallelism

WAN supports two parallelism modes that can be combined:

- **CFG Parallelism**: splits the positive and negative classifier-free-guidance prompts across GPUs
- **Ulysses Parallelism**: splits the sequence across GPUs, enabling longer sequences

A minimal configuration check is sketched after the Common Arguments table below.

**Ulysses Only (2 GPUs):**

```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 1 --ulysses_size 2 \
    --output_path output.mp4
```

GPU layout: GPUs 0-1 share the sequence (6 attention heads each)

**CFG Only (2 GPUs):**

```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 1 \
    --output_path output.mp4
```

GPU layout: GPU 0 (positive prompt) | GPU 1 (negative prompt)

**CFG + Ulysses (4 GPUs):**

```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 2 \
    --output_path output.mp4
```

GPU layout: GPUs 0-1 (positive, Ulysses) | GPUs 2-3 (negative, Ulysses)

**Large-Scale (8 GPUs):**

```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 4 \
    --output_path output.mp4
```

GPU layout: GPUs 0-3 (positive) | GPUs 4-7 (negative)

---

## Common Arguments

| Argument | WAN | Default | Description |
|----------|-----|---------|-------------|
| `--height` | ✓ | 720 | Output height |
| `--width` | ✓ | 1280 | Output width |
| `--num_frames` | ✓ | 81 | Number of frames |
| `--steps` | ✓ | 50 | Denoising steps |
| `--guidance_scale` | ✓ | 5.0 | CFG guidance strength |
| `--seed` | ✓ | 42 | Random seed |
| `--enable_teacache` | ✓ | False | Cache optimization |
| `--teacache_thresh` | ✓ | 0.2 | TeaCache similarity threshold |
| `--attention_backend` | ✓ | VANILLA | VANILLA or TRTLLM |
| `--cfg_size` | ✓ | 1 | CFG parallelism degree |
| `--ulysses_size` | ✓ | 1 | Ulysses (sequence) parallelism degree |
| `--linear_type` | ✓ | default | Quantization type |
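Before committing GPU time to a multi-GPU run, it is cheap to verify that the flag combination can work at all. Below is a minimal, hypothetical pre-flight check: `check_parallel_config` is an illustrative helper, not part of the example scripts, and it encodes only the constraints documented under Troubleshooting below (total GPUs = `cfg_size × ulysses_size`, `ulysses_size` must divide WAN's 12 attention heads, and the sequence length must be divisible by `ulysses_size`).

```python
# Hypothetical pre-flight check for the parallelism flags; not part of the
# example scripts. The constraints mirror the "Ulysses Errors" notes below.
from typing import Optional

WAN_NUM_HEADS = 12  # WAN attention head count (see "Ulysses Errors" below)


def check_parallel_config(world_size: int, cfg_size: int,
                          ulysses_size: int,
                          seq_len: Optional[int] = None) -> None:
    """Raise ValueError if a cfg/ulysses combination cannot run on world_size GPUs."""
    if cfg_size * ulysses_size != world_size:
        raise ValueError(f"need cfg_size * ulysses_size == total GPUs, "
                         f"got {cfg_size} * {ulysses_size} != {world_size}")
    if WAN_NUM_HEADS % ulysses_size:
        raise ValueError(f"ulysses_size={ulysses_size} must divide "
                         f"{WAN_NUM_HEADS} attention heads")
    if seq_len is not None and seq_len % ulysses_size:
        raise ValueError(f"sequence length {seq_len} must be divisible "
                         f"by ulysses_size={ulysses_size}")
    print(f"OK: {cfg_size} CFG branch(es), {ulysses_size}-way Ulysses, "
          f"{WAN_NUM_HEADS // ulysses_size} heads per GPU")


check_parallel_config(world_size=4, cfg_size=2, ulysses_size=2)
# OK: 2 CFG branch(es), 2-way Ulysses, 6 heads per GPU
```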
## Troubleshooting

**Out of Memory:**

- Use quantization: `--linear_type trtllm-fp8-blockwise`
- Reduce the resolution or the number of frames
- Enable TeaCache: `--enable_teacache`
- Use Ulysses parallelism across more GPUs

**Slow Inference:**

- Enable TeaCache: `--enable_teacache`
- Use the TRTLLM attention backend: `--attention_backend TRTLLM`
- Use multiple GPUs: `--cfg_size 2` or `--ulysses_size 2`

**Import Errors:**

- Run from the repository root
- Install the necessary dependencies, e.g., `pip install -r requirements-dev.txt`

**Ulysses Errors:**

- `ulysses_size` must divide 12 (the number of WAN attention heads)
- Total GPUs = `cfg_size × ulysses_size`
- The sequence length must be divisible by `ulysses_size`

## Output Formats

- **WAN**: `.mp4` (video), `.gif` (animated), `.png` (single frame)

## Baseline Validation

Compare against the official HuggingFace Diffusers implementation:

```bash
# Run HuggingFace baselines
./hf_examples.sh

# Or run individual models
python hf_wan.py --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers
```

Run both implementations with the same seed and compare the outputs to verify correctness; one way to do the comparison is sketched below.
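The comparison itself is not scripted in this directory. The sketch below decodes two `.mp4` outputs with PyAV (`pip install av`, listed under Prerequisites) and reports per-frame PSNR. The file names and the 30 dB threshold are illustrative assumptions, not a documented tolerance.

```python
# Hypothetical output comparison: decode two videos and report per-frame PSNR.
# File paths and the 30 dB threshold are illustrative assumptions.
import av
import numpy as np


def video_frames(path: str):
    """Yield decoded frames as uint8 RGB arrays."""
    with av.open(path) as container:
        for frame in container.decode(video=0):
            yield frame.to_ndarray(format="rgb24")


def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two uint8 frames, in dB."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)


# Compare our output against the HuggingFace baseline, frame by frame.
for i, (ours, ref) in enumerate(zip(video_frames("output.mp4"),
                                    video_frames("hf_output.mp4"))):
    score = psnr(ours, ref)
    status = "ok" if score > 30.0 else "MISMATCH"  # threshold is an assumption
    print(f"frame {i:03d}: PSNR {score:.2f} dB [{status}]")
```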