# Visual Generation Examples

Quick reference for running visual generation models (WAN).

## Prerequisites

```bash
# Install dependencies (from repository root)
pip install -r requirements-dev.txt
pip install git+https://github.com/huggingface/diffusers.git
pip install av
```
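
To confirm the environment before running anything, a quick import check (not part of the repo's scripts, just a sanity test) is:

```bash
# Sanity check: these imports should resolve after the installs above
python -c "import av, diffusers; print('diffusers', diffusers.__version__)"
```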
## Quick Start

```bash
# Set MODEL_ROOT to your model directory (required for examples)
export MODEL_ROOT=/llm-models
# Optional: PROJECT_ROOT defaults to repo root when run from examples/visual_gen

# Run all examples (auto-detects GPUs)
cd examples/visual_gen
./visual_gen_examples.sh
```
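
Since the launcher sizes its runs from the GPUs it detects, it can help to check what is visible first; `nvidia-smi -L` lists the devices on the machine:

```bash
# List the GPUs the driver reports before running the examples
nvidia-smi -L
```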
## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PROJECT_ROOT` | Auto-detected | Repository root; auto-detected when running from `examples/visual_gen`, set explicitly otherwise |
| `MODEL_ROOT` | `/llm-models` | Path to the model directory |
| `TLLM_LOG_LEVEL` | `INFO` | Logging level |
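
For example, to point the scripts at a different model tree and turn up logging (the path below is illustrative):

```bash
# Illustrative overrides; adjust the path to wherever your models live
export MODEL_ROOT=/data/llm-models
export TLLM_LOG_LEVEL=DEBUG
./visual_gen_examples.sh
```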
---

## WAN (Text-to-Video)

### Basic Usage

**Single GPU:**
```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --output_path output.mp4
```
**With TeaCache:**
```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --enable_teacache \
  --output_path output.mp4
```
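
Cache aggressiveness is tuned with `--teacache_thresh` (default 0.2, see the table below); as a rule of thumb, a higher threshold tends to reuse cached results more often, trading some quality for speed. An illustrative run with a looser threshold (0.3 is a guess; tune per model and content):

```bash
# Illustrative: loosen the TeaCache threshold for more reuse
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --enable_teacache --teacache_thresh 0.3 \
  --output_path output.mp4
```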
### Multi-GPU Parallelism

WAN supports two parallelism modes, which can be combined (see the rank-layout sketch after this list):
- **CFG Parallelism**: splits the positive and negative prompt passes across GPUs
- **Ulysses Parallelism**: splits the sequence dimension across GPUs, enabling longer sequences
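
Total GPUs = `cfg_size × ulysses_size`, and judging from the GPU layouts shown below, ranks are grouped CFG-first (all Ulysses shards of the positive branch, then the negative branch). A small illustrative loop, not part of the repo's scripts, that prints this layout:

```bash
# Illustrative rank layout for cfg_size=2, ulysses_size=2 (matches the 4-GPU layout below)
CFG_SIZE=2; ULYSSES_SIZE=2
for rank in $(seq 0 $((CFG_SIZE * ULYSSES_SIZE - 1))); do
  cfg_group=$((rank / ULYSSES_SIZE))   # 0 = positive prompt, 1 = negative prompt
  seq_shard=$((rank % ULYSSES_SIZE))   # which slice of the sequence this GPU holds
  echo "GPU ${rank}: cfg_group=${cfg_group}, sequence_shard=${seq_shard}"
done
```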
**Ulysses Only (2 GPUs):**
```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 1 --ulysses_size 2 \
  --output_path output.mp4
```
GPU Layout: GPUs 0-1 share the sequence (12 heads / 2 = 6 heads each)
**CFG Only (2 GPUs):**
```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 2 --ulysses_size 1 \
  --output_path output.mp4
```
GPU Layout: GPU 0 (positive) | GPU 1 (negative)
**CFG + Ulysses (4 GPUs):**
```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 2 --ulysses_size 2 \
  --output_path output.mp4
```
GPU Layout: GPU 0-1 (positive, Ulysses) | GPU 2-3 (negative, Ulysses)
**Large-Scale (8 GPUs):**
```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 2 --ulysses_size 4 \
  --output_path output.mp4
```
GPU Layout: GPU 0-3 (positive) | GPU 4-7 (negative)
---

## Common Arguments

| Argument | WAN | Default | Description |
|----------|-----|---------|-------------|
| `--height` | ✓ | 720 | Output height in pixels |
| `--width` | ✓ | 1280 | Output width in pixels |
| `--num_frames` | ✓ | 81 | Number of output frames |
| `--steps` | ✓ | 50 | Denoising steps |
| `--guidance_scale` | ✓ | 5.0 | CFG guidance strength |
| `--seed` | ✓ | 42 | Random seed |
| `--enable_teacache` | ✓ | False | Enable TeaCache optimization |
| `--teacache_thresh` | ✓ | 0.2 | TeaCache similarity threshold |
| `--attention_backend` | ✓ | `VANILLA` | `VANILLA` or `TRTLLM` |
| `--cfg_size` | ✓ | 1 | CFG parallelism degree |
| `--ulysses_size` | ✓ | 1 | Ulysses sequence-parallelism degree |
| `--linear_type` | ✓ | `default` | Linear-layer quantization type |
## Troubleshooting

**Out of Memory:**
- Use quantization: `--linear_type trtllm-fp8-blockwise`
- Reduce resolution or frame count
- Enable TeaCache: `--enable_teacache`
- Use Ulysses parallelism with more GPUs (a combined example follows this list)
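
A hypothetical low-memory invocation combining these mitigations; the values are illustrative, and `--num_frames 17` keeps the 4k+1 frame pattern the other examples follow:

```bash
# Illustrative low-memory run: FP8 quantization + TeaCache + smaller output
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 17 \
  --linear_type trtllm-fp8-blockwise \
  --enable_teacache \
  --output_path output.mp4
```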
**Slow Inference:**
- Enable TeaCache: `--enable_teacache`
- Use TRTLLM backend: `--attention_backend TRTLLM`
- Use multi-GPU: `--cfg_size 2` or `--ulysses_size 2`
**Import Errors:**
- Run from repository root
- Install necessary dependencies, e.g., `pip install -r requirements-dev.txt`
**Ulysses Errors:**
- `ulysses_size` must divide 12 (WAN's attention head count)
- Total GPUs = `cfg_size × ulysses_size`
- Sequence length must be divisible by `ulysses_size` (a quick pre-flight check follows this list)
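
A standalone sketch that checks the first two constraints before launching a multi-GPU run (12 is WAN's head count from above):

```bash
# Illustrative sanity check before launching a multi-GPU run
CFG_SIZE=2; ULYSSES_SIZE=4; NUM_HEADS=12
if (( NUM_HEADS % ULYSSES_SIZE != 0 )); then
  echo "error: ulysses_size=${ULYSSES_SIZE} must divide ${NUM_HEADS} attention heads" >&2
  exit 1
fi
echo "OK: requires $((CFG_SIZE * ULYSSES_SIZE)) GPUs"
```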
## Output Formats

- **WAN**: `.mp4` (video), `.gif` (animated), `.png` (single frame)
## Baseline Validation

Compare with the official HuggingFace Diffusers implementation:

```bash
# Run HuggingFace baselines
./hf_examples.sh

# Or run individual models
python hf_wan.py --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers
```

Compare outputs generated with the same seed to verify correctness.
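
Outputs from different kernels may not be bit-identical even with the same seed, so a metric-based comparison is more robust. One option, assuming the ffmpeg CLI is available (file names illustrative), is its built-in PSNR filter:

```bash
# Assumes the ffmpeg CLI is installed; reports per-frame and average PSNR
ffmpeg -i output.mp4 -i baseline_output.mp4 -lavfi psnr -f null -
```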