# Visual Generation Examples
Quick reference for running visual generation models (WAN).
## Prerequisites
```bash
# Install dependencies (from repository root)
pip install -r requirements-dev.txt
pip install git+https://github.com/huggingface/diffusers.git
pip install av
```
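To confirm the optional dependencies installed above are importable before running the examples, here is a minimal check (package names taken from the install commands above):
```bash
# Verify the diffusion and video packages import cleanly
python -c "import diffusers, av; print('diffusers', diffusers.__version__, '| av', av.__version__)"
```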
## Quick Start
```bash
# Set MODEL_ROOT to your model directory (required for examples)
export MODEL_ROOT=/llm-models
# Optional: PROJECT_ROOT defaults to repo root when run from examples/visual_gen
# Run all examples (auto-detects GPUs)
cd examples/visual_gen
./visual_gen_examples.sh
```
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `PROJECT_ROOT` | Auto-detected | Path to the repository root (auto-detected when running from `examples/visual_gen`; set explicitly otherwise) |
| `MODEL_ROOT` | `/llm-models` | Path to model directory |
| `TLLM_LOG_LEVEL` | `INFO` | Logging level |
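For example, a typical session might export these variables before running any script (the values here are illustrative, not requirements):
```bash
# Illustrative settings; adjust paths to your environment
export PROJECT_ROOT=/path/to/TensorRT-LLM   # only needed when not running from examples/visual_gen
export MODEL_ROOT=/llm-models               # directory containing Wan2.1-T2V-1.3B-Diffusers
export TLLM_LOG_LEVEL=DEBUG                 # raise verbosity above the default INFO
```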
---
## WAN (Text-to-Video)
### Basic Usage
**Single GPU:**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --output_path output.mp4
```
**With TeaCache:**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --enable_teacache \
    --output_path output.mp4
```
### Multi-GPU Parallelism
WAN supports two parallelism modes that can be combined:
- **CFG Parallelism**: Splits the positive and negative (classifier-free guidance) prompt batches across GPUs
- **Ulysses Parallelism**: Splits the attention sequence across GPUs to support longer sequences
**Ulysses Only (2 GPUs):**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 1 --ulysses_size 2 \
    --output_path output.mp4
```
GPU Layout: GPU 0-1 share sequence (6 heads each)
**CFG Only (2 GPUs):**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 1 \
    --output_path output.mp4
```
GPU Layout: GPU 0 (positive) | GPU 1 (negative)
**CFG + Ulysses (4 GPUs):**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 2 \
    --output_path output.mp4
```
GPU Layout: GPU 0-1 (positive, Ulysses) | GPU 2-3 (negative, Ulysses)
**Large-Scale (8 GPUs):**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 4 \
    --output_path output.mp4
```
GPU Layout: GPU 0-3 (positive) | GPU 4-7 (negative)
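If the machine exposes more GPUs than a run needs, the standard `CUDA_VISIBLE_DEVICES` environment variable can pin the job to a subset. This is an optional illustration, not a required step:
```bash
# Pin a cfg_size=2 x ulysses_size=2 run (4 ranks) to GPUs 0-3
CUDA_VISIBLE_DEVICES=0,1,2,3 python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 2 \
    --output_path output.mp4
```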
---
## Common Arguments
| Argument | WAN | Default | Description |
|----------|-----|---------|-------------|
| `--height` | ✓ | 720 | Output height in pixels |
| `--width` | ✓ | 1280 | Output width in pixels |
| `--num_frames` | ✓ | 81 | Number of frames |
| `--steps` | ✓ | 50 | Denoising steps |
| `--guidance_scale` | ✓ | 5.0 | CFG guidance strength |
| `--seed` | ✓ | 42 | Random seed |
| `--enable_teacache` | ✓ | False | Enable TeaCache caching optimization |
| `--teacache_thresh` | ✓ | 0.2 | TeaCache similarity threshold |
| `--attention_backend` | ✓ | VANILLA | VANILLA or TRTLLM |
| `--cfg_size` | ✓ | 1 | CFG parallelism |
| `--ulysses_size` | ✓ | 1 | Sequence parallelism |
| `--linear_type` | ✓ | default | Linear-layer quantization type (e.g., `trtllm-fp8-blockwise`) |
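As a reference, the sketch below combines several of the arguments from this table in one run; the specific values are illustrative only:
```bash
# Fewer denoising steps and a fixed seed for quicker, reproducible runs
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --steps 30 --guidance_scale 5.0 --seed 42 \
    --output_path output.mp4
```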
## Troubleshooting
**Out of Memory:**
- Use quantization: `--linear_type trtllm-fp8-blockwise`
- Reduce resolution or frames
- Enable TeaCache: `--enable_teacache`
- Use Ulysses parallelism with more GPUs
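For example, a single command applying several of the mitigations above (values are illustrative):
```bash
# FP8 blockwise linear layers + TeaCache + reduced resolution/frame count
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 17 \
    --linear_type trtllm-fp8-blockwise \
    --enable_teacache \
    --output_path output.mp4
```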
**Slow Inference:**
- Enable TeaCache: `--enable_teacache`
- Use TRTLLM backend: `--attention_backend TRTLLM`
- Use multi-GPU: `--cfg_size 2` or `--ulysses_size 2`
**Import Errors:**
- Run from repository root
- Install necessary dependencies, e.g., `pip install -r requirements-dev.txt`
**Ulysses Errors:**
- `ulysses_size` must evenly divide 12 (the number of WAN attention heads)
- Total GPUs = `cfg_size × ulysses_size`
- Sequence length must be divisible by `ulysses_size`
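A minimal shell check of these constraints before launching (the variables here are illustrative, not script options):
```bash
# Planned configuration: cfg_size=2, ulysses_size=4
CFG_SIZE=2
ULYSSES_SIZE=4
echo "GPUs required: $((CFG_SIZE * ULYSSES_SIZE))"   # prints 8
if [ $((12 % ULYSSES_SIZE)) -ne 0 ]; then
    echo "ulysses_size=${ULYSSES_SIZE} does not evenly divide 12 (WAN attention heads)"
fi
```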
## Output Formats
- **WAN**: `.mp4` (video), `.gif` (animated), `.png` (single frame)
## Baseline Validation
Compare with official HuggingFace Diffusers implementation:
```bash
# Run HuggingFace baselines
./hf_examples.sh
# Or run individual models
python hf_wan.py --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers
```
Compare outputs generated with the same seed to verify correctness.
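For example, assuming `hf_wan.py` accepts the same generation flags as `visual_gen_wan_t2v.py` (an assumption; check `python hf_wan.py --help`), a seed-matched comparison might look like:
```bash
# TensorRT-LLM run
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --seed 42 --output_path trtllm_output.mp4
# HuggingFace Diffusers baseline (flags assumed to mirror the script above)
python hf_wan.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --seed 42 --output_path hf_output.mp4
```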