
Visual Generation Examples

Quick reference for running visual generation models (WAN).

Prerequisites

# Install dependencies (from repository root)
pip install -r requirements-dev.txt
pip install git+https://github.com/huggingface/diffusers.git
pip install av

Quick Start

# Set MODEL_ROOT to your model directory (required for examples)
export MODEL_ROOT=/llm-models
# Optional: PROJECT_ROOT defaults to repo root when run from examples/visual_gen

# Run all examples (auto-detects GPUs)
cd examples/visual_gen
./visual_gen_examples.sh

Environment Variables

| Variable       | Default       | Description                                                                                   |
|----------------|---------------|-----------------------------------------------------------------------------------------------|
| PROJECT_ROOT   | Auto-detected | Path to repository root (auto-detected when running from examples/visual_gen; set it otherwise) |
| MODEL_ROOT     | /llm-models   | Path to model directory                                                                        |
| TLLM_LOG_LEVEL | INFO          | Logging level                                                                                  |
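
For example, to point the examples at a different model directory and raise the log verbosity (the path below is a placeholder, not a required location):

# Placeholder path: replace with wherever your models live
export MODEL_ROOT=/path/to/models
# More verbose logging while debugging
export TLLM_LOG_LEVEL=DEBUG
# PROJECT_ROOT usually does not need to be set when running from examples/visual_gen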

WAN (Text-to-Video)

Basic Usage

Single GPU:

python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --output_path output.mp4

With TeaCache:

python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --enable_teacache \
    --output_path output.mp4

Multi-GPU Parallelism

WAN supports two parallelism modes that can be combined:

  • CFG Parallelism: splits the positive and negative prompt passes across GPUs
  • Ulysses Parallelism: splits the sequence dimension across GPUs, enabling longer sequences

Ulysses Only (2 GPUs):

python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 1 --ulysses_size 2 \
    --output_path output.mp4

GPU Layout: GPU 0-1 share sequence (6 heads each)

CFG Only (2 GPUs):

python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 1 \
    --output_path output.mp4

GPU Layout: GPU 0 (positive) | GPU 1 (negative)

CFG + Ulysses (4 GPUs):

python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 2 \
    --output_path output.mp4

GPU Layout: GPU 0-1 (positive, Ulysses) | GPU 2-3 (negative, Ulysses)

Large-Scale (8 GPUs):

python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 4 \
    --output_path output.mp4

GPU Layout: GPU 0-3 (positive) | GPU 4-7 (negative)


Common Arguments

| Argument            | WAN Default | Description                   |
|---------------------|-------------|-------------------------------|
| --height            | 720         | Output height                 |
| --width             | 1280        | Output width                  |
| --num_frames        | 81          | Number of frames              |
| --steps             | 50          | Denoising steps               |
| --guidance_scale    | 5.0         | CFG guidance strength         |
| --seed              | 42          | Random seed                   |
| --enable_teacache   | False       | Cache optimization            |
| --teacache_thresh   | 0.2         | TeaCache similarity threshold |
| --attention_backend | VANILLA     | VANILLA or TRTLLM             |
| --cfg_size          | 1           | CFG parallelism               |
| --ulysses_size      | 1           | Sequence parallelism          |
| --linear_type       | default     | Quantization type             |
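
As an illustration of overriding a few of these defaults, the run below lowers the step count, nudges the guidance scale, and fixes a different seed (the values are illustrative, not tuned recommendations):

python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --steps 30 --guidance_scale 6.0 --seed 123 \
    --output_path output.mp4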

Troubleshooting

Out of Memory:

  • Use quantization: --linear_type trtllm-fp8-blockwise
  • Reduce resolution or frames
  • Enable TeaCache: --enable_teacache
  • Use Ulysses parallelism with more GPUs
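
A single memory-constrained run combining several of these mitigations might look like the following (the reduced frame count and quantization choice are illustrative):

# Fewer frames plus FP8 quantization, TeaCache, and 2-way Ulysses to lower per-GPU memory
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 17 \
    --linear_type trtllm-fp8-blockwise \
    --enable_teacache \
    --attention_backend TRTLLM \
    --cfg_size 1 --ulysses_size 2 \
    --output_path output.mp4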

Slow Inference:

  • Enable TeaCache: --enable_teacache
  • Use TRTLLM backend: --attention_backend TRTLLM
  • Use multi-GPU: --cfg_size 2 or --ulysses_size 2

Import Errors:

  • Run from repository root
  • Install necessary dependencies, e.g., pip install -r requirements-dev.txt

Ulysses Errors:

  • ulysses_size must divide 12 (WAN heads)
  • Total GPUs = cfg_size × ulysses_size
  • Sequence length must be divisible by ulysses_size
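
A quick shell-level sanity check of these constraints before launching (the variable values are illustrative) might look like:

cfg_size=2
ulysses_size=2
# WAN's transformer has 12 attention heads, so ulysses_size must divide 12
if [ $((12 % ulysses_size)) -ne 0 ]; then echo "invalid ulysses_size"; fi
# The run needs exactly cfg_size * ulysses_size visible GPUs
echo "GPUs required: $((cfg_size * ulysses_size))"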

Output Formats

  • WAN: .mp4 (video), .gif (animated), .png (single frame)
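
If the format is selected from the extension of --output_path (a reasonable reading given output_handler.py, but an assumption worth verifying in the script), writing an animated GIF instead of an MP4 would look like:

python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --output_path output.gif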

Baseline Validation

Compare against the official HuggingFace Diffusers implementation:

# Run HuggingFace baselines
./hf_examples.sh

# Or run individual models
python hf_wan.py --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers

Compare outputs generated with the same seed to verify correctness.
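
For example, the pair of runs below produces a TRT-LLM video and an HF baseline for side-by-side comparison. The --seed flag on visual_gen_wan_t2v.py is documented above; passing matching --prompt/--seed/--output_path flags to hf_wan.py is an assumption here, so check `python hf_wan.py --help` for its actual options:

# TRT-LLM run (seed 42 is already the default; set explicitly for clarity)
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --seed 42 \
    --output_path trtllm_output.mp4

# HuggingFace baseline (flags beyond --model_path are assumed; verify with --help)
python hf_wan.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --seed 42 \
    --output_path hf_output.mp4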