# Visual Generation Examples
Quick reference for running visual generation models (WAN).
## Prerequisites

```bash
# Install dependencies (from the repository root)
pip install -r requirements-dev.txt
pip install git+https://github.com/huggingface/diffusers.git
pip install av
```
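To confirm the installs took effect, a minimal sketch (package names taken from the commands above):

```python
# Verify the visual-gen dependencies are importable and report their versions.
import importlib.metadata as md

for pkg in ("diffusers", "av"):
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: missing - rerun the pip installs above")
```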
## Quick Start

```bash
# Set MODEL_ROOT to your model directory (required for the examples)
export MODEL_ROOT=/llm-models
# Optional: PROJECT_ROOT defaults to the repo root when run from examples/visual_gen

# Run all examples (auto-detects GPUs)
cd examples/visual_gen
./visual_gen_examples.sh
```
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `PROJECT_ROOT` | Auto-detected | Path to the repository root (auto-detected when run from `examples/visual_gen`; set it explicitly otherwise) |
| `MODEL_ROOT` | `/llm-models` | Path to the model directory |
| `TLLM_LOG_LEVEL` | `INFO` | Logging level |
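The example scripts read these variables from the environment; a minimal sketch of the equivalent lookup, using the defaults from the table above (the real scripts may resolve `PROJECT_ROOT` differently):

```python
# Resolve the documented configuration variables with their defaults.
import os

model_root = os.environ.get("MODEL_ROOT", "/llm-models")
log_level = os.environ.get("TLLM_LOG_LEVEL", "INFO")
project_root = os.environ.get("PROJECT_ROOT")  # None -> auto-detected by the scripts
print(model_root, log_level, project_root)
```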
## WAN (Text-to-Video)

### Basic Usage

**Single GPU:**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --output_path output.mp4
```
**With TeaCache:**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --enable_teacache \
    --output_path output.mp4
```
### Multi-GPU Parallelism

WAN supports two parallelism modes that can be combined (see the rank-mapping sketch after this list):

- **CFG Parallelism**: split the positive/negative prompts across GPUs
- **Ulysses Parallelism**: split the sequence across GPUs for longer sequences
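The GPU layouts in the examples below all follow from these two sizes. A minimal sketch of the mapping (illustrative only; the actual rank assignment is internal to the pipeline, and the positive/negative ordering is assumed from the layout notes below):

```python
# Illustrative sketch: how cfg_size and ulysses_size partition the GPUs.
def gpu_layout(cfg_size: int, ulysses_size: int) -> None:
    world_size = cfg_size * ulysses_size  # total GPUs required
    for rank in range(world_size):
        cfg_group = rank // ulysses_size   # which CFG branch this rank serves
        branch = "positive" if cfg_group == 0 else "negative"
        if cfg_size == 1:
            branch = "both prompts"        # no CFG split with cfg_size=1
        print(f"GPU {rank}: {branch}, ulysses rank {rank % ulysses_size}")

gpu_layout(cfg_size=2, ulysses_size=2)  # matches the 4-GPU example below
```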
**Ulysses Only (2 GPUs):**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 1 --ulysses_size 2 \
    --output_path output.mp4
```
GPU Layout: GPUs 0-1 share the sequence (6 of WAN's 12 attention heads each)
**CFG Only (2 GPUs):**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 1 \
    --output_path output.mp4
```
GPU Layout: GPU 0 (positive) | GPU 1 (negative)
**CFG + Ulysses (4 GPUs):**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 2 \
    --output_path output.mp4
```
GPU Layout: GPU 0-1 (positive, Ulysses) | GPU 2-3 (negative, Ulysses)
**Large-Scale (8 GPUs):**
```bash
python visual_gen_wan_t2v.py \
    --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "A cute cat playing piano" \
    --height 480 --width 832 --num_frames 33 \
    --attention_backend TRTLLM \
    --cfg_size 2 --ulysses_size 4 \
    --output_path output.mp4
```
GPU Layout: GPU 0-3 (positive) | GPU 4-7 (negative)
## Common Arguments

| Argument | WAN | Default | Description |
|---|---|---|---|
| `--height` | ✓ | 720 | Output height |
| `--width` | ✓ | 1280 | Output width |
| `--num_frames` | ✓ | 81 | Number of frames |
| `--steps` | ✓ | 50 | Denoising steps |
| `--guidance_scale` | ✓ | 5.0 | CFG guidance strength |
| `--seed` | ✓ | 42 | Random seed |
| `--enable_teacache` | ✓ | False | Enable TeaCache optimization |
| `--teacache_thresh` | ✓ | 0.2 | TeaCache similarity threshold |
| `--attention_backend` | ✓ | VANILLA | `VANILLA` or `TRTLLM` |
| `--cfg_size` | ✓ | 1 | CFG parallelism size |
| `--ulysses_size` | ✓ | 1 | Ulysses sequence parallelism size |
| `--linear_type` | ✓ | default | Quantization type for linear layers |
## Troubleshooting

**Out of Memory:**

- Use quantization: `--linear_type trtllm-fp8-blockwise`
- Reduce resolution or frames
- Enable TeaCache: `--enable_teacache`
- Use Ulysses parallelism with more GPUs
**Slow Inference:**

- Enable TeaCache: `--enable_teacache`
- Use the TRTLLM backend: `--attention_backend TRTLLM`
- Use multiple GPUs: `--cfg_size 2` or `--ulysses_size 2`
**Import Errors:**

- Run from the repository root
- Install the necessary dependencies, e.g., `pip install -r requirements-dev.txt`
**Ulysses Errors:**

- `ulysses_size` must divide 12 (the number of WAN attention heads)
- Total GPUs = `cfg_size × ulysses_size` (a pre-flight check is sketched below)
- The sequence length must be divisible by `ulysses_size`
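A minimal sketch of that pre-flight check (`check_parallel_config` is a hypothetical helper, not part of the repo; the head count 12 comes from the constraint above):

```python
# Validate a cfg_size/ulysses_size combination before launching (sketch).
WAN_NUM_HEADS = 12  # from the constraint above: ulysses_size must divide 12

def check_parallel_config(cfg_size: int, ulysses_size: int, num_gpus: int) -> None:
    if WAN_NUM_HEADS % ulysses_size != 0:
        raise ValueError(f"ulysses_size={ulysses_size} must divide {WAN_NUM_HEADS}")
    if cfg_size * ulysses_size != num_gpus:
        raise ValueError(
            f"cfg_size * ulysses_size must equal the GPU count "
            f"({cfg_size} * {ulysses_size} != {num_gpus})")

check_parallel_config(cfg_size=2, ulysses_size=2, num_gpus=4)
```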
## Output Formats

- WAN: `.mp4` (video), `.gif` (animated), `.png` (single frame)
## Baseline Validation

Compare with the official HuggingFace Diffusers implementation:

```bash
# Run HuggingFace baselines
./hf_examples.sh

# Or run individual models
python hf_wan.py --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers
```
Compare outputs generated with the same seed to verify correctness.
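One way to make that comparison concrete, a sketch using the `av` dependency installed above (file names are placeholders; `hf_output.mp4` stands for wherever the baseline script wrote its video):

```python
# Compare two videos frame-by-frame with PSNR (higher = closer match).
import av
import numpy as np

def load_frames(path: str) -> np.ndarray:
    with av.open(path) as container:
        return np.stack([f.to_ndarray(format="rgb24")
                         for f in container.decode(video=0)])

a = load_frames("output.mp4").astype(np.float32)     # TRT-LLM result
b = load_frames("hf_output.mp4").astype(np.float32)  # HF Diffusers baseline
mse = float(np.mean((a - b) ** 2))
psnr = float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
print(f"PSNR vs baseline: {psnr:.2f} dB")
```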