# STDiT in OpenSoRA

This document shows how to build and run STDiT in OpenSoRA with TensorRT-LLM.
## Overview

The TensorRT-LLM implementation of STDiT can be found in `tensorrt_llm/models/stdit/model.py`. The TensorRT-LLM STDiT (OpenSoRA) example code is located in `examples/models/contrib/stdit`. The main files to build and run STDiT with TensorRT-LLM are:

- `convert_checkpoint.py` to convert the STDiT model into the TensorRT-LLM checkpoint format.
- `sample.py` to run the pipeline with TensorRT engine(s) to generate videos.
## Support Matrix

- TP
- CP
- FP8
## Usage

The TensorRT-LLM STDiT example code is located at `examples/models/contrib/stdit`. It takes a HuggingFace checkpoint as input and builds the corresponding TensorRT engines. The number of TensorRT engines depends on the number of GPUs used to run inference.
### Requirements

Please install the required packages first:

```bash
pip install -r requirements.txt
# ColossalAI is also needed for the text encoder.
pip install colossalai --no-deps
```
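Before converting or building anything, it can help to verify that the required packages are importable. The sketch below uses only the Python standard library; the package names (`tensorrt_llm`, `colossalai`, `torch`) are taken from this README and its requirements, and missing packages are reported rather than imported, so the check is safe to run in any environment.

```python
# Environment sanity check: report which required packages are importable.
# Package names follow this README; adjust the list if your install differs.
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_packages(["tensorrt_llm", "colossalai", "torch"])
    for name, ok in status.items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```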
### Build STDiT TensorRT engine(s)

The pretrained checkpoint can be downloaded from here. It will be converted to the TensorRT-LLM checkpoint format by `convert_checkpoint.py`. After that, we can build TensorRT engine(s) from the TensorRT-LLM checkpoint.
```bash
# Convert to TRT-LLM
python convert_checkpoint.py --timm_ckpt=<pretrained_checkpoint>
# Build engine
trtllm-build --checkpoint_dir=tllm_checkpoint/ \
             --max_batch_size=2 \
             --gemm_plugin=float16 \
             --kv_cache_type=disabled \
             --remove_input_padding=enable \
             --gpt_attention_plugin=auto \
             --bert_attention_plugin=auto \
             --context_fmha=enable
```
After the build, an `./engine_output` directory is created; the STDiT model is now ready to run with TensorRT-LLM.
### Generate videos

`sample.py` is provided to generate videos with the optimized TensorRT engines:

```bash
python sample.py "a beautiful waterfall"
```
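To generate several videos, the command above can be driven from a small script. This is a sketch that only relies on the CLI shape shown in this README (one positional prompt for `sample.py`); the prompt list is illustrative.

```python
# Batch generation sketch: invoke sample.py once per prompt.
# The CLI shape (a single positional prompt) follows this README;
# the prompt list below is an illustrative assumption.
import subprocess
import sys


def build_command(prompt, python=sys.executable):
    """Assemble the sample.py invocation for one prompt."""
    return [python, "sample.py", prompt]


def generate_all(prompts):
    """Run sample.py sequentially for every prompt, failing fast on error."""
    for prompt in prompts:
        subprocess.run(build_command(prompt), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for p in ["a beautiful waterfall", "a city street at night"]:
        print(" ".join(build_command(p)))
```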
A video named `sample_outputs/sample_0000.mp4` will be generated.
### Tensor Parallel

We can leverage tensor parallelism to further reduce latency and memory consumption on each GPU.
```bash
# Convert to TRT-LLM
python convert_checkpoint.py --tp_size=2 --timm_ckpt=<pretrained_checkpoint>
# Build engines
trtllm-build --checkpoint_dir=tllm_checkpoint/ \
             --max_batch_size=2 \
             --gemm_plugin=float16 \
             --kv_cache_type=disabled \
             --remove_input_padding=enable \
             --gpt_attention_plugin=auto \
             --bert_attention_plugin=auto \
             --context_fmha=enable
# Run example
mpirun -n 2 --allow-run-as-root python sample.py "a beautiful waterfall"
```
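The launch command above can be generalized to other degrees of tensor parallelism. The sketch below assembles the `mpirun` argv for a given `tp_size`; the flag names mirror the command shown in this README, and the helper itself is a hypothetical convenience, not part of the example's scripts. The rank count passed to `mpirun` must match the `--tp_size` used at conversion time.

```python
# Assemble the multi-GPU launch command for a tensor-parallel engine.
# Flags mirror the mpirun command shown above; this helper is a
# hypothetical convenience wrapper, not part of the example code.
import sys


def tp_launch_command(tp_size, prompt, python=sys.executable):
    """Build the argv for running sample.py across tp_size MPI ranks."""
    if tp_size < 1:
        raise ValueError("tp_size must be >= 1")
    if tp_size == 1:
        # Single GPU: no MPI launcher needed.
        return [python, "sample.py", prompt]
    return ["mpirun", "-n", str(tp_size), "--allow-run-as-root",
            python, "sample.py", prompt]
```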
### Context Parallel

Not supported yet.