TensorRT-LLMs/examples/visual_gen/serve
Chang Liu 26901e4aa0
[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM (#11462)
Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2026-02-14 06:11:11 +08:00

Visual Generation API Examples

This directory contains example scripts that demonstrate how to use the TensorRT-LLM Visual Generation API endpoints for image and video generation.

Overview

These examples show how to interact with the visual generation server using both the OpenAI Python SDK and standard HTTP requests. The API provides endpoints for:

  • Image Generation: Text-to-image generation (T2I)
  • Video Generation:
    • Text-to-video generation (T2V) - generate videos from text prompts only
    • Text+Image-to-video generation (TI2V) - generate videos from text + reference image
    • Both synchronous and asynchronous modes supported
    • Multipart/form-data support for file uploads
  • Video Management: Retrieving and deleting generated videos

Prerequisites

Before running these examples, ensure that:

  1. Dependencies are installed: Install the required Python packages:

    pip install git+https://github.com/huggingface/diffusers.git
    pip install av
    
  2. The server is running: Start the TensorRT-LLM visual generation server:

    trtllm-serve <path to your model> --extra_visual_gen_options <path to config yaml>
    

    e.g.

    trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan.yml
    
    # Run the server in the background:
    trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan.yml > /tmp/serve.log 2>&1 &
    
    ## Check whether the server is up:
    tail -f /tmp/serve.log
    
    

Examples

Currently supported and tested models:

  1. WAN T2V/TI2V for video generation (t2v, ti2v, delete_video)

1. Synchronous Image Generation (sync_image_gen.py)

Demonstrates synchronous text-to-image generation using the OpenAI SDK.

Features:

  • Generates images from text prompts
  • Supports configurable image size and quality
  • Returns base64-encoded images or URLs
  • Saves generated images to disk

Usage:

# Use default localhost server
python sync_image_gen.py

# Specify custom server URL
python sync_image_gen.py http://your-server:8000/v1

API Endpoint: POST /v1/images/generations

Output: Saves generated image to output_generation.png (or numbered files for multiple images)
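The request pattern used by sync_image_gen.py can be sketched as follows. This is an illustrative wrapper, not the script's actual code; the model name ("wan"), size, and API key follow the defaults listed elsewhere in this README:

```python
import base64


def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64-encoded image payload and write it to disk.

    Returns the number of bytes written."""
    raw = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)


def generate_image(prompt: str, out_path: str = "output_generation.png") -> int:
    """Request a single image from a running server and save it.

    Requires the openai package and a running trtllm-serve instance."""
    from openai import OpenAI  # imported lazily so save_b64_image stays standalone

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="tensorrt_llm")
    result = client.images.generate(
        model="wan",
        prompt=prompt,
        n=1,
        size="512x512",
        response_format="b64_json",
    )
    return save_b64_image(result.data[0].b64_json, out_path)
```

With response_format="b64_json" the image bytes arrive inline in the response; with "url" the client would instead download from the returned link.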


2. Synchronous Video Generation with T2V and TI2V Modes (sync_video_gen.py)

Demonstrates synchronous video generation using direct HTTP requests. Waits for completion and returns the video file directly.

Features:

  • T2V Mode: Generate videos from text prompts only
  • TI2V Mode: Generate videos from text + reference image (multipart/form-data)
  • Waits for video generation to complete before returning
  • Returns video file directly in response
  • Command-line interface for easy testing

Usage:

# Text-to-Video (T2V) - No reference image
python sync_video_gen.py --mode t2v \
    --prompt "A cute cat playing with a ball in the park" \
    --duration 4.0 --fps 24 --size 256x256

# Text+Image-to-Video (TI2V) - With reference image
## Note: longer durations and larger sizes lead to much longer generation times
python sync_video_gen.py --mode ti2v \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 4.0 --fps 24 --size 512x512

# Custom parameters
python sync_video_gen.py --mode t2v \
    --prompt "A serene sunset over the ocean" \
    --duration 5.0 --fps 30 --size 512x512 \
    --output my_video.mp4

Command-Line Arguments:

  • --mode - Generation mode: t2v or ti2v (default: t2v)
  • --prompt - Text prompt for video generation (required)
  • --image - Path to reference image (required for ti2v mode)
  • --base-url - API server URL (default: http://localhost:8000/v1)
  • --model - Model name (default: wan)
  • --duration - Video duration in seconds (default: 4.0)
  • --fps - Frames per second (default: 24)
  • --size - Video resolution in WxH format (default: 256x256)
  • --output - Output video file path (default: output_sync.mp4)

API Endpoint: POST /v1/videos/generations

API Details:

  • T2V requests use Content-Type: application/json (JSON body)
  • TI2V requests use Content-Type: multipart/form-data (file upload)

Output: Saves generated video to specified output file
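The two request shapes above can be captured in a small helper. This builder is an illustrative sketch, not code from sync_video_gen.py; it returns the pieces you would pass to an HTTP client for each mode:

```python
def build_video_request(mode, prompt, seconds=4.0, fps=24,
                        size="256x256", image_path=None):
    """Return (fields, files) for a video generation request.

    T2V  -> (JSON-serializable fields, None)
    TI2V -> (form fields, {"input_reference": image_path}) for multipart upload
    """
    fields = {"prompt": prompt, "seconds": seconds, "fps": fps, "size": size}
    if mode == "t2v":
        return fields, None
    if mode == "ti2v":
        if image_path is None:
            raise ValueError("ti2v mode requires a reference image")
        return fields, {"input_reference": image_path}
    raise ValueError(f"unknown mode: {mode!r}")
```

With the requests library, the T2V tuple maps to `requests.post(url, json=fields)`, while the TI2V tuple maps to `requests.post(url, data=fields, files={"input_reference": open(image_path, "rb")})`.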


3. Async Video Generation with T2V and TI2V Modes (async_video_gen.py)

Demonstrates asynchronous video generation, supporting both Text-to-Video (T2V) and Text+Image-to-Video (TI2V) modes: the script submits a generation job and retrieves the result once it completes.

Features:

  • T2V Mode: Generate videos from text prompts only (JSON request)
  • TI2V Mode: Generate videos from text + reference image (multipart/form-data with file upload)
  • Command-line interface for easy testing
  • Automatic mode detection
  • Comprehensive parameter control

Usage:

# Text-to-Video (T2V) - No reference image
python async_video_gen.py --mode t2v \
    --prompt "A cool cat on a motorcycle in the night" \
    --duration 4.0 --fps 24 --size 256x256

# Text+Image-to-Video (TI2V) - With reference image
python async_video_gen.py --mode ti2v \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 4.0 --fps 24 --size 512x512

# Custom parameters
python async_video_gen.py --mode t2v \
    --prompt "A serene sunset over the ocean" \
    --duration 5.0 --fps 30 --size 512x512 \
    --output my_video.mp4

Command-Line Arguments:

  • --mode - Generation mode: t2v or ti2v (default: t2v)
  • --prompt - Text prompt for video generation (required)
  • --image - Path to reference image (required for ti2v mode)
  • --base-url - API server URL (default: http://localhost:8000/v1)
  • --model - Model name (default: wan)
  • --duration - Video duration in seconds (default: 4.0)
  • --fps - Frames per second (default: 24)
  • --size - Video resolution in WxH format (default: 256x256)
  • --output - Output video file path (default: output_async.mp4)

API Details:

  • T2V requests use Content-Type: application/json (JSON body)
  • TI2V requests use Content-Type: multipart/form-data (file upload)

Output: Saves generated video to specified output file
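Under the hood, the async workflow is create-then-poll: POST /v1/videos returns a job id, and the client polls GET /v1/videos/{video_id} until the job finishes. A minimal polling loop is sketched below; the "queued" and "in_progress" status strings are assumptions, since this README does not enumerate the exact status values:

```python
import time


def poll_until_done(fetch_status, video_id, timeout=300.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(video_id) until a terminal status is returned.

    fetch_status is any callable wrapping GET /v1/videos/{video_id};
    "queued" and "in_progress" are assumed non-terminal status names.
    Raises TimeoutError if no terminal status arrives within `timeout`.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        status = fetch_status(video_id)
        if status not in ("queued", "in_progress"):
            return status
        sleep(interval)
    raise TimeoutError(f"video {video_id} not finished after {timeout}s")
```

Once a terminal status is reached, the video bytes can be fetched from GET /v1/videos/{video_id}/content. Injecting `sleep` and `clock` keeps the loop testable without real waiting.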


4. Video Deletion (delete_video.py)

Demonstrates the complete lifecycle of video generation and deletion.

Features:

  • Creates a test video generation job
  • Waits for completion
  • Deletes the generated video
  • Verifies deletion by attempting to retrieve the deleted video
  • Tests error handling for non-existent videos

Usage:

# Use default localhost server
python delete_video.py

# Specify custom server URL
python delete_video.py http://your-server:8000/v1

API Endpoints:

  • POST /v1/videos - Create video job
  • GET /v1/videos/{video_id} - Check video status
  • DELETE /v1/videos/{video_id} - Delete video

Test Flow:

  1. Create video generation job
  2. Wait for completion
  3. Delete the video
  4. Verify video returns NotFoundError
  5. Test deletion of non-existent video
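Steps 3 and 4 of the flow can be sketched with the standard library alone. The 404 status code for a deleted video is an assumption based on the NotFoundError behavior described above, and `delete_and_verify` is illustrative, not code from delete_video.py:

```python
from urllib import error, request

BASE_URL = "http://localhost:8000/v1"


def deletion_verified(status_code: int) -> bool:
    """Step 4: after deletion, a GET on the same id should return 404."""
    return status_code == 404


def delete_and_verify(video_id: str) -> bool:
    """Steps 3-4: delete the video, then confirm it is no longer retrievable."""
    req = request.Request(f"{BASE_URL}/videos/{video_id}", method="DELETE")
    request.urlopen(req)
    try:
        request.urlopen(f"{BASE_URL}/videos/{video_id}")
        return False  # still retrievable -> deletion did not take effect
    except error.HTTPError as exc:
        return deletion_verified(exc.code)
```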

API Configuration

All examples use the following default configuration:

  • Base URL: http://localhost:8000/v1
  • API Key: "tensorrt_llm" (authentication token)
  • Timeout: 300 seconds for async operations

You can customize these by:

  1. Passing the base URL as a command-line argument
  2. Modifying the default parameters in each script's function

Common Parameters

Image Generation

  • model: Model identifier (e.g., "wan")
  • prompt: Text description
  • n: Number of images to generate
  • size: Image dimensions (e.g., "512x512", "1024x1024")
  • quality: "standard" or "hd"
  • response_format: "b64_json" or "url"

Video Generation

  • model: Model identifier (e.g., "wan")
  • prompt: Text description
  • size: Video resolution (e.g., "256x256", "512x512")
  • seconds: Duration in seconds
  • fps: Frames per second
  • input_reference: Reference image file (for TI2V mode)

Quick Reference - curl Examples

Text-to-Video (JSON)

curl -X POST "http://localhost:8000/v1/videos" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A cool cat on a motorcycle",
    "seconds": 4.0,
    "fps": 24,
    "size": "256x256"
  }'

Text+Image-to-Video (Multipart with File Upload)

curl -X POST "http://localhost:8000/v1/videos" \
  -F "prompt=She turns around and smiles" \
  -F "input_reference=@./media/woman_skyline_original_720p.jpeg" \
  -F "seconds=4.0" \
  -F "fps=24" \
  -F "size=256x256" \
  -F "guidance_scale=5.0"

Check Video Status

curl -X GET "http://localhost:8000/v1/videos/{video_id}"

Download Video

curl -X GET "http://localhost:8000/v1/videos/{video_id}/content" -o output.mp4

Delete Video

curl -X DELETE "http://localhost:8000/v1/videos/{video_id}"

API Endpoints Summary

Endpoint                 Method  Mode   Content-Type       Purpose
/v1/videos               POST    Async  JSON or Multipart  Create video job (T2V/TI2V)
/v1/videos/generations   POST    Sync   JSON or Multipart  Generate video synchronously (T2V/TI2V)
/v1/videos/{id}          GET     -      -                  Get video status/metadata
/v1/videos/{id}/content  GET     -      -                  Download video file
/v1/videos/{id}          DELETE  -      -                  Delete video
/v1/videos               GET     -      -                  List all videos
/v1/images/generations   POST    -      JSON               Generate images (T2I)

Note: Both /v1/videos (async) and /v1/videos/generations (sync) support:

  • JSON: Standard text-to-video (T2V)
  • Multipart/Form-Data: Text+image-to-video (TI2V) with file upload

Error Handling

All examples include comprehensive error handling:

  • Connection errors (server not running)
  • API errors (invalid parameters, model not found)
  • Timeout errors (generation taking too long)
  • Resource errors (video not found for deletion)

Errors are displayed with full stack traces for debugging.

Output Files

Generated files are saved to the current working directory:

  • output_generation.png - Synchronous image generation (sync_image_gen.py)
  • output_sync.mp4 - Synchronous video generation (sync_video_gen.py)
  • output_async.mp4 - Asynchronous video generation (async_video_gen.py)

Note: You can customize the output filename with the --output parameter in the video generation scripts.