Chang Liu 26901e4aa0
[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM (#11462)
Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2026-02-14 06:11:11 +08:00


# Visual Generation API Examples
This directory contains example scripts that demonstrate how to use the TensorRT-LLM Visual Generation API endpoints for image and video generation.
## Overview
These examples show how to interact with the visual generation server using both the OpenAI Python SDK and standard HTTP requests. The API provides endpoints for:
- **Image Generation**: Text-to-image generation (T2I)
- **Video Generation**:
  - Text-to-video generation (T2V): generate videos from text prompts only
  - Text+Image-to-video generation (TI2V): generate videos from text plus a reference image
  - Both synchronous and asynchronous modes supported
  - Multipart/form-data support for file uploads
- **Video Management**: Retrieving and deleting generated videos
## Prerequisites
Before running these examples, ensure you have:
1. **Dependencies installed**: install the required packages before running the examples:
```bash
pip install git+https://github.com/huggingface/diffusers.git
pip install av
```
2. **Server running**: the TensorRT-LLM visual generation server must be running:
```bash
trtllm-serve <path to your model> --extra_visual_gen_options <path to config yaml>
```
For example:
```bash
trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan.yml
# Or run the server in the background, logging to a file:
trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan.yml > /tmp/serve.log 2>&1 &
# Follow the log to check whether the server has finished starting up
tail -f /tmp/serve.log
```
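Instead of watching the log, a script can poll the server until it responds. The sketch below is a hypothetical helper (`wait_for_server` is not part of these examples). It treats any 2xx response from a probe URL as "ready" and, purely as an illustration, probes the `GET /v1/videos` list endpoint from the endpoint summary; substitute a dedicated health route if your server exposes one.

```python
import time
import urllib.request
from typing import Callable, Optional


def _http_get_status(url: str) -> int:
    """Issue a GET request and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def wait_for_server(
    url: str = "http://localhost:8000/v1/videos",  # probe URL (assumption)
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    fetch: Optional[Callable[[str], int]] = None,
) -> bool:
    """Poll `url` until it returns a 2xx status; return False if `timeout_s` elapses first."""
    fetch = fetch or _http_get_status
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if 200 <= fetch(url) < 300:
                return True
        except OSError:
            # URLError, HTTPError and ConnectionRefusedError all derive from OSError:
            # the server is not listening yet or is still loading the model.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, a CI job could call `wait_for_server()` after starting `trtllm-serve` in the background and only then launch `sync_video_gen.py`. The `fetch` parameter exists so the polling logic can be exercised without a live server.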
## Examples
Currently supported and tested models:
1. Wan T2V/I2V for video generation (`t2v` and `ti2v` modes, `delete_video.py`)
### 1. Synchronous Image Generation (`sync_image_gen.py`)
Demonstrates synchronous text-to-image generation using the OpenAI SDK.
**Features:**
- Generates images from text prompts
- Supports configurable image size and quality
- Returns base64-encoded images or URLs
- Saves generated images to disk
**Usage:**
```bash
# Use default localhost server
python sync_image_gen.py
# Specify custom server URL
python sync_image_gen.py http://your-server:8000/v1
```
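The script uses the OpenAI SDK; the same request can be sketched with only the Python standard library. This assumes the server accepts OpenAI-style image fields (`model`, `prompt`, `n`, `size`, `response_format`) and returns an OpenAI-style body of the form `{"data": [{"b64_json": ...}]}`. The helper names and the numbered-file naming scheme are hypothetical.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_image_request(base_url, prompt, model="wan", size="512x512", n=1,
                        api_key="tensorrt_llm"):
    """Build a POST /images/generations request asking for base64-encoded output."""
    payload = {"model": model, "prompt": prompt, "n": n, "size": size,
               "response_format": "b64_json"}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 # The OpenAI SDK sends the API key as a bearer token like this.
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def save_b64_images(response_body, stem="output_generation"):
    """Decode every `b64_json` item, write it to disk, and return the paths written."""
    items = json.loads(response_body)["data"]
    paths = []
    for i, item in enumerate(items):
        # One image -> <stem>.png; several -> <stem>_<i>.png (assumed naming).
        path = Path(f"{stem}.png" if len(items) == 1 else f"{stem}_{i}.png")
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


def generate_image(base_url, prompt, **kwargs):
    """Send the request to a running server and save the returned image(s)."""
    req = build_image_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return save_b64_images(resp.read())
```

Against a running server, `generate_image("http://localhost:8000/v1", "A red fox in the snow")` would write `output_generation.png` to the working directory, provided the response shape matches the assumption above.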
**API Endpoint:** `POST /v1/images/generations`

**Output:** Saves the generated image to `output_generation.png` (or numbered files for multiple images)

---
### 2. Synchronous Video Generation with T2V and TI2V Modes (`sync_video_gen.py`)
Demonstrates synchronous video generation using direct HTTP requests. Waits for completion and returns the video file directly.
**Features:**
- **T2V Mode**: Generate videos from text prompts only
- **TI2V Mode**: Generate videos from text + reference image (multipart/form-data)
- Waits for video generation to complete before returning
- Returns video file directly in response
- Command-line interface for easy testing
**Usage:**
```bash
# Text-to-Video (T2V) - No reference image
python sync_video_gen.py --mode t2v \
--prompt "A cute cat playing with a ball in the park" \
--duration 4.0 --fps 24 --size 256x256
# Text+Image-to-Video (TI2V) - With reference image
# Note: longer durations and larger sizes lead to much longer generation times
python sync_video_gen.py --mode ti2v \
--prompt "She turns around and smiles, then slowly walks out of the frame" \
--image ./media/woman_skyline_original_720p.jpeg \
--duration 4.0 --fps 24 --size 512x512
# Custom parameters
python sync_video_gen.py --mode t2v \
--prompt "A serene sunset over the ocean" \
--duration 5.0 --fps 30 --size 512x512 \
--output my_video.mp4
```
**Command-Line Arguments:**
- `--mode` - Generation mode: `t2v` or `ti2v` (default: t2v)
- `--prompt` - Text prompt for video generation (required)
- `--image` - Path to reference image (required for ti2v mode)
- `--base-url` - API server URL (default: http://localhost:8000/v1)
- `--model` - Model name (default: wan)
- `--duration` - Video duration in seconds (default: 4.0)
- `--fps` - Frames per second (default: 24)
- `--size` - Video resolution in WxH format (default: 256x256)
- `--output` - Output video file path (default: output_sync.mp4)
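The two request shapes described for this endpoint (a JSON body for T2V, `multipart/form-data` with a file for TI2V) can be sketched with the standard library. The field names come from the Common Parameters list in this README; the multipart encoder is hand-rolled for illustration, and the helper names are hypothetical. Per the feature list, the sync endpoint is assumed to return the video bytes directly in the response body.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def _multipart(fields, file_field, file_path):
    """Encode text fields plus one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", str(value).encode()]
    path = Path(file_path)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    lines += [f"--{boundary}".encode(),
              f'Content-Disposition: form-data; name="{file_field}"; filename="{path.name}"'.encode(),
              f"Content-Type: {mime}".encode(), b"", path.read_bytes(),
              f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_video_request(base_url, prompt, image=None, model="wan",
                        duration=4.0, fps=24, size="256x256"):
    """T2V when `image` is None (JSON body), otherwise TI2V (multipart upload)."""
    fields = {"model": model, "prompt": prompt, "seconds": duration,
              "fps": fps, "size": size}
    url = f"{base_url.rstrip('/')}/videos/generations"
    if image is None:
        body, ctype = json.dumps(fields).encode(), "application/json"
    else:
        # The reference image travels in the `input_reference` form field.
        body, ctype = _multipart(fields, "input_reference", image)
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": ctype})


def generate_video_sync(base_url, prompt, output="output_sync.mp4", **kwargs):
    req = build_video_request(base_url, prompt, **kwargs)
    # The sync endpoint blocks until the video is ready, so use a generous timeout.
    with urllib.request.urlopen(req, timeout=600) as resp:
        Path(output).write_bytes(resp.read())
    return output
```

Choosing the body format from whether an image was supplied mirrors the `--mode t2v` / `--mode ti2v` split of the script without requiring a separate flag.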
**API Endpoint:** `POST /v1/videos/generations`

**API Details:**
- T2V sends a JSON body (`Content-Type: application/json`)
- TI2V sends `multipart/form-data` with the reference image as a file upload

**Output:** Saves the generated video to the file given by `--output`

---
### 3. Async Video Generation with T2V and TI2V Modes (`async_video_gen.py`)
Demonstrates asynchronous video generation in both Text-to-Video (T2V) and Text+Image-to-Video (TI2V) modes.
**Features:**
- **T2V Mode**: Generate videos from text prompts only (JSON request)
- **TI2V Mode**: Generate videos from text + reference image (multipart/form-data with file upload)
- Command-line interface for easy testing
- Automatic mode detection
- Comprehensive parameter control
**Usage:**
```bash
# Text-to-Video (T2V) - No reference image
python async_video_gen.py --mode t2v \
--prompt "A cool cat on a motorcycle in the night" \
--duration 4.0 --fps 24 --size 256x256
# Text+Image-to-Video (TI2V) - With reference image
python async_video_gen.py --mode ti2v \
--prompt "She turns around and smiles, then slowly walks out of the frame" \
--image ./media/woman_skyline_original_720p.jpeg \
--duration 4.0 --fps 24 --size 512x512
# Custom parameters
python async_video_gen.py --mode t2v \
--prompt "A serene sunset over the ocean" \
--duration 5.0 --fps 30 --size 512x512 \
--output my_video.mp4
```
**Command-Line Arguments:**
- `--mode` - Generation mode: `t2v` or `ti2v` (default: t2v)
- `--prompt` - Text prompt for video generation (required)
- `--image` - Path to reference image (required for ti2v mode)
- `--base-url` - API server URL (default: http://localhost:8000/v1)
- `--model` - Model name (default: wan)
- `--duration` - Video duration in seconds (default: 4.0)
- `--fps` - Frames per second (default: 24)
- `--size` - Video resolution in WxH format (default: 256x256)
- `--output` - Output video file path (default: output_async.mp4)
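In async mode the client creates a job, polls its status, and downloads the result once it is ready. A minimal sketch of that flow for T2V follows. It assumes that `POST /v1/videos` returns JSON with an `id` field and that `GET /v1/videos/{video_id}` returns a `status` field ending in `"completed"` or `"failed"`, modelled on OpenAI's Videos API and not verified against this server. The download step uses the `/content` route shown in the curl reference. Function names are hypothetical; TI2V would send a multipart body instead of JSON.

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Callable


def get_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read())


def poll_until_done(video_id: str, get_status: Callable[[str], dict],
                    timeout_s: float = 300.0, interval_s: float = 2.0) -> dict:
    """Poll until the job reaches a terminal status (assumed status values)."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = get_status(video_id)
        if job.get("status") == "completed":
            return job
        if job.get("status") == "failed":
            raise RuntimeError(f"video {video_id} failed: {job.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video {video_id} not ready after {timeout_s}s")
        time.sleep(interval_s)


def generate_video_async(base_url: str, payload: dict, output: str = "output_async.mp4") -> str:
    base = base_url.rstrip("/")
    req = urllib.request.Request(f"{base}/videos", data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        video_id = json.loads(resp.read())["id"]  # assumed response field
    poll_until_done(video_id, lambda vid: get_json(f"{base}/videos/{vid}"))
    with urllib.request.urlopen(f"{base}/videos/{video_id}/content", timeout=300) as resp:
        Path(output).write_bytes(resp.read())
    return output
```

The 300-second polling deadline matches the default async timeout listed under API Configuration. Passing `get_status` as a function keeps the polling loop independent of the transport, so it can be reused with the OpenAI SDK or tested offline.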
**API Endpoint:** `POST /v1/videos`

**API Details:**
- T2V sends a JSON body (`Content-Type: application/json`)
- TI2V sends `multipart/form-data` with the reference image as a file upload

**Output:** Saves the generated video to the file given by `--output`

---
### 4. Video Deletion (`delete_video.py`)
Demonstrates the complete lifecycle of video generation and deletion.
**Features:**
- Creates a test video generation job
- Waits for completion
- Deletes the generated video
- Verifies deletion by attempting to retrieve the deleted video
- Tests error handling for non-existent videos
**Usage:**
```bash
# Use default localhost server
python delete_video.py
# Specify custom server URL
python delete_video.py http://your-server:8000/v1
```
**API Endpoints:**
- `POST /v1/videos` - Create video job
- `GET /v1/videos/{video_id}` - Check video status
- `DELETE /v1/videos/{video_id}` - Delete video
**Test Flow:**
1. Create video generation job
2. Wait for completion
3. Delete the video
4. Verify that retrieving the deleted video raises `NotFoundError`
5. Test deletion of non-existent video
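`delete_video.py` uses the OpenAI SDK, which raises `NotFoundError` for an HTTP 404. Without the SDK, the delete-then-verify steps can be sketched with `urllib`, assuming the server answers 404 for unknown or already-deleted IDs. The helper names are hypothetical, and the `opener` parameter only exists so the logic can be exercised without a server.

```python
import urllib.error
import urllib.request


def delete_video(base_url, video_id, opener=urllib.request.urlopen):
    """Send DELETE /videos/{video_id}; an HTTPError propagates on failure (e.g. 404)."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}/videos/{video_id}", method="DELETE")
    with opener(req, timeout=30) as resp:
        return resp.status


def video_exists(base_url, video_id, opener=urllib.request.urlopen):
    """True if GET /videos/{video_id} succeeds, False on 404; other errors propagate."""
    try:
        with opener(f"{base_url.rstrip('/')}/videos/{video_id}", timeout=30):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Following the test flow above: call `delete_video(base, video_id)`, then expect `video_exists(base, video_id)` to return `False`, and expect a second `delete_video` on the same ID to raise an `HTTPError` with code 404.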
---
## API Configuration
All examples use the following default configuration:
- **Base URL**: `http://localhost:8000/v1`
- **API Key**: `"tensorrt_llm"` (authentication token)
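- **Timeout**: 300 seconds for async operations

You can customize these by:
1. Passing the base URL as a command-line argument (positional for `sync_image_gen.py` and `delete_video.py`; `--base-url` for `sync_video_gen.py` and `async_video_gen.py`)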
2. Modifying the default parameters in each script's function
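If you write your own client, one way to keep these three defaults in a single place is a small config object. This is a sketch only: the `VISUAL_GEN_*` environment variable names are hypothetical and are not read by the example scripts. The API key goes into an `Authorization: Bearer ...` header, which is how the OpenAI SDK transmits it; whether the server enforces the key is not specified here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientConfig:
    base_url: str = "http://localhost:8000/v1"
    api_key: str = "tensorrt_llm"
    timeout_s: float = 300.0

    @classmethod
    def from_env(cls) -> "ClientConfig":
        # Hypothetical variable names; unset variables fall back to the defaults above.
        return cls(
            base_url=os.environ.get("VISUAL_GEN_BASE_URL", cls.base_url),
            api_key=os.environ.get("VISUAL_GEN_API_KEY", cls.api_key),
            timeout_s=float(os.environ.get("VISUAL_GEN_TIMEOUT", cls.timeout_s)),
        )

    def headers(self) -> dict:
        # Same header the OpenAI SDK sends for its api_key argument.
        return {"Authorization": f"Bearer {self.api_key}"}
```

With the OpenAI SDK, the same values map onto `OpenAI(base_url=cfg.base_url, api_key=cfg.api_key, timeout=cfg.timeout_s)`.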
## Common Parameters
### Image Generation
- `model`: Model identifier (e.g., "wan")
- `prompt`: Text description
- `n`: Number of images to generate
- `size`: Image dimensions (e.g., "512x512", "1024x1024")
- `quality`: "standard" or "hd"
- `response_format`: "b64_json" or "url"
### Video Generation
- `model`: Model identifier (e.g., "wan")
- `prompt`: Text description
- `size`: Video resolution (e.g., "256x256", "512x512")
- `seconds`: Duration in seconds
- `fps`: Frames per second
- `input_reference`: Reference image file (for TI2V mode)
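The `size`, `seconds`, and `fps` values interact: `size` uses a `WxH` string, and duration times frame rate gives a rough frame budget (for example, 4.0 s at 24 fps is 96 frames). The sketch below validates these fields client-side. The `VideoParams` class is hypothetical, and `approx_frames` is naive arithmetic only: the server or model may round to a frame count it supports.

```python
import re
from dataclasses import dataclass
from typing import Tuple

_SIZE_RE = re.compile(r"^(\d+)x(\d+)$")


@dataclass(frozen=True)
class VideoParams:
    prompt: str
    size: str = "256x256"
    seconds: float = 4.0
    fps: int = 24

    def resolution(self) -> Tuple[int, int]:
        """Parse the `WxH` size string into (width, height)."""
        m = _SIZE_RE.match(self.size)
        if not m:
            raise ValueError(f"size must look like WxH, got {self.size!r}")
        return int(m.group(1)), int(m.group(2))

    def approx_frames(self) -> int:
        # Naive estimate: duration x frame rate; the actual count is decided server-side.
        return round(self.seconds * self.fps)
```

Checking the `WxH` format locally surfaces a typo such as `512*512` before a long generation request is sent.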
## Quick Reference - curl Examples
### Text-to-Video (JSON)
```bash
curl -X POST "http://localhost:8000/v1/videos" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A cool cat on a motorcycle",
"seconds": 4.0,
"fps": 24,
"size": "256x256"
}'
```
### Text+Image-to-Video (Multipart with File Upload)
```bash
curl -X POST "http://localhost:8000/v1/videos" \
-F "prompt=She turns around and smiles" \
-F "input_reference=@./media/woman_skyline_original_720p.jpeg" \
-F "seconds=4.0" \
-F "fps=24" \
-F "size=256x256" \
-F "guidance_scale=5.0"
```
### Check Video Status
```bash
curl -X GET "http://localhost:8000/v1/videos/{video_id}"
```
### Download Video
```bash
curl -X GET "http://localhost:8000/v1/videos/{video_id}/content" -o output.mp4
```
### Delete Video
```bash
curl -X DELETE "http://localhost:8000/v1/videos/{video_id}"
```
## API Endpoints Summary
| Endpoint | Method | Mode | Content-Type | Purpose |
|----------|--------|------|--------------|---------|
| `/v1/videos` | POST | Async | JSON or Multipart | Create video job (T2V/TI2V) |
| `/v1/videos/generations` | POST | Sync | JSON or Multipart | Generate video sync (T2V/TI2V) |
| `/v1/videos/{id}` | GET | - | - | Get video status/metadata |
| `/v1/videos/{id}/content` | GET | - | - | Download video file |
| `/v1/videos/{id}` | DELETE | - | - | Delete video |
| `/v1/videos` | GET | - | - | List all videos |
| `/v1/images/generations` | POST | - | JSON | Generate images (T2I) |

**Note:** Both `/v1/videos` (async) and `/v1/videos/generations` (sync) support:
- **JSON**: Standard text-to-video (T2V)
- **Multipart/Form-Data**: Text+image-to-video (TI2V) with file upload
## Error Handling
All examples handle the following error types:
- Connection errors (server not running)
- API errors (invalid parameters, model not found)
- Timeout errors (generation taking too long)
- Resource errors (video not found for deletion)
Errors are displayed with full stack traces for debugging.
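As an illustration of how a standard-library client could sort failures into the categories above, the sketch maps `urllib` exceptions to a short label. The function and its label strings are hypothetical and do not reproduce the scripts' own messages. `HTTPError` is checked before `URLError` because it is a subclass of it.

```python
import socket
import urllib.error


def describe_error(exc: BaseException) -> str:
    """Classify a urllib exception as a resource, API, timeout, or connection error."""
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return "resource error: not found"
        return f"API error: HTTP {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        # URLError wraps the underlying socket error in `.reason`.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout error"
        return "connection error: is the server running?"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout error"
    return f"unexpected error: {exc!r}"
```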
## Output Files
Generated files are saved to the current working directory:
- `output_generation.png` - Synchronous image generation (`sync_image_gen.py`)
- `output_sync.mp4` - Synchronous video generation (`sync_video_gen.py`)
- `output_async.mp4` - Asynchronous video generation (`async_video_gen.py`)
- `output_multipart.mp4` - Multipart example output (`multipart_example.py`)

**Note:** The video generation scripts (`sync_video_gen.py`, `async_video_gen.py`) accept an `--output` parameter to customize the output filename.