# Visual Generation API Examples

This directory contains example scripts that demonstrate how to use the TensorRT-LLM Visual Generation API endpoints for image and video generation.

## Overview

These examples show how to interact with the visual generation server using both the OpenAI Python SDK and standard HTTP requests.

The API provides endpoints for:

- **Image Generation**: Text-to-image generation (T2I)
- **Video Generation**:
  - Text-to-video generation (T2V) - generate videos from text prompts only
  - Text+Image-to-video generation (TI2V) - generate videos from text + reference image
  - Both synchronous and asynchronous modes supported
  - Multipart/form-data support for file uploads
- **Video Management**: Retrieving and deleting generated videos

## Prerequisites

Before running these examples, complete the following:

1. **Install modules**: Install the required dependencies:

   ```bash
   pip install git+https://github.com/huggingface/diffusers.git
   pip install av
   ```

2. **Server Running**: The TensorRT-LLM visual generation server must be running:

   ```bash
   trtllm-serve <model_path> --extra_visual_gen_options <config.yml>
   ```

   e.g.

   ```bash
   trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan.yml

   # Run the server in the background:
   trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan.yml > /tmp/serve.log 2>&1 &

   # Check whether the server has started:
   tail -f /tmp/serve.log
   ```

## Examples

Currently supported and tested models:

1. WAN T2V/I2V for video generation (t2v, ti2v, delete_video)

### 1. Synchronous Image Generation (`sync_image_gen.py`)

Demonstrates synchronous text-to-image generation using the OpenAI SDK.

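As a rough illustration of that flow, the sketch below requests images through the OpenAI Python SDK and writes the decoded results to disk. It is not the contents of the shipped script. The helper names (`output_paths`, `save_b64_images`, `generate_images`) and the numbered-filename scheme are assumptions made for this example, while the base URL, API key, and `output_generation.png` name follow the defaults described in this README.

```python
import base64
from pathlib import Path


def output_paths(count: int, stem: str = "output_generation") -> list[Path]:
    """Pick file names: one image keeps the plain name, several get numbered.

    ASSUMPTION: the `_0`, `_1`, ... suffix scheme is illustrative only.
    """
    if count == 1:
        return [Path(f"{stem}.png")]
    return [Path(f"{stem}_{i}.png") for i in range(count)]


def save_b64_images(b64_images: list[str], out_dir: Path = Path(".")) -> list[Path]:
    """Decode base64 payloads and write each one to its own file."""
    paths = [out_dir / name for name in output_paths(len(b64_images))]
    for path, payload in zip(paths, b64_images):
        path.write_bytes(base64.b64decode(payload))
    return paths


def generate_images(
    prompt: str,
    base_url: str = "http://localhost:8000/v1",
    n: int = 1,
    size: str = "512x512",
) -> list[Path]:
    """Call POST /v1/images/generations via the OpenAI SDK and save the images."""
    # Imported lazily so the helpers above stay usable without the SDK installed.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="tensorrt_llm")
    response = client.images.generate(
        model="wan",  # ASSUMPTION: model identifier taken from this README's examples
        prompt=prompt,
        n=n,
        size=size,
        response_format="b64_json",
    )
    return save_b64_images([item.b64_json for item in response.data])
```

With the server running, calling `generate_images("A cute cat playing with a ball in the park")` would be expected to write `output_generation.png` to the current directory.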

**Features:**

- Generates images from text prompts
- Supports configurable image size and quality
- Returns base64-encoded images or URLs
- Saves generated images to disk

**Usage:**

```bash
# Use default localhost server
python sync_image_gen.py

# Specify custom server URL
python sync_image_gen.py http://your-server:8000/v1
```

**API Endpoint:** `POST /v1/images/generations`

**Output:** Saves generated image to `output_generation.png` (or numbered files for multiple images)

---

### 2. Synchronous Video Generation with T2V and TI2V Modes (`sync_video_gen.py`)

Demonstrates synchronous video generation using direct HTTP requests. The script waits for generation to complete and receives the video file directly in the response.

**Features:**

- **T2V Mode**: Generate videos from text prompts only
- **TI2V Mode**: Generate videos from text + reference image (multipart/form-data)
- Waits for video generation to complete before returning
- Returns video file directly in response
- Command-line interface for easy testing

**Usage:**

```bash
# Text-to-Video (T2V) - No reference image
python sync_video_gen.py --mode t2v \
    --prompt "A cute cat playing with a ball in the park" \
    --duration 4.0 --fps 24 --size 256x256

# Text+Image-to-Video (TI2V) - With reference image
# Note: a longer duration or a larger size significantly increases the wait time
python sync_video_gen.py --mode ti2v \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 4.0 --fps 24 --size 512x512

# Custom parameters
python sync_video_gen.py --mode t2v \
    --prompt "A serene sunset over the ocean" \
    --duration 5.0 --fps 30 --size 512x512 \
    --output my_video.mp4
```

**Command-Line Arguments:**

- `--mode` - Generation mode: `t2v` or `ti2v` (default: t2v)
- `--prompt` - Text prompt for video generation (required)
- `--image` - Path to reference image (required for ti2v mode)
- `--base-url` - API server URL (default: http://localhost:8000/v1)
- `--model` - Model name (default: wan)
- `--duration` - Video duration in seconds (default: 4.0)
- `--fps` - Frames per second (default: 24)
- `--size` - Video resolution in WxH format (default: 256x256)
- `--output` - Output video file path (default: output_sync.mp4)

**API Endpoint:** `POST /v1/videos/generations`

**API Details:**

- T2V sends a JSON body (`Content-Type: application/json`)
- TI2V sends a file upload as multipart form data (`Content-Type: multipart/form-data`)

**Output:** Saves generated video to specified output file

---

### 3. Async Video Generation with T2V and TI2V Modes (`async_video_gen.py`)

**NEW**: Enhanced async video generation supporting both Text-to-Video (T2V) and Text+Image-to-Video (TI2V) modes.

**Features:**

- **T2V Mode**: Generate videos from text prompts only (JSON request)
- **TI2V Mode**: Generate videos from text + reference image (multipart/form-data with file upload)
- Command-line interface for easy testing
- Automatic mode detection
- Comprehensive parameter control

**Usage:**

```bash
# Text-to-Video (T2V) - No reference image
python async_video_gen.py --mode t2v \
    --prompt "A cool cat on a motorcycle in the night" \
    --duration 4.0 --fps 24 --size 256x256

# Text+Image-to-Video (TI2V) - With reference image
python async_video_gen.py --mode ti2v \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 4.0 --fps 24 --size 512x512

# Custom parameters
python async_video_gen.py --mode t2v \
    --prompt "A serene sunset over the ocean" \
    --duration 5.0 --fps 30 --size 512x512 \
    --output my_video.mp4
```

**Command-Line Arguments:**

- `--mode` - Generation mode: `t2v` or `ti2v` (default: t2v)
- `--prompt` - Text prompt for video generation (required)
- `--image` - Path to reference image (required for ti2v mode)
- `--base-url` - API server URL (default: http://localhost:8000/v1)
- `--model` - Model name (default: wan)
- `--duration` - Video duration in seconds (default: 4.0)
- `--fps` - Frames per second (default: 24)
- `--size` - Video resolution in WxH format (default: 256x256)
- `--output` - Output video file path (default: output_async.mp4)

**API Details:**

- T2V sends a JSON body (`Content-Type: application/json`)
- TI2V sends a file upload as multipart form data (`Content-Type: multipart/form-data`)

**Output:** Saves generated video to specified output file

---

### 4. Video Deletion (`delete_video.py`)

Demonstrates the complete lifecycle of video generation and deletion.

**Features:**

- Creates a test video generation job
- Waits for completion
- Deletes the generated video
- Verifies deletion by attempting to retrieve the deleted video
- Tests error handling for non-existent videos

**Usage:**

```bash
# Use default localhost server
python delete_video.py

# Specify custom server URL
python delete_video.py http://your-server:8000/v1
```

**API Endpoints:**

- `POST /v1/videos` - Create video job
- `GET /v1/videos/{video_id}` - Check video status
- `DELETE /v1/videos/{video_id}` - Delete video

**Test Flow:**

1. Create video generation job
2. Wait for completion
3. Delete the video
4. Verify that retrieving the deleted video raises `NotFoundError`
5. Test deletion of non-existent video

---

## API Configuration

All examples use the following default configuration:

- **Base URL**: `http://localhost:8000/v1`
- **API Key**: `"tensorrt_llm"` (authentication token)
- **Timeout**: 300 seconds for async operations

You can customize these by:

1. Passing the base URL as a command-line argument
2. Modifying the default parameters in each script's function

## Common Parameters

### Image Generation

- `model`: Model identifier (e.g., "wan")
- `prompt`: Text description
- `n`: Number of images to generate
- `size`: Image dimensions (e.g., "512x512", "1024x1024")
- `quality`: "standard" or "hd"
- `response_format`: "b64_json" or "url"

### Video Generation

- `model`: Model identifier (e.g., "wan")
- `prompt`: Text description
- `size`: Video resolution (e.g., "256x256", "512x512")
- `seconds`: Duration in seconds
- `fps`: Frames per second
- `input_reference`: Reference image file (for TI2V mode)

## Quick Reference - curl Examples

### Text-to-Video (JSON)

```bash
curl -X POST "http://localhost:8000/v1/videos" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A cool cat on a motorcycle",
    "seconds": 4.0,
    "fps": 24,
    "size": "256x256"
  }'
```

### Text+Image-to-Video (Multipart with File Upload)

```bash
curl -X POST "http://localhost:8000/v1/videos" \
  -F "prompt=She turns around and smiles" \
  -F "input_reference=@./media/woman_skyline_original_720p.jpeg" \
  -F "seconds=4.0" \
  -F "fps=24" \
  -F "size=256x256" \
  -F "guidance_scale=5.0"
```

### Check Video Status

```bash
curl -X GET "http://localhost:8000/v1/videos/{video_id}"
```

### Download Video

```bash
curl -X GET "http://localhost:8000/v1/videos/{video_id}/content" -o output.mp4
```

### Delete Video

```bash
curl -X DELETE "http://localhost:8000/v1/videos/{video_id}"
```

## API Endpoints Summary

| Endpoint | Method | Mode | Content-Type | Purpose |
|----------|--------|------|--------------|---------|
| `/v1/videos` | POST | Async | JSON or Multipart | Create video job (T2V/TI2V) |
| `/v1/videos/generations` | POST | Sync | JSON or Multipart | Generate video sync (T2V/TI2V) |
| `/v1/videos/{id}` | GET | - | - | Get video status/metadata |
| `/v1/videos/{id}/content` | GET | - | - | Download video file |
| `/v1/videos/{id}` | DELETE | - | - | Delete video |
| `/v1/videos` | GET | - | - | List all videos |
| `/v1/images/generations` | POST | - | JSON | Generate images (T2I) |

**Note:** Both `/v1/videos` (async) and `/v1/videos/generations` (sync) support:

- **JSON**: Standard text-to-video (T2V)
- **Multipart/Form-Data**: Text+image-to-video (TI2V) with file upload

## Error Handling

All examples handle the following error classes:

- Connection errors (server not running)
- API errors (invalid parameters, model not found)
- Timeout errors (generation taking too long)
- Resource errors (video not found for deletion)

Errors are displayed with full stack traces for debugging.

## Output Files

Generated files are saved to the current working directory:

- `output_generation.png` - Synchronous image generation (`sync_image_gen.py`)
- `output_sync.mp4` - Synchronous video generation (`sync_video_gen.py`)
- `output_async.mp4` - Asynchronous video generation (`async_video_gen.py`)
- `output_multipart.mp4` - Multipart example output (`multipart_example.py`)

**Note:** You can customize output filenames using the `--output` parameter in the scripts that accept it (`sync_video_gen.py` and `async_video_gen.py`, as listed under Command-Line Arguments).
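
The curl calls in the Quick Reference can be chained into a single create, poll, download, delete loop. The sketch below does this for a T2V job with only the Python standard library. It is a minimal illustration, not the shipped `async_video_gen.py`. The function names are hypothetical, and it assumes the job JSON exposes `id` and `status` fields with OpenAI-style status values (`completed`, `failed`), which this README does not spell out.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/v1"


def video_job_payload(prompt, seconds=4.0, fps=24, size="256x256", model="wan"):
    """Build the JSON body for a T2V job using the parameter names listed above."""
    return {"model": model, "prompt": prompt, "seconds": seconds, "fps": fps, "size": size}


def is_finished(job):
    """ASSUMPTION: a job is done once its status is 'completed' or 'failed'."""
    return job.get("status") in {"completed", "failed"}


def _call(method, url, body=None):
    """Send one HTTP request and return the raw response bytes."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return response.read()


def run_job(prompt, output="output_async.mp4", timeout=300.0, poll_interval=2.0, cleanup=True):
    """Create a job, poll until it finishes, download the video, optionally delete it."""
    job = json.loads(_call("POST", f"{BASE_URL}/videos", video_job_payload(prompt)))
    deadline = time.monotonic() + timeout
    while not is_finished(job):
        if time.monotonic() > deadline:
            raise TimeoutError(f"video {job['id']} not finished after {timeout} seconds")
        time.sleep(poll_interval)
        job = json.loads(_call("GET", f"{BASE_URL}/videos/{job['id']}"))
    if job["status"] != "completed":
        raise RuntimeError(f"video {job['id']} ended with status {job['status']!r}")
    with open(output, "wb") as handle:
        handle.write(_call("GET", f"{BASE_URL}/videos/{job['id']}/content"))
    if cleanup:
        _call("DELETE", f"{BASE_URL}/videos/{job['id']}")
    return output
```

With the server running, `run_job("A cool cat on a motorcycle")` would be expected to leave `output_async.mp4` in the working directory. The 300-second default mirrors the timeout listed under API Configuration.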