# Phi-4-Multimodal Model

This document describes how to run Phi-4-Multimodal (phi4-mm) with TensorRT LLM. The implementation supports both single- and multi-GPU configurations via the PyTorch backend. Additionally, ModelOpt was employed to derive FP8 and NVFP4 checkpoints from the source [BF16 repository](https://huggingface.co/microsoft/Phi-4-multimodal-instruct).

## Overview

* Supported BF16, FP8, and NVFP4 model formats.
* Supported single- and multi-GPU inference.
* Added support for KV cache reuse and chunked prefill for phi4-mm models.
* Enabled LoRA support for multi-modal inputs.
* Configurable RoPE scaling: the model defaults to Long RoPE but automatically switches to Short RoPE when `--max_seq_len` is set to 4096 or lower.

## Usage

### Offline batch inference

```
python examples/llm-api/quickstart_multimodal.py --model_dir <model_dir> --modality image --load_lora --auto_model_name Phi4MMForCausalLM

python examples/llm-api/quickstart_multimodal.py --model_dir <model_dir> --modality audio --load_lora --auto_model_name Phi4MMForCausalLM

python examples/llm-api/quickstart_multimodal.py --model_dir <model_dir> --modality image_audio --load_lora --auto_model_name Phi4MMForCausalLM
```

### TRTLLM-serve

```
cat > lora_llmapi_config.yml <<EOF
# LoRA-related LLM API options go here.
EOF

trtllm-serve <model_dir> \
    --backend pytorch \
    --trust_remote_code \
    --config lora_llmapi_config.yml
```

```
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Phi-4-multimodal-instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe the natural environment in the image."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png"
                        }
                    }
                ]
            }
        ],
        "lora_request": {
            "lora_name": "lora1",
            "lora_int_id": 0,
            "lora_path": "<lora_path>"
        },
        "max_tokens": 64,
        "temperature": 0
    }' | jq
```

## Notes

* Model Download: Please use `git clone git@hf.co:microsoft/Phi-4-multimodal-instruct` to download the model. Do not use snapshot downloads, as they cause runtime errors due to specific directory-structure requirements (see `tensorrt_llm/_torch/models/modeling_phi4mm.py`).
* Transformers Compatibility: The Phi-4-MM model is currently incompatible with the latest `transformers` library, despite the presence of related code. If you need to use the `transformers` implementation, please refer to [this discussion](https://huggingface.co/microsoft/Phi-4-multimodal-instruct/discussions/70).
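## Python client example

The `trtllm-serve` endpoint above is OpenAI-compatible, so it can also be queried from Python. Below is a minimal sketch using the `openai` package that mirrors the curl request in the Usage section; `lora_request` is not part of the standard OpenAI schema, so it is passed through `extra_body`, and `<lora_path>` remains a placeholder for your LoRA adapter path.

```
# Minimal sketch: query the trtllm-serve endpoint with the OpenAI Python client.
# Assumes `pip install openai` and a server started as shown above.
from openai import OpenAI

# trtllm-serve exposes an OpenAI-compatible API; the API key is unused but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="Phi-4-multimodal-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the natural environment in the image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png"
                    },
                },
            ],
        }
    ],
    max_tokens=64,
    temperature=0,
    # `lora_request` is a server-side extension, not part of the OpenAI schema,
    # so it must go through `extra_body`.
    extra_body={
        "lora_request": {
            "lora_name": "lora1",
            "lora_int_id": 0,
            "lora_path": "<lora_path>",  # placeholder: path to the LoRA adapter
        }
    },
)
print(response.choices[0].message.content)
```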
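If the image lives on the local filesystem rather than at a URL, one option (again a sketch, assuming the server accepts standard OpenAI-style base64 data URLs for `image_url` content) is to inline the file:

```
# Sketch: send a local image as a base64 data URL instead of a remote URL.
# Assumes the OpenAI-compatible server accepts data URLs for image_url content;
# "seashore.png" is a placeholder for any local image file.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

with open("seashore.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Phi-4-multimodal-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the natural environment in the image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=64,
    temperature=0,
)
print(response.choices[0].message.content)
```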