(support-matrix)=

# Supported Models

The following table lists the models supported by the PyTorch backend:

| Architecture | Model | HuggingFace Example |
| :--- | :--- | :--- |
| `BertForSequenceClassification` | BERT-based | `textattack/bert-base-uncased-yelp-polarity` |
| `DeciLMForCausalLM` | Nemotron | `nvidia/Llama-3_1-Nemotron-51B-Instruct` |
| `DeepseekV3ForCausalLM` | DeepSeek-V3 | `deepseek-ai/DeepSeek-V3` |
| `Exaone4ForCausalLM` | EXAONE 4.0 | `LGAI-EXAONE/EXAONE-4.0-32B` |
| `Gemma3ForCausalLM` | Gemma 3 | `google/gemma-3-1b-it` |
| `GptOssForCausalLM` | GPT-OSS | `openai/gpt-oss-120b` |
| `LlamaForCausalLM` | Llama 3.1, Llama 3, Llama 2, LLaMA | `meta-llama/Meta-Llama-3.1-70B` |
| `Llama4ForConditionalGeneration` | Llama 4 | `meta-llama/Llama-4-Scout-17B-16E-Instruct` |
| `MistralForCausalLM` | Mistral | `mistralai/Mistral-7B-v0.1` |
| `MixtralForCausalLM` | Mixtral | `mistralai/Mixtral-8x7B-v0.1` |
| `MllamaForConditionalGeneration` | Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision` |
| `NemotronForCausalLM` | Nemotron-3, Nemotron-4, Minitron | `nvidia/Minitron-8B-Base` |
| `NemotronNASForCausalLM` | NemotronNAS | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` |
| `Phi3ForCausalLM` | Phi-4 | `microsoft/Phi-4` |
| `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/Qwen2-7B-Instruct` |
| `Qwen2ForProcessRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-PRM-7B` |
| `Qwen2ForRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-RM-72B` |
| `Qwen3ForCausalLM` | Qwen3 | `Qwen/Qwen3-8B` |
| `Qwen3MoeForCausalLM` | Qwen3MoE | `Qwen/Qwen3-30B-A3B` |
| `Qwen3NextForCausalLM` | Qwen3Next | `Qwen/Qwen3-Next-80B-A3B-Thinking` |
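
One way to check whether a local Hugging Face checkpoint is covered by the table is to compare the `architectures` field of its `config.json` against the Architecture column. The following is a minimal sketch under stated assumptions: `SUPPORTED_ARCHITECTURES`, `is_supported`, and `is_checkpoint_supported` are hypothetical helper names, not part of TensorRT-LLM, and the set is copied by hand from this page, so it may lag behind the actual release.

```python
import json
from pathlib import Path

# Hand-copied from the Architecture column of the table on this page.
# Hypothetical helper data, not a TensorRT-LLM API; it may become stale.
SUPPORTED_ARCHITECTURES = frozenset({
    "BertForSequenceClassification", "DeciLMForCausalLM", "DeepseekV3ForCausalLM",
    "Exaone4ForCausalLM", "Gemma3ForCausalLM", "GptOssForCausalLM",
    "LlamaForCausalLM", "Llama4ForConditionalGeneration", "MistralForCausalLM",
    "MixtralForCausalLM", "MllamaForConditionalGeneration", "NemotronForCausalLM",
    "NemotronNASForCausalLM", "Phi3ForCausalLM", "Qwen2ForCausalLM",
    "Qwen2ForProcessRewardModel", "Qwen2ForRewardModel", "Qwen3ForCausalLM",
    "Qwen3MoeForCausalLM", "Qwen3NextForCausalLM",
})


def is_supported(config: dict) -> bool:
    """Return True if any architecture named in a Hugging Face config dict is in the table."""
    architectures = config.get("architectures") or []
    return any(arch in SUPPORTED_ARCHITECTURES for arch in architectures)


def is_checkpoint_supported(checkpoint_dir: str) -> bool:
    """Read <checkpoint_dir>/config.json and check its `architectures` field."""
    config_path = Path(checkpoint_dir) / "config.json"
    return is_supported(json.loads(config_path.read_text()))
```

A checkpoint whose `config.json` lists an architecture outside this set may still load through other code paths; the check only mirrors the table.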

## Model-Feature Support Matrix (Key Models)

Note: Support for other models may vary. Features marked "N/A" are not applicable to the model architecture.

| Model Architecture/Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3 (One Model Engine) | EAGLE-3 (Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Sliding Window Attention | Logits Post Processor | Guided Decoding |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| `DeepseekV3ForCausalLM` | Yes | Yes | Yes | Yes | Yes [^1] | Yes | No | No | Yes | Yes | Yes [^2] | N/A | Yes | Yes |
| `Qwen3MoeForCausalLM` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | N/A | Yes | Yes |
| `Qwen3NextForCausalLM` | Yes | Yes | No | Untested | Yes | No | No | No | Yes | Yes | No | No | Untested | Untested |
| `Llama4ForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Untested | N/A | Yes | Yes |
| `GptOssForCausalLM` | Yes | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | N/A | Yes | Yes |

## Multimodal Feature Support Matrix (PyTorch Backend)

| Model Architecture/Feature | Overlap Scheduler | CUDA Graph | Chunked Prefill | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Logits Post Processor | EPD Disaggregated Serving | Modality |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| `Gemma3ForConditionalGeneration` | Yes | Yes | N/A | Yes | Yes | N/A | Yes | No | L + I |
| `HCXVisionForCausalLM` | Yes | Yes | No | Yes | Yes | Yes | Yes | No | L + I |
| `LlavaLlamaModel` (VILA) | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I + V |
| `LlavaNextForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | L + I |
| `Llama4ForConditionalGeneration` | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| `Mistral3ForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I |
| `NemotronH_Nano_VL_V2` | Yes | Yes | Yes | Yes | Yes | N/A | Yes | No | L + I + V |
| `Phi4MMForCausalLM` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I + A |
| `Qwen2VLForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I + V |
| `Qwen2_5_VLForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I + V |

Note:

- L: Language
- I: Image
- V: Video
- A: Audio
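
The Modality column can be expanded with this legend to find, for instance, which architectures accept video input. The following sketch is illustrative only: `MODALITY_LEGEND`, `MULTIMODAL_MODALITIES`, `expand_modalities`, and `architectures_accepting` are hypothetical names, and the data is transcribed by hand from the multimodal matrix on this page.

```python
# Legend from the note above.
MODALITY_LEGEND = {"L": "Language", "I": "Image", "V": "Video", "A": "Audio"}

# Modality column of the multimodal matrix, transcribed by hand.
MULTIMODAL_MODALITIES = {
    "Gemma3ForConditionalGeneration": "L + I",
    "HCXVisionForCausalLM": "L + I",
    "LlavaLlamaModel (VILA)": "L + I + V",
    "LlavaNextForConditionalGeneration": "L + I",
    "Llama4ForConditionalGeneration": "L + I",
    "Mistral3ForConditionalGeneration": "L + I",
    "NemotronH_Nano_VL_V2": "L + I + V",
    "Phi4MMForCausalLM": "L + I + A",
    "Qwen2VLForConditionalGeneration": "L + I + V",
    "Qwen2_5_VLForConditionalGeneration": "L + I + V",
}


def expand_modalities(cell: str) -> list:
    """Turn a Modality cell such as "L + I + V" into full names, preserving order."""
    return [MODALITY_LEGEND[code.strip()] for code in cell.split("+")]


def architectures_accepting(modality: str) -> list:
    """List the architectures whose Modality cell includes the given full name."""
    return sorted(
        arch
        for arch, cell in MULTIMODAL_MODALITIES.items()
        if modality in expand_modalities(cell)
    )
```

Calling `architectures_accepting("Audio")` returns `["Phi4MMForCausalLM"]`, the only row whose Modality cell contains `A`.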

[^1]: Chunked Prefill for MLA can only be enabled on SM100.

[^2]: KV cache reuse for MLA can only be enabled on SM90/SM100, and only with a BF16 or FP8 KV cache dtype.
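
The two MLA restrictions in these footnotes can be written as simple predicates, for example to validate a deployment configuration before launch. This is a minimal sketch, assuming the SM version is passed as an integer (90 for SM90, 100 for SM100) and the KV cache dtype as a string such as `"bf16"` or `"fp8"`. The function names and the string spellings are hypothetical and do not correspond to TensorRT-LLM configuration options.

```python
def mla_chunked_prefill_allowed(sm_version: int) -> bool:
    """Footnote 1: chunked prefill for MLA can only be enabled on SM100."""
    return sm_version == 100


def mla_kv_cache_reuse_allowed(sm_version: int, kv_cache_dtype: str) -> bool:
    """Footnote 2: KV cache reuse for MLA needs SM90 or SM100 and a BF16 or FP8 KV cache."""
    # The dtype spellings "bf16" and "fp8" are assumptions made for this sketch.
    return sm_version in (90, 100) and kv_cache_dtype.lower() in ("bf16", "fp8")
```

Under these assumptions, an SM90 device with a BF16 KV cache passes the KV cache reuse check but not the chunked prefill check.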