mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00


2025-11-29 21:48:48 +08:00


(support-matrix)=

# Supported Models

The following table lists the models supported by the PyTorch backend:

| Architecture | Model | HuggingFace Example |
| --- | --- | --- |
| `BertForSequenceClassification` | BERT-based | `textattack/bert-base-uncased-yelp-polarity` |
| `DeciLMForCausalLM` | Nemotron | `nvidia/Llama-3_1-Nemotron-51B-Instruct` |
| `DeepseekV3ForCausalLM` | DeepSeek-V3 | `deepseek-ai/DeepSeek-V3` |
| `Exaone4ForCausalLM` | EXAONE 4.0 | `LGAI-EXAONE/EXAONE-4.0-32B` |
| `Gemma3ForCausalLM` | Gemma 3 | `google/gemma-3-1b-it` |
| `GptOssForCausalLM` | GPT-OSS | `openai/gpt-oss-120b` |
| `LlamaForCausalLM` | Llama 3.1, Llama 3, Llama 2, LLaMA | `meta-llama/Meta-Llama-3.1-70B` |
| `Llama4ForConditionalGeneration` | Llama 4 | `meta-llama/Llama-4-Scout-17B-16E-Instruct` |
| `MistralForCausalLM` | Mistral | `mistralai/Mistral-7B-v0.1` |
| `MixtralForCausalLM` | Mixtral | `mistralai/Mixtral-8x7B-v0.1` |
| `MllamaForConditionalGeneration` | Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision` |
| `NemotronForCausalLM` | Nemotron-3, Nemotron-4, Minitron | `nvidia/Minitron-8B-Base` |
| `NemotronNASForCausalLM` | NemotronNAS | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` |
| `Phi3ForCausalLM` | Phi-4 | `microsoft/Phi-4` |
| `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/Qwen2-7B-Instruct` |
| `Qwen2ForProcessRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-PRM-7B` |
| `Qwen2ForRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-RM-72B` |
| `Qwen3ForCausalLM` | Qwen3 | `Qwen/Qwen3-8B` |
| `Qwen3MoeForCausalLM` | Qwen3MoE | `Qwen/Qwen3-30B-A3B` |
| `Qwen3NextForCausalLM` | Qwen3Next | `Qwen/Qwen3-Next-80B-A3B-Thinking` |
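
Hugging Face checkpoints usually declare their architecture in the `architectures` field of `config.json`, so a local checkpoint can be compared against the Architecture column before loading it. The following is a minimal sketch under stated assumptions, not part of TensorRT-LLM: the `SUPPORTED_ARCHITECTURES` set is copied by hand from the table above, and the helper names `unsupported_architectures` and `check_checkpoint` are hypothetical.

```python
import json
from pathlib import Path

# Copied by hand from the table above (assumption: not queried from TensorRT-LLM).
SUPPORTED_ARCHITECTURES = {
    "BertForSequenceClassification", "DeciLMForCausalLM", "DeepseekV3ForCausalLM",
    "Exaone4ForCausalLM", "Gemma3ForCausalLM", "GptOssForCausalLM",
    "LlamaForCausalLM", "Llama4ForConditionalGeneration", "MistralForCausalLM",
    "MixtralForCausalLM", "MllamaForConditionalGeneration", "NemotronForCausalLM",
    "NemotronNASForCausalLM", "Phi3ForCausalLM", "Qwen2ForCausalLM",
    "Qwen2ForProcessRewardModel", "Qwen2ForRewardModel", "Qwen3ForCausalLM",
    "Qwen3MoeForCausalLM", "Qwen3NextForCausalLM",
}


def unsupported_architectures(config):
    """Return the architectures declared in a parsed config.json that the table does not list."""
    declared = config.get("architectures") or []
    return [name for name in declared if name not in SUPPORTED_ARCHITECTURES]


def check_checkpoint(checkpoint_dir):
    """Hypothetical convenience wrapper: read <checkpoint_dir>/config.json and check it."""
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    return unsupported_architectures(config)
```

An empty list means every declared architecture appears in the table; it does not by itself guarantee that a particular checkpoint or feature combination works.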

## Model-Feature Support Matrix (Key Models)

Note: Support for other models may vary. Features marked "N/A" are not applicable to the model architecture.

| Model Architecture/Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3 (One Model Engine) | EAGLE-3 (Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Sliding Window Attention | Logits Post Processor | Guided Decoding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `DeepseekV3ForCausalLM` | Yes | Yes | Yes | Yes | Yes ¹ | Yes | No | No | Yes | Yes | Yes ² | N/A | Yes | Yes |
| `Qwen3MoeForCausalLM` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | N/A | Yes | Yes |
| `Qwen3NextForCausalLM` | Yes | Yes | No | Untested | Yes | No | No | No | Yes | Yes | No | No | Untested | Untested |
| `Llama4ForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Untested | N/A | Yes | Yes |
| `GptOssForCausalLM` | Yes | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | N/A | Yes | Yes |
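
When planning a deployment it can help to query this matrix programmatically, for example to list which requested features are not marked "Yes" for an architecture. The sketch below is illustrative only: the data is transcribed by hand from the matrix above (footnote markers dropped), and `MATRIX` and `missing_features` are hypothetical names, not TensorRT-LLM APIs.

```python
# Column order follows the matrix above.
FEATURES = [
    "Overlap Scheduler", "CUDA Graph", "Attention Data Parallelism",
    "Disaggregated Serving", "Chunked Prefill", "MTP",
    "EAGLE-3 (One Model Engine)", "EAGLE-3 (Two Model Engine)",
    "Torch Sampler", "TLLM C++ Sampler", "KV Cache Reuse",
    "Sliding Window Attention", "Logits Post Processor", "Guided Decoding",
]

# One whitespace-separated status per feature, transcribed from the matrix.
# Footnoted cells ("Yes" with a hardware or dtype condition) are recorded as plain "Yes".
ROWS = {
    "DeepseekV3ForCausalLM": "Yes Yes Yes Yes Yes Yes No No Yes Yes Yes N/A Yes Yes",
    "Qwen3MoeForCausalLM": "Yes Yes Yes Yes Yes No Yes Yes Yes Yes Yes N/A Yes Yes",
    "Qwen3NextForCausalLM": "Yes Yes No Untested Yes No No No Yes Yes No No Untested Untested",
    "Llama4ForConditionalGeneration": "Yes Yes Yes Yes Yes No Yes Yes Yes Yes Untested N/A Yes Yes",
    "GptOssForCausalLM": "Yes Yes Yes Yes No No Yes No Yes Yes No N/A Yes Yes",
}

# architecture -> {feature name -> status string}
MATRIX = {arch: dict(zip(FEATURES, row.split())) for arch, row in ROWS.items()}


def missing_features(arch, wanted):
    """Return the wanted features whose status is anything other than "Yes"."""
    return [feature for feature in wanted if MATRIX[arch][feature] != "Yes"]
```

Treating "Untested" and "N/A" the same as "No" is a deliberately conservative choice in this sketch; a caller that wants to distinguish them can read `MATRIX[arch][feature]` directly.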

## Multimodal Feature Support Matrix (PyTorch Backend)

| Model Architecture/Feature | Overlap Scheduler | CUDA Graph | Chunked Prefill | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Logits Post Processor | EPD Disaggregated Serving | Modality |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `Gemma3ForConditionalGeneration` | Yes | Yes | N/A | Yes | Yes | N/A | Yes | No | L + I |
| `HCXVisionForCausalLM` | Yes | Yes | No | Yes | Yes | Yes | Yes | No | L + I |
| `LlavaLlamaModel (VILA)` | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I + V |
| `LlavaNextForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | L + I |
| `Llama4ForConditionalGeneration` | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| `Mistral3ForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I |
| `NemotronH_Nano_VL_V2` | Yes | Yes | Yes | Yes | Yes | N/A | Yes | No | L + I + V |
| `Phi4MMForCausalLM` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I + A |
| `Qwen2VLForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I + V |
| `Qwen2_5_VLForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | L + I + V |

Note:

- L: Language
- I: Image
- V: Video
- A: Audio
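
The Modality column can be used to shortlist architectures for a given input type. The following sketch is an illustration, not a TensorRT-LLM API: the `MODALITIES` mapping is transcribed by hand from the multimodal matrix, and `architectures_with` is a hypothetical helper.

```python
# Modality strings transcribed from the multimodal matrix above.
MODALITIES = {
    "Gemma3ForConditionalGeneration": "L + I",
    "HCXVisionForCausalLM": "L + I",
    "LlavaLlamaModel (VILA)": "L + I + V",
    "LlavaNextForConditionalGeneration": "L + I",
    "Llama4ForConditionalGeneration": "L + I",
    "Mistral3ForConditionalGeneration": "L + I",
    "NemotronH_Nano_VL_V2": "L + I + V",
    "Phi4MMForCausalLM": "L + I + A",
    "Qwen2VLForConditionalGeneration": "L + I + V",
    "Qwen2_5_VLForConditionalGeneration": "L + I + V",
}


def architectures_with(letter):
    """Return architectures whose modality list contains the given letter (L, I, V or A)."""
    return [arch for arch, mods in MODALITIES.items() if letter in mods.split(" + ")]
```

For example, `architectures_with("A")` returns only `Phi4MMForCausalLM`, the single row in the matrix that lists audio.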

¹ Chunked Prefill for MLA can only be enabled on SM100.

² KV cache reuse for MLA can only be enabled on SM90/SM100 and with a BF16 or FP8 KV cache dtype.
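
The two MLA constraints above can be written as simple predicates. This is a sketch under assumptions rather than how TensorRT-LLM performs the check: the function names are hypothetical, the SM version is passed as an integer such as `90` or `100`, and the dtype labels `"bf16"` and `"fp8"` are illustrative strings, not library constants.

```python
def mla_chunked_prefill_allowed(sm):
    """Footnote 1: chunked prefill for MLA can only be enabled on SM100."""
    return sm == 100


def mla_kv_cache_reuse_allowed(sm, kv_cache_dtype):
    """Footnote 2: KV cache reuse for MLA needs SM90 or SM100 and a BF16 or FP8 KV cache."""
    return sm in (90, 100) and kv_cache_dtype.lower() in ("bf16", "fp8")


# On a machine with PyTorch and a GPU, the SM number could be derived from the
# (major, minor) tuple returned by torch.cuda.get_device_capability(),
# e.g. (9, 0) -> 90. That call is left out so the sketch runs without a GPU.
def sm_from_capability(major, minor):
    return major * 10 + minor
```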
