Mirror of https://github.com/vllm-project/vllm.git (synced 2026-06-06 00:16:14 +00:00)
[Doc] Add Llama-3.2-3B-Instruct to batch-invariance tested models (#44435)
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
@@ -17,10 +17,7 @@ Batch invariance is crucial for several use cases:
 
 ## Hardware Requirements
 
-Batch invariance currently requires NVIDIA GPUs with compute capability 9.0 or higher:
-
-- **H-series**: H100, H200
-- **B-series**: B100, B200
+Batch invariance requires NVIDIA GPUs with compute capability 8.0 or higher.
 
 ## Enabling Batch Invariance
 
||||
@@ -107,7 +104,7 @@ Batch invariance has been tested and verified on the following models:
 - **Qwen3 (Dense)**: `Qwen/Qwen3-1.7B`, `Qwen/Qwen3-8B`, `Qwen/Qwen3-4B-AWQ`, `Qwen/Qwen3-8B-AWQ`
 - **Qwen3 (MoE)**: `Qwen/Qwen3-30B-A3B`, `Qwen/Qwen3-Next-80B-A3B-Instruct`, `Qwen/Qwen3-30B-A3B-Thinking-2507-FP8`
 - **Qwen2.5**: `Qwen/Qwen2.5-0.5B-Instruct`, `Qwen/Qwen2.5-1.5B-Instruct`, `Qwen/Qwen2.5-3B-Instruct`, `Qwen/Qwen2.5-7B-Instruct`, `Qwen/Qwen2.5-14B-Instruct`, `Qwen/Qwen2.5-32B-Instruct`
-- **Llama 3**: `meta-llama/Llama-3.1-8B-Instruct`, `meta-llama/Llama-3.2-1B-Instruct`
+- **Llama 3**: Llama3.1 and 3.2 series, `meta-llama/Llama-3.2-3B-Instruct` for example
 - **GPT-OSS**: `openai/gpt-oss-20b`, `openai/gpt-oss-120b`
 - **Mistral**: `mistralai/Mistral-7B-v0.3`
 
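The updated hardware requirement in this commit (compute capability 8.0 or higher) can be checked locally before enabling batch invariance. The following is a minimal sketch, not part of vLLM: the helper names `meets_min_capability` and `local_gpu_capability` are hypothetical, and the device query assumes PyTorch is installed and exposes `torch.cuda.get_device_capability`.

```python
from typing import Optional, Tuple

# Minimum compute capability stated in the updated doc text (8.0).
MIN_CAPABILITY: Tuple[int, int] = (8, 0)


def meets_min_capability(
    capability: Tuple[int, int],
    minimum: Tuple[int, int] = MIN_CAPABILITY,
) -> bool:
    """Return True if (major, minor) is at least the required minimum."""
    # Tuples compare lexicographically: (9, 0) >= (8, 0), while (7, 5) < (8, 0).
    return tuple(capability) >= tuple(minimum)


def local_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return the first CUDA device's (major, minor), or None if unavailable."""
    try:
        import torch  # assumption: PyTorch is the available CUDA binding
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = local_gpu_capability()
    if cap is None:
        print("No CUDA device (or PyTorch) found; cannot check.")
    else:
        status = "OK" if meets_min_capability(cap) else "below 8.0"
        print(f"Compute capability {cap[0]}.{cap[1]}: {status}")
```

The comparison is kept separate from the device query so it can be exercised on machines without a GPU. Meeting the capability threshold is only the documented hardware precondition; it does not by itself confirm that a given model has been verified.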