# TensorRT-LLM: examples/auto_deploy/model_registry/configs/llama4_scout.yaml
# Commit 9f6abaf59f by tcherckez-nvidia, 2025-12-19:
#   [#9640][feat] Migrate model registry to v2.0 format with composable configs (#9836)
#   Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>

# Configuration for Llama 4 Scout (VLM)
# AutoDeploy-specific settings for Llama 4 Scout MoE vision model
max_batch_size: 1024       # maximum number of requests batched together
max_num_tokens: 2048       # maximum total tokens processed per forward pass
free_mem_ratio: 0.9        # fraction of free GPU memory reserved for the KV cache
trust_remote_code: true    # allow custom model code from the Hugging Face Hub
cuda_graph_batch_sizes: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 768, 1024]  # batch sizes captured as CUDA graphs
kv_cache_config:
  dtype: fp8               # store the KV cache in FP8, halving its footprint vs FP16
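As a hedged illustration of how this config parses, the snippet below mirrors the YAML above as a plain Python dict (PyYAML's `yaml.safe_load` on the file would yield the same structure) and runs a few consistency checks. The `validate` function and its rules are hypothetical sanity checks for this sketch, not part of the AutoDeploy API:

```python
# Parsed form of llama4_scout.yaml (what yaml.safe_load would return).
config = {
    "max_batch_size": 1024,
    "max_num_tokens": 2048,
    "free_mem_ratio": 0.9,
    "trust_remote_code": True,
    "cuda_graph_batch_sizes": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 768, 1024],
    "kv_cache_config": {"dtype": "fp8"},
}


def validate(cfg: dict) -> None:
    """Hypothetical consistency checks on an AutoDeploy model config."""
    # free_mem_ratio is a fraction of free GPU memory, so it must lie in (0, 1].
    assert 0.0 < cfg["free_mem_ratio"] <= 1.0, "free_mem_ratio must be in (0, 1]"
    sizes = cfg["cuda_graph_batch_sizes"]
    # CUDA graphs are captured per batch size; keep the list ascending and
    # within the configured batch limit so every size is actually reachable.
    assert sizes == sorted(sizes), "cuda_graph_batch_sizes should be ascending"
    assert max(sizes) <= cfg["max_batch_size"], "graph sizes exceed max_batch_size"


validate(config)
```

Capturing a CUDA graph for each listed batch size trades a little startup time and GPU memory for lower per-step launch overhead, which is why the list here covers the full range up to `max_batch_size`.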