# Configuration for Llama 4 Scout (VLM)
# AutoDeploy-specific settings for Llama 4 Scout MoE vision model

max_batch_size: 1024          # maximum number of requests batched together
max_num_tokens: 2048          # maximum total tokens scheduled in a single batch
free_mem_ratio: 0.9           # fraction of free GPU memory dedicated to the KV cache
trust_remote_code: true       # allow execution of the model repository's custom code
cuda_graph_batch_sizes: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 768, 1024]  # batch sizes to capture CUDA graphs for
kv_cache_config:
  dtype: fp8                  # store the KV cache in FP8 to reduce memory footprint
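# ---------------------------------------------------------------------------
# Usage sketch (assumptions: the file name "llama4_scout.yaml", the Hugging
# Face model ID, and the mapping of these keys onto LLM-API keyword arguments
# are illustrative and may differ between TensorRT-LLM versions):
#
#   import yaml
#   from tensorrt_llm import LLM, SamplingParams
#
#   with open("llama4_scout.yaml") as f:
#       overrides = yaml.safe_load(f)
#
#   llm = LLM(
#       model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed model ID
#       **overrides,  # assumes these keys are accepted as LLM keyword arguments
#   )
#   outputs = llm.generate(["Describe the image."], SamplingParams(max_tokens=64))
#   print(outputs[0].outputs[0].text)
# ---------------------------------------------------------------------------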