TensorRT-LLMs/qwen3.yaml at 9ec6a6b68fdfee7ac9304e1ee8ebac2eefa6737f

kanshan/TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-07 11:41:47 +08:00

Anish Shanbhag 6a6317727b

[TRTLLM-8680][doc] Add table with one-line deployment commands to docs (#8173)

Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>

2025-11-03 17:42:41 -08:00

21 lines

305 B

YAML


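max_batch_size: 161                     # maximum number of requests scheduled in one batch
max_num_tokens: 1160                    # maximum number of tokens processed per iteration across the batch
kv_cache_free_gpu_memory_fraction: 0.8  # fraction of free GPU memory reserved for the KV cache
tensor_parallel_size: 1                 # 1 = no tensor-parallel sharding
moe_expert_parallel_size: 1             # 1 = no expert parallelism for MoE layers
cuda_graph_config:
  enable_padding: true                  # pad a batch up to the nearest captured size so it can reuse a CUDA graph
  batch_sizes:                          # batch sizes for which CUDA graphs are captured
  - 1
  - 2
  - 4
  - 8
  - 16
  - 32
  - 64
  - 128
  - 256
  - 384
print_iter_log: true                    # log per-iteration statistics
enable_attention_dp: true               # enable attention data parallelism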
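# Usage sketch (an assumption, not part of the original file): a config like
# this is typically passed to trtllm-serve through its extra LLM API options
# flag. <model> is a placeholder for a Qwen3 checkpoint name or local path,
# and qwen3.yaml is assumed to be in the current directory.
#
#   trtllm-serve <model> --extra_llm_api_options qwen3.yaml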