TensorRT-LLMs/qwen3.yaml at 2d45b482e084e1efa179e1a554bf263eec7952e2

kanshan/TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Venky fd1270b9ab

[TRTC-43] [feat] Add config db and docs (#9420)

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

2025-12-12 04:00:03 +08:00

21 lines

305 B

YAML


max_batch_size: 161
max_num_tokens: 1160
kv_cache_free_gpu_memory_fraction: 0.8
tensor_parallel_size: 1
moe_expert_parallel_size: 1
cuda_graph_config:
  enable_padding: true
  batch_sizes:
  - 1
  - 2
  - 4
  - 8
  - 16
  - 32
  - 64
  - 128
  - 256
  - 384
print_iter_log: true
enable_attention_dp: true
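
The config above lists CUDA graph batch sizes (256 and 384) that are larger than `max_batch_size` (161). The file does not say how the runtime treats such entries, so the following is only a minimal sketch of a sanity check one might run before deploying a config like this. The helper name `oversized_graph_batches` is hypothetical and not part of TensorRT-LLM, and the dict simply mirrors the relevant values from the YAML rather than parsing the file.

```python
# Hypothetical sanity check; not part of TensorRT-LLM.
# The values below are copied by hand from qwen3.yaml.
config = {
    "max_batch_size": 161,
    "max_num_tokens": 1160,
    "cuda_graph_config": {
        "enable_padding": True,
        "batch_sizes": [1, 2, 4, 8, 16, 32, 64, 128, 256, 384],
    },
}


def oversized_graph_batches(cfg):
    """Return the CUDA graph batch sizes that exceed max_batch_size."""
    limit = cfg["max_batch_size"]
    sizes = cfg.get("cuda_graph_config", {}).get("batch_sizes", [])
    return [size for size in sizes if size > limit]


if __name__ == "__main__":
    extra = oversized_graph_batches(config)
    if extra:
        # Only reports the mismatch; whether these entries are ignored,
        # clamped, or used for padding is not stated in the config itself.
        print(f"batch_sizes above max_batch_size: {extra}")
```

A check like this only flags the mismatch for a human to review; it makes no assumption about whether the larger entries are intentional.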