TensorRT-LLMs/examples/configs/deepseek-r1-throughput.yaml
Anish Shanbhag 6a6317727b
[TRTLLM-8680][doc] Add table with one-line deployment commands to docs (#8173)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-11-03 17:42:41 -08:00

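# Runtime limits: maximum concurrent requests and maximum tokens per batch.
max_batch_size: 1024
max_num_tokens: 3200
# Fraction of free GPU memory reserved for the KV cache.
kv_cache_free_gpu_memory_fraction: 0.8
# Parallelism: 8-way tensor parallel and 8-way MoE expert parallel.
tensor_parallel_size: 8
moe_expert_parallel_size: 8
# Allow custom model code from the checkpoint repository to run.
trust_remote_code: true
# Enable data parallelism for the attention layers.
enable_attention_dp: true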
cuda_graph_config:
  enable_padding: true
  max_batch_size: 128
kv_cache_config:
  dtype: fp8
stream_interval: 10
speculative_config:
  decoding_type: MTP
  num_nextn_predict_layers: 1
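
# Usage sketch (an assumption, not part of the original file): a config like
# this is typically handed to trtllm-serve via --extra_llm_api_options.
# The model ID and the relative path below are illustrative placeholders;
# adjust both to your checkpoint and checkout. With tensor_parallel_size: 8,
# this presumably expects a node with 8 GPUs.
#
#   trtllm-serve deepseek-ai/DeepSeek-R1-0528 \
#     --extra_llm_api_options examples/configs/deepseek-r1-throughput.yaml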