TensorRT-LLMs/examples/disaggregated/disagg_config.yaml
Chuang Zhu 44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00

25 lines
559 B
YAML

hostname: localhost
port: 8000
model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
free_gpu_memory_fraction: 0.25
backend: "pytorch"
disable_overlap_scheduler: True
context_servers:
num_instances: 1
tensor_parallel_size: 1
pipeline_parallel_size: 1
kv_cache_config:
free_gpu_memory_fraction: 0.2
cache_transceiver_config:
backend: "default"
urls:
- "localhost:8001"
generation_servers:
num_instances: 1
tensor_parallel_size: 1
pipeline_parallel_size: 1
cache_transceiver_config:
backend: "default"
urls:
- "localhost:8002"