TensorRT-LLMs/ctx_extra-llm-api-config.yaml at f6b0ddd61df561f7e855a325073716c5fc215d12

kanshan/TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Shi Xiaowei fe7dda834d

[TRTLLM-7030][fix] Refactor the example doc of dist-serving (#6766 )

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

2025-08-13 17:39:27 +08:00

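# The overlap scheduler is disabled on context servers because it is not yet
# supported for context servers in disaggregated serving.
disable_overlap_scheduler: True
# Settings for transferring the KV cache from context servers to generation servers.
cache_transceiver_config:
  # Communication backend used by the cache transceiver.
  backend: UCX
  # Maximum number of tokens the KV-cache transfer buffer can hold.
  max_tokens_in_buffer: 2048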
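
# Usage sketch (an assumed invocation, not part of the original config): the
# keys above are extra LLM API options, so this file would typically be handed
# to a context server through trtllm-serve's --extra_llm_api_options flag.
# <model>, the host, and the port below are placeholders.
#
#   trtllm-serve <model> --host localhost --port 8001 \
#     --extra_llm_api_options ./ctx_extra-llm-api-config.yaml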