From f191d5630e371a575bee2ab4eedc9332dcfd43f9 Mon Sep 17 00:00:00 2001
From: Chunyang Wen <chunyang.wen@gmail.com>
Date: Fri, 29 May 2026 22:40:05 +0800
Subject: [PATCH] docs: clarify ITL acronym in optimization docs (#43922)

Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
---
 docs/configuration/optimization.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/configuration/optimization.md b/docs/configuration/optimization.md
index eb6bdce37b9..5bf789a0919 100644
--- a/docs/configuration/optimization.md
+++ b/docs/configuration/optimization.md
@@ -46,14 +46,14 @@ In V1, **chunked prefill is enabled by default whenever possible**. With chunked
 
 This policy has two benefits:
 
-- It improves ITL and generation decode because decode requests are prioritized.
+- It improves inter-token latency (ITL) and generation decode because decode requests are prioritized.
 - It helps achieve better GPU utilization by locating compute-bound (prefill) and memory-bound (decode) requests to the same batch.
 
 ### Performance Tuning with Chunked Prefill
 
 You can tune the performance by adjusting `max_num_batched_tokens`:
 
-- Smaller values (e.g., 2048) achieve better inter-token latency (ITL) because there are fewer prefills slowing down decodes.
+- Smaller values (e.g., 2048) achieve better ITL because there are fewer prefills slowing down decodes.
 - Higher values achieve better time to first token (TTFT) as you can process more prefill tokens in a batch.
 - For optimal throughput, we recommend setting `max_num_batched_tokens > 8192` especially for smaller models on large GPUs.
 - If `max_num_batched_tokens` is the same as `max_model_len`, that's almost the equivalent to the V0 default scheduling policy (except that it still prioritizes decodes).
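
Reviewer note: the trade-off in the `max_num_batched_tokens` bullets above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not vLLM's scheduler: `schedule_step` and `steps_to_finish_prefill` are hypothetical helpers, each decode request is assumed to consume exactly one token of the per-step budget, and decodes are assumed to be scheduled before any prefill chunk.

```python
def schedule_step(
    num_decodes: int,
    pending_prefill_tokens: int,
    max_num_batched_tokens: int,
) -> tuple[int, int]:
    """Toy model of one scheduling step (hypothetical, not vLLM's code).

    Returns (decode_tokens, prefill_tokens) scheduled in this step.
    """
    # Decodes are prioritized; assume one token per decode request.
    decode_tokens = min(num_decodes, max_num_batched_tokens)
    # The remaining budget is filled with a chunk of pending prefill tokens.
    budget_left = max_num_batched_tokens - decode_tokens
    prefill_tokens = min(pending_prefill_tokens, budget_left)
    return decode_tokens, prefill_tokens


def steps_to_finish_prefill(
    num_decodes: int,
    prompt_tokens: int,
    max_num_batched_tokens: int,
) -> int:
    """Count the steps needed to prefill one prompt under a fixed decode load."""
    steps = 0
    remaining = prompt_tokens
    while remaining > 0:
        _, chunk = schedule_step(num_decodes, remaining, max_num_batched_tokens)
        if chunk == 0:
            raise ValueError("token budget is fully consumed by decodes")
        remaining -= chunk
        steps += 1
    return steps


if __name__ == "__main__":
    # 64 running decodes and one 10,000-token prompt waiting for prefill.
    for budget in (2048, 8192):
        steps = steps_to_finish_prefill(64, 10_000, budget)
        print(f"max_num_batched_tokens={budget}: prefill spans {steps} steps")
```

In this toy model, a budget of 2048 spreads the prompt over more, smaller steps (less prefill work delaying each decode step, hence better ITL), while 8192 finishes the prefill in fewer steps (better TTFT). The real numbers depend on the model, hardware, and workload.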