[Feature] Add support for timed trace replay in vllm bench serve to replay Moonshot and Alibaba workload traces (#39795)

Signed-off-by: Animesh Trivedi <Animesh.Trivedi@ibm.com>
2026-06-06 00:16:14 +00:00 · 2026-05-28 12:31:34 +02:00
parent a9bc0ad8e4
commit bfb9ebc211
3 changed files with 349 additions and 51 deletions
@@ -918,6 +918,41 @@ vllm bench serve \

 </details>

+### Replay Timed Traces
+
+<details class="admonition abstract" markdown="1">
+<summary>Show more</summary>
+
+The following example shows how to replay a trace whose requests carry
+timing information.
+
+#### Running MoonshotAI traces
+
+Start the server:
+
+```bash
+vllm serve Qwen/Qwen3.5-2B \
+--host 127.0.0.1 --port 8000
+```
+
+Run the benchmark:
+
+```bash
+# Download an example trace:
+# curl -L -o conversation_trace.jsonl \
+#   https://raw.githubusercontent.com/kvcache-ai/Mooncake/main/FAST25-release/traces/conversation_trace.jsonl
+
+vllm bench serve --model Qwen/Qwen3.5-2B \
+--dataset-name=timed_trace --num-prompts 100 --host 127.0.0.1 \
+--port 8000 --dataset-path ./conversation_trace.jsonl \
+--ignore-eos --self-timed --timed-trace-chunk-hash-size 512 \
+--timed-trace-sec-multiplier 0.001
+```
+
+This replays the first 100 lines of the trace file `conversation_trace.jsonl`.
+
+</details>
+
 ### 🧪 Hashing Benchmarks

 <details class="admonition abstract" markdown="1">