[Feature] Add support for timed trace replay in vllm bench serve to replay Moonshot and Alibaba workload traces (#39795)

Signed-off-by: Animesh Trivedi <Animesh.Trivedi@ibm.com>
Animesh Trivedi
2026-05-28 12:31:34 +02:00
committed by GitHub
parent a9bc0ad8e4
commit bfb9ebc211
3 changed files with 349 additions and 51 deletions
@@ -918,6 +918,41 @@ vllm bench serve \
</details>
### Replay Timed Traces
<details class="admonition abstract" markdown="1">
<summary>Show more</summary>
This example replays a workload trace that includes per-request
timing information.
#### Running MoonshotAI traces
Start the server:
```bash
vllm serve Qwen/Qwen3.5-2B \
--host 127.0.0.1 --port 8000
```
Run the benchmark:
```bash
# Download an example trace:
# curl -L -o conversation_trace.jsonl \
#   https://raw.githubusercontent.com/kvcache-ai/Mooncake/main/FAST25-release/traces/conversation_trace.jsonl
vllm bench serve --model Qwen/Qwen3.5-2B \
--dataset-name=timed_trace --num-prompts 100 --host 127.0.0.1 \
--port 8000 --dataset-path ./conversation_trace.jsonl \
--ignore-eos --self-timed --timed-trace-chunk-hash-size 512 \
--timed-trace-sec-multiplier 0.001
```
This replays the first 100 lines of the trace file `conversation_trace.jsonl`.
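To make the effect of `--num-prompts` and `--timed-trace-sec-multiplier` more concrete, here is a minimal Python sketch of how arrival times could be derived from such a trace. It is not the code used by `vllm bench serve`. It assumes each JSONL record has a numeric `timestamp` field, and that the multiplier of `0.001` exists to convert millisecond timestamps into seconds; both are assumptions inferred from the command above. `load_arrival_times` is a hypothetical helper name.

```python
import json
from typing import Iterable


def load_arrival_times(
    lines: Iterable[str], num_prompts: int, sec_multiplier: float
) -> list[float]:
    """Return arrival times in seconds for the first `num_prompts` records."""
    arrivals: list[float] = []
    for line in lines:
        if len(arrivals) >= num_prompts:
            break  # only the first `num_prompts` records are used
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: the record stores its arrival time in a "timestamp"
        # field, expressed in a unit that `sec_multiplier` scales to seconds.
        arrivals.append(record["timestamp"] * sec_multiplier)
    return arrivals


# Made-up sample records for illustration; not taken from a real trace.
sample = [
    '{"timestamp": 0, "input_length": 10, "output_length": 5}',
    '{"timestamp": 1500, "input_length": 20, "output_length": 8}',
    '{"timestamp": 4000, "input_length": 30, "output_length": 2}',
]

for arrival in load_arrival_times(sample, num_prompts=2, sec_multiplier=0.001):
    print(f"send request at t = {arrival:.3f} s")
```

With a real file, the same helper could be called as `load_arrival_times(open("conversation_trace.jsonl"), 100, 0.001)`.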
</details>
### 🧪 Hashing Benchmarks
<details class="admonition abstract" markdown="1">