mirror of
https://github.com/vllm-project/vllm.git
synced 2026-06-06 00:16:14 +00:00
[Feature] Add support for timed trace replay in vllm bench serve to replay Moonshot and Alibaba workload traces (#39795)
Signed-off-by: Animesh Trivedi <Animesh.Trivedi@ibm.com>
@@ -918,6 +918,41 @@ vllm bench serve \
</details>

### Replay Timed Traces

<details class="admonition abstract" markdown="1">
<summary>Show more</summary>

Example of how to replay traces that include timing information.

#### Running MoonshotAI traces

Start the server:

```bash
vllm serve Qwen/Qwen3.5-2B \
    --host 127.0.0.1 --port 8000
```

Run the benchmark:

```bash
# Download an example trace
# curl -L -o conversation_trace.jsonl \
#   https://raw.githubusercontent.com/kvcache-ai/Mooncake/main/FAST25-release/traces/conversation_trace.jsonl

vllm bench serve --model Qwen/Qwen3.5-2B \
    --dataset-name=timed_trace --num-prompts 100 --host 127.0.0.1 \
    --port 8000 --dataset-path ./conversation_trace.jsonl \
    --ignore-eos --self-timed --timed-trace-chunk-hash-size 512 \
    --timed-trace-sec-multiplier 0.001
```

This replays the first 100 lines of the trace file `conversation_trace.jsonl`.
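
As a rough sketch of how the `--num-prompts 100` and `--timed-trace-sec-multiplier 0.001` values might combine, the snippet below assumes each trace line is a JSON object with a numeric `timestamp` field in milliseconds, and that the multiplier scales that value into seconds. The field name, the function name `arrival_offsets`, and this reading of the flag are assumptions for illustration, not taken from the vLLM implementation.

```python
import json


def arrival_offsets(trace_lines, sec_multiplier=0.001, num_prompts=100):
    """Return request send times in seconds for the first `num_prompts` records.

    Assumes every line is a JSON object with a numeric `timestamp` field
    (hypothetical layout; check the trace you are replaying).
    """
    offsets = []
    for line in trace_lines[:num_prompts]:
        record = json.loads(line)
        # Scale the raw timestamp (assumed milliseconds) into seconds.
        offsets.append(record["timestamp"] * sec_multiplier)
    return offsets


# Two made-up records in the assumed layout.
sample = [
    '{"timestamp": 0, "input_length": 1024, "output_length": 16}',
    '{"timestamp": 1500, "input_length": 512, "output_length": 8}',
]
offsets = arrival_offsets(sample)
print(offsets)
```

Under these assumptions, a multiplier of `0.001` treats the trace timestamps as milliseconds, and slicing to `num_prompts` mirrors replaying only the first 100 lines of the file.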

</details>

### 🧪 Hashing Benchmarks

<details class="admonition abstract" markdown="1">