mirror of https://github.com/vllm-project/vllm.git synced 2026-06-06 00:16:14 +00:00

Yongye Zhu 800604bf53 [MLA Attention Backend] Add TOKENSPEED_MLA backend for DSR1/Kimi K25 prefill + decode on Blackwell (#41778)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
(cherry picked from commit 0d2732dd91)

2026-05-14 00:59:51 -07:00

attention_benchmarks

[MLA Attention Backend] Add TOKENSPEED_MLA backend for DSR1/Kimi K25 prefill + decode on Blackwell (#41778)

2026-05-14 00:59:51 -07:00

auto_tune

Allow markdownlint to run locally (#36398)

2026-03-08 20:05:24 -07:00

cutlass_benchmarks

[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)

2026-03-23 16:03:29 -04:00

disagg_benchmarks

[Refactor] Remove dead or duplicate func utils or variables (#35318)

2026-02-26 10:57:56 -05:00

fused_kernels

[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518)

2026-04-03 01:47:04 +00:00

kernels

[Perf] FP8 FlashInfer Attn for ViT (#38065)

2026-04-27 13:44:15 +08:00

multi_turn

[Benchmark] Add --trust-remote-code flag to multi-turn benchmark (#41661)

2026-05-05 01:00:37 -07:00

overheads

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

structured_schemas

benchmarks: simplify test jsonschema (#14567)

2025-03-11 13:39:30 +00:00

__init__.py

[vLLM IR] Add IR op testing and benchmarking infrastructure (#40167)

2026-04-21 00:23:03 +00:00

backend_request_func.py

[Refactor] Remove dead or duplicate func utils or variables (#35318)

2026-02-26 10:57:56 -05:00

benchmark_batch_invariance.py

[Chore] Update more locations to use attention_config.backend (#31153)

2025-12-22 19:19:50 -08:00

benchmark_block_pool.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_hash.py

[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163)

2025-12-03 16:06:57 +00:00

benchmark_latency.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_long_document_qa_throughput.py

[Mypy] Better fixes for the mypy issues in vllm/config (#37902)

2026-03-25 06:14:43 -07:00

benchmark_ngram_proposer.py

[Cleanup] Remove obsolete spec decoding compatibility logic (#32003)

2026-01-09 05:44:18 +00:00

benchmark_prefix_block_hash.py

[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163)

2025-12-03 16:06:57 +00:00

benchmark_prefix_caching.py

[Mypy] Better fixes for the mypy issues in vllm/config (#37902)

2026-03-25 06:14:43 -07:00

benchmark_prioritization.py

[Mypy] Better fixes for the mypy issues in vllm/config (#37902)

2026-03-25 06:14:43 -07:00

benchmark_serving_structured_output.py

[Misc] Consistent case for vllm bench serve results (#30403)

2025-12-10 09:44:02 -08:00

benchmark_serving.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_throughput.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_topk_topp.py

[Hardware] Replace memory related torch.cuda APIs (#37031)

2026-03-16 10:24:48 +00:00

benchmark_utils.py

[Refactor] Remove dead or duplicate func utils or variables (#35318)

2026-02-26 10:57:56 -05:00

README.md

[Docs] Update link to Benchmark CLI documentation (#33254)

2026-02-06 16:00:59 +00:00

run_structured_output_benchmark.sh

[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514)

2026-02-17 12:22:56 +00:00

sonnet.txt

feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)

2024-03-27 13:39:26 -07:00

README.md

Benchmarks

This directory used to contain vLLM's benchmark scripts and utilities for performance testing and evaluation.

Contents

Serving benchmarks: Scripts for testing online inference performance (latency, throughput)
Throughput benchmarks: Scripts for testing offline batch inference performance
Specialized benchmarks: Tools for testing specific features like structured output, prefix caching, long document QA, request prioritization, and multi-modal inference
Dataset utilities: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)

Usage

For detailed usage instructions, examples, and dataset information, see the Benchmark CLI documentation.
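
As a rough illustration of what that documentation covers, the serving, throughput, and latency scripts listed above were deprecated in favor of `vllm bench` subcommands. The sketch below only builds and prints the command lines; it does not run them. The helper `bench_command` is hypothetical, the model name is a placeholder, and the flags are written from memory of the CLI rather than taken from this README, so verify them with `vllm bench <subcommand> --help` on your installed version.

```python
import shlex


def bench_command(subcommand, model, **options):
    """Build an argv list for `vllm bench <subcommand>` (hypothetical helper).

    Keyword options are turned into `--flag value` pairs, with underscores
    in the keyword converted to dashes.
    """
    argv = ["vllm", "bench", subcommand, "--model", model]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


# Placeholder model name; substitute the model you want to measure.
MODEL = "facebook/opt-125m"

# Assumed flag names; confirm each one with `--help` before relying on it.
commands = [
    # Online serving benchmark against an already running server.
    bench_command("serve", MODEL, dataset_name="random", num_prompts=100),
    # Offline batch throughput benchmark.
    bench_command("throughput", MODEL, input_len=128, output_len=128),
    # Single-batch latency benchmark.
    bench_command("latency", MODEL, input_len=32, output_len=128),
]

for argv in commands:
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run` if you prefer to launch the benchmark from Python.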

For full CLI reference see:
