wang.yuqi 77b13b9602 [Docs] Reorganize examples docs. (#41082)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>

2026-05-07 23:23:44 -07:00

Examples

vLLM's examples are organized into the following categories:

basic/ – Minimal examples for offline inference and online serving.
generate/ – Text generation examples, including multimodal models.
pooling/ – Examples for embedding, classification, scoring, reward, etc.
speech_to_text/ – Speech transcription, translation and real-time audio examples.
features/ – Demonstrations of individual vLLM features: automatic prefix caching, speculative decoding, LoRA, structured outputs, prompt embedding, pause/resume, batch invariance, KV events, data parallelism, and more.
reasoning/ – Examples for reasoning with vLLM.
tool_calling/ – Examples for function/tool calling with vLLM.
applications/ – Application examples such as chatbots and RAG (Retrieval-Augmented Generation).
rl/ – Reinforcement learning examples.
deployment/ – Examples for deploying vLLM in production.
ray_serving/ – Scalable serving using Ray.
disaggregated/ – Examples for disaggregated serving (separate prefill and decode), including several KV cache connectors (LMCache, Mooncake, FlexKV, P2P NCCL) and failure recovery.
observability/ – Metrics, logging, tracing (OpenTelemetry), and dashboards (Grafana, Perses).