Examples

vLLM's examples are organized into the following categories:

  • basic/ Minimal examples for offline inference and online serving.
  • generate/ Text generation examples, including multimodal models.
  • pooling/ Examples for embedding, classification, scoring, reward, etc.
  • speech_to_text/ Speech transcription, translation, and real-time audio examples.
  • features/ Demonstrations of individual vLLM features: automatic prefix caching, speculative decoding, LoRA, structured outputs, prompt embedding, pause/resume, batch invariance, KV events, data parallelism, and more.
  • reasoning/ Examples for running reasoning models with vLLM.
  • tool_calling/ Examples for function/tool calling with vLLM.
  • applications/ Application examples such as chatbots and RAG (Retrieval-Augmented Generation).
  • rl/ Reinforcement learning examples.
  • deployment/ Examples for deploying vLLM in production.
  • ray_serving/ Scalable serving using Ray.
  • disaggregated/ Examples for disaggregated serving (separate prefill and decode), including various KV cache connectors (LMCache, Mooncake, FlexKV, P2P NCCL) and failure recovery.
  • observability/ Metrics, logging, tracing (OpenTelemetry), and dashboards (Grafana, Perses).
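
To see which of the categories above are present in a local checkout, a small helper along these lines may be useful. It is only a sketch using the Python standard library: the function names `find_categories` and `format_report` are hypothetical, and the default `examples` path assumes the script is run from the repository root, which may not match your layout.

```python
from pathlib import Path

# Category directory names as listed above. Whether each one exists in a
# given checkout is exactly what this sketch checks.
CATEGORIES = [
    "basic", "generate", "pooling", "speech_to_text", "features",
    "reasoning", "tool_calling", "applications", "rl", "deployment",
    "ray_serving", "disaggregated", "observability",
]


def find_categories(examples_root):
    """Map each listed category to True if it is a directory under examples_root."""
    root = Path(examples_root)
    return {name: (root / name).is_dir() for name in CATEGORIES}


def format_report(present):
    """Render one line per category, marking directories that were not found."""
    lines = []
    for name, exists in present.items():
        status = "ok" if exists else "missing"
        lines.append(f"{name}/: {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    import sys

    # Assumed default: an "examples" directory relative to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "examples"
    print(format_report(find_categories(root)))
```

Pass the path to the examples directory as the first argument if it lives somewhere else.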