mirror of
https://github.com/vllm-project/vllm.git
Examples
vLLM's examples are organized into the following categories:
- basic/ – Minimal examples for offline inference and online serving.
- generate/ – Text generation examples, including multimodal models.
- pooling/ – Examples for embedding, classification, scoring, reward, etc.
- speech_to_text/ – Speech transcription, translation and real-time audio examples.
- features/ – Demonstrations of individual vLLM features: automatic prefix caching, speculative decoding, LoRA, structured outputs, prompt embedding, pause/resume, batch invariance, KV events, data parallelism, and more.
- reasoning/ – Examples for reasoning with vLLM.
- tool_calling/ – Examples for function/tool calling with vLLM.
- applications/ – Application examples such as chatbots and RAG (Retrieval-Augmented Generation).
- rl/ – Reinforcement learning examples.
- deployment/ – Examples for deploying vLLM in production.
- ray_serving/ – Scalable serving using Ray.
- disaggregated/ – Examples for disaggregated serving (separate prefill and decode), including various KV cache connectors (LMCache, Mooncake, FlexKV, P2P NCCL) and failure recovery.
- observability/ – Metrics, logging, tracing (OpenTelemetry), and dashboards (Grafana, Perses).
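
To check which of these categories are present in a local checkout, a small script along the following lines may help. It is only a sketch: the category names are copied from the list above, while the helper name `find_categories` and the default `examples` path are assumptions made for illustration and are not part of vLLM.

```python
from pathlib import Path

# Category directory names, copied from the list above.
CATEGORIES = [
    "basic", "generate", "pooling", "speech_to_text", "features",
    "reasoning", "tool_calling", "applications", "rl", "deployment",
    "ray_serving", "disaggregated", "observability",
]


def find_categories(examples_root):
    """Map each category name to True if that directory exists under examples_root.

    Hypothetical helper for illustration; not provided by vLLM.
    """
    root = Path(examples_root)
    return {name: (root / name).is_dir() for name in CATEGORIES}


if __name__ == "__main__":
    import sys

    # Assumption: the examples live in ./examples unless a path is given.
    root = sys.argv[1] if len(sys.argv) > 1 else "examples"
    for name, present in find_categories(root).items():
        status = "found" if present else "missing"
        print(f"{name + '/':<18}{status}")
```

Run from the repository root, it prints one line per category, which makes it easy to spot a directory that was renamed or has not been synced yet.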