[Docs] Reorganize examples docs. (#41082)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
wang.yuqi
2026-05-08 14:23:44 +08:00
committed by GitHub
parent ed582b6a4c
commit 77b13b9602
+14 -4
@@ -1,7 +1,17 @@
 # Examples
 
-vLLM's examples are split into three categories:
+vLLM's examples are organized into the following categories:
 
-- If you are using vLLM from within Python code, see the [Offline Inference](.) section.
-- If you are using vLLM from an HTTP application or client, see the [Online Serving](.) section.
-- For examples of using some of vLLM's advanced features (e.g. LMCache or Tensorizer) which are not specific to either of the above use cases, see the [Others](.) section.
+- **[`basic/`](../../examples/basic)** Minimal examples for offline inference and online serving.
+- **[`generate/`](../../examples/generate)** Text generation examples, including multimodal models.
+- **[`pooling/`](../../examples/pooling)** Examples for embedding, classification, scoring, reward, etc.
+- **[`speech_to_text/`](../../examples/speech_to_text)** Speech transcription, translation and real-time audio examples.
+- **[`features/`](../../examples/features)** Demonstrations of individual vLLM features: automatic prefix caching, speculative decoding, LoRA, structured outputs, prompt embedding, pause/resume, batch invariance, KV events, data parallelism, and more.
+- **[`reasoning/`](../../examples/reasoning)** Examples for reasoning with vLLM.
+- **[`tool_calling/`](../../examples/tool_calling)** Examples for function/tool calling with vLLM.
+- **[`applications/`](../../examples/applications)** Application examples such as chatbots and RAG (Retrieval-Augmented Generation).
+- **[`rl/`](../../examples/rl)** Reinforcement learning examples.
+- **[`deployment/`](../../examples/deployment)** Examples for deploying vLLM in production.
+- **[`ray_serving/`](../../examples/ray_serving)** Scalable serving using Ray.
+- **[`disaggregated/`](../../examples/disaggregated)** Examples for disaggregated serving (separate prefill and decode), including various kv cache connectors (LMCache, Mooncake, FlexKV, P2P NCCL) and failure recovery.
+- **[`observability/`](../../examples/observability)** Metrics, logging, tracing (OpenTelemetry), and dashboards (Grafana, Perses).