[Docs] Reorganize examples docs. (#41082)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-06 00:16:14 +00:00 · 2026-05-08 14:23:44 +08:00
parent ed582b6a4c
commit 77b13b9602
1 changed files with 14 additions and 4 deletions
@@ -1,7 +1,17 @@
 # Examples

-vLLM's examples are split into three categories:
+vLLM's examples are organized into the following categories:

-- If you are using vLLM from within Python code, see the [Offline Inference](.) section.
-- If you are using vLLM from an HTTP application or client, see the [Online Serving](.) section.
-- For examples of using some of vLLM's advanced features (e.g. LMCache or Tensorizer) which are not specific to either of the above use cases, see the [Others](.) section.
+- **[`basic/`](../../examples/basic)** – Minimal examples for offline inference and online serving.
+- **[`generate/`](../../examples/generate)** – Text generation examples, including multimodal models.
+- **[`pooling/`](../../examples/pooling)** – Examples for embedding, classification, scoring, reward, etc.
+- **[`speech_to_text/`](../../examples/speech_to_text)** – Speech transcription, translation and real-time audio examples.
+- **[`features/`](../../examples/features)** – Demonstrations of individual vLLM features: automatic prefix caching, speculative decoding, LoRA, structured outputs, prompt embedding, pause/resume, batch invariance, KV events, data parallelism, and more.
+- **[`reasoning/`](../../examples/reasoning)** – Examples for reasoning with vLLM.
+- **[`tool_calling/`](../../examples/tool_calling)** – Examples for function/tool calling with vLLM.
+- **[`applications/`](../../examples/applications)** – Application examples such as chatbots and RAG (Retrieval-Augmented Generation).
+- **[`rl/`](../../examples/rl)** – Reinforcement learning examples.
+- **[`deployment/`](../../examples/deployment)** – Examples for deploying vLLM in production.
+- **[`ray_serving/`](../../examples/ray_serving)** – Scalable serving using Ray.
+- **[`disaggregated/`](../../examples/disaggregated)** – Examples for disaggregated serving (separate prefill and decode), including various KV cache connectors (LMCache, Mooncake, FlexKV, P2P NCCL) and failure recovery.
+- **[`observability/`](../../examples/observability)** – Metrics, logging, tracing (OpenTelemetry), and dashboards (Grafana, Perses).