mirror of
https://github.com/vllm-project/vllm.git
synced 2026-06-06 00:16:14 +00:00
[Docs] Reorganize examples docs. (#41082)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
+14
-4
@@ -1,7 +1,17 @@
 # Examples
 
-vLLM's examples are split into three categories:
+vLLM's examples are organized into the following categories:
 
-- If you are using vLLM from within Python code, see the [Offline Inference](.) section.
-- If you are using vLLM from an HTTP application or client, see the [Online Serving](.) section.
-- For examples of using some of vLLM's advanced features (e.g. LMCache or Tensorizer) which are not specific to either of the above use cases, see the [Others](.) section.
+- **[`basic/`](../../examples/basic)** – Minimal examples for offline inference and online serving.
+- **[`generate/`](../../examples/generate)** – Text generation examples, including multimodal models.
+- **[`pooling/`](../../examples/pooling)** – Examples for embedding, classification, scoring, reward, etc.
+- **[`speech_to_text/`](../../examples/speech_to_text)** – Speech transcription, translation and real-time audio examples.
+- **[`features/`](../../examples/features)** – Demonstrations of individual vLLM features: automatic prefix caching, speculative decoding, LoRA, structured outputs, prompt embedding, pause/resume, batch invariance, KV events, data parallelism, and more.
+- **[`reasoning/`](../../examples/reasoning)** – Examples for reasoning with vLLM.
+- **[`tool_calling/`](../../examples/tool_calling)** – Examples for function/tool calling with vLLM.
+- **[`applications/`](../../examples/applications)** – Application examples such as chatbots and RAG (Retrieval-Augmented Generation).
+- **[`rl/`](../../examples/rl)** – Reinforcement learning examples.
+- **[`deployment/`](../../examples/deployment)** – Examples for deploying vLLM in production.
+- **[`ray_serving/`](../../examples/ray_serving)** – Scalable serving using Ray.
+- **[`disaggregated/`](../../examples/disaggregated)** – Examples for disaggregated serving (separate prefill and decode), including various kv cache connectors (LMCache, Mooncake, FlexKV, P2P NCCL) and failure recovery.
+- **[`observability/`](../../examples/observability)** – Metrics, logging, tracing (OpenTelemetry), and dashboards (Grafana, Perses).