=========================
LLM Examples Introduction
=========================

Here is a simple example that shows how to use the LLM API with TinyLlama.

.. literalinclude:: ../../../examples/llm-api/quickstart_example.py
   :language: python
   :linenos:

The LLM API supports both offline and online usage. See more examples of the LLM API here:

.. toctree::
   :maxdepth: 1
   :caption: LLM API Examples

   llm_inference_async
   llm_inference_kv_events
   llm_inference_customize
   llm_lookahead_decoding
   llm_medusa_decoding
   llm_guided_decoding
   llm_logits_processor
   llm_quantization
   llm_inference
   llm_multilora
   llm_inference_async_streaming
   llm_inference_distributed
   llm_eagle_decoding
   llm_auto_parallel
   llm_mgmn_llm_distributed
   llm_mgmn_trtllm_bench
   llm_mgmn_trtllm_serve

For more details on how to fully utilize this API, check out:

* `Common customizations <customization.html>`_
* `LLM API Reference <../llm-api/index.html>`_