# LLM API with TensorRT Engine

A simple inference example with TinyLlama using the LLM API:

```{literalinclude} ../../examples/llm-api/_tensorrt_engine/quickstart_example.py
:language: python
:linenos:
```
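
The included file is only rendered when the documentation is built, so the following is a minimal sketch of what such a quickstart might look like, not the contents of `quickstart_example.py`. The model ID, the prompts, the sampling values, the helper name `collect_generations`, and the import paths inside `main()` are all assumptions and may differ from the actual example or from your installed TensorRT-LLM version.

```python
"""Hedged sketch of an LLM API quickstart; not the included example file."""

# Assumption: a TinyLlama checkpoint ID on Hugging Face; substitute your own.
MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Illustrative prompts only.
PROMPTS = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]


def collect_generations(llm, prompts, sampling_params=None):
    """Call llm.generate and return (prompt, generated_text) pairs.

    `llm` is any object with a `generate(prompts, sampling_params)` method
    whose results expose `.prompt` and `.outputs[0].text`.
    """
    outputs = llm.generate(prompts, sampling_params)
    return [(out.prompt, out.outputs[0].text) for out in outputs]


def main():
    # Imported lazily so the helper above can be used without TensorRT-LLM.
    # Assumption: these import paths match the installed version.
    from tensorrt_llm import SamplingParams
    from tensorrt_llm._tensorrt_engine import LLM

    llm = LLM(model=MODEL)
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    for prompt, text in collect_generations(llm, PROMPTS, sampling_params):
        print(f"Prompt: {prompt!r}, Generated text: {text!r}")
```

Calling `main()` requires a machine with TensorRT-LLM installed and a supported GPU. Keeping the generation loop in a separate helper is a design choice of this sketch so that the loop can be checked with a stand-in object.
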

For more advanced usage, including distributed inference, multimodal inputs, and speculative decoding, see the [README](../../../examples/llm-api/README.md).