LLM API with TensorRT Engine
A simple inference example with TinyLlama using the LLM API:
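The example file that this page includes is not reproduced here, so the following is only a minimal sketch of what such a TinyLlama quickstart might look like. It assumes a recent TensorRT-LLM release in which the TensorRT-engine `LLM` class is importable from `tensorrt_llm._tensorrt_engine` (older releases may expose it as `tensorrt_llm.LLM` instead), and it assumes the Hugging Face model ID `TinyLlama/TinyLlama-1.1B-Chat-v1.0`. The prompts, sampling values, and the `format_result` helper are illustrative and not taken from the repository.

```python
# Illustrative prompts; any list of strings works.
PROMPTS = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]


def format_result(prompt: str, generated: str) -> str:
    """Render one prompt/completion pair for printing (hypothetical helper)."""
    return f"Prompt: {prompt!r}, Generated text: {generated!r}"


def main() -> None:
    # Imported inside main() so this file can be loaded on a machine
    # without TensorRT-LLM or a GPU.
    # Assumption: this import path matches your installed release.
    from tensorrt_llm import SamplingParams
    from tensorrt_llm._tensorrt_engine import LLM

    # Accepts a Hugging Face model ID or a local checkpoint path; the
    # TensorRT engine is built on first use.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Example sampling settings, not tuned values.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # generate() returns one result per prompt; each result carries the
    # original prompt and a list of completions.
    for output in llm.generate(PROMPTS, sampling_params):
        print(format_result(output.prompt, output.outputs[0].text))


if __name__ == "__main__":
    main()
```

Running the script requires a supported NVIDIA GPU and an installed `tensorrt_llm` package; the first run may take a while because the model is downloaded and the engine is built.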
For more advanced usage, including distributed inference, multimodal inputs, and speculative decoding, please refer to this README.