Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

2025-08-25 19:30:50 +08:00

LLM API with TensorRT Engine

A simple inference example with TinyLlama using the LLM API:

    :language: python
    :linenos:

For more advanced usage, including distributed inference, multimodal models, and speculative decoding, refer to this README.