LLM API with TensorRT Engine
A simple inference example with TinyLlama using the LLM API:
:language: python
:linenos:
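
The included example file is not reproduced on this page, so the following is only a minimal sketch of what such a TinyLlama quickstart might look like, not the repository's actual script. It assumes that the TensorRT-engine flavor of `LLM` is importable from `tensorrt_llm._tensorrt_engine`, that the model is the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` Hugging Face checkpoint, and that the sampling values shown are reasonable defaults; the helper `run_inference` and the prompts are hypothetical names introduced for illustration.

```python
"""Sketch: TinyLlama inference through the TensorRT-LLM LLM API."""


def run_inference(llm, prompts, sampling_params=None):
    """Generate one completion per prompt; return (prompt, text) pairs.

    `llm` is any object exposing generate(prompts, sampling_params) whose
    results carry `.prompt` and `.outputs[0].text`, as the LLM API does.
    """
    outputs = llm.generate(prompts, sampling_params)
    return [(out.prompt, out.outputs[0].text) for out in outputs]


def main():
    # Imported lazily so run_inference can be exercised without a GPU.
    # Assumption: this import path matches your installed TensorRT-LLM
    # version; check the package before relying on it.
    from tensorrt_llm import SamplingParams
    from tensorrt_llm._tensorrt_engine import LLM

    # Hypothetical prompts, chosen only for illustration.
    prompts = [
        "Hello, my name is",
        "The capital of France is",
    ]

    # Assumed sampling settings; tune for your use case.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # Assumed model identifier (Hugging Face hub name for TinyLlama).
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    for prompt, text in run_inference(llm, prompts, sampling_params):
        print(f"Prompt: {prompt!r}, Generated text: {text!r}")
```

Calling `main()` requires a machine with a supported NVIDIA GPU and TensorRT-LLM installed. Keeping the generation loop in `run_inference` separates it from model construction, so the same loop can be reused with a differently configured `LLM` instance.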
For more advanced usage, including distributed inference, multimodal inference, and speculative decoding, refer to this README.