kanshan/TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-01 16:51:11 +08:00

石晓伟 32ed92e449

Update TensorRT-LLM

Co-authored-by: Rong Zhou <130957722+ReginaZh@users.noreply.github.com>
Co-authored-by: Onur Galoglu <33498883+ogaloglu@users.noreply.github.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>

2024-08-20 18:55:15 +08:00

TensorRT-LLM Benchmarks

Overview

There are currently three workflows to benchmark TensorRT-LLM:

C++ benchmarks
- The recommended workflow. It uses the TensorRT-LLM C++ API and can take advantage of the latest TensorRT-LLM features.
Python benchmarks
- The Python benchmarking scripts can only benchmark the Python runtime, which does not support the latest features, such as in-flight batching.
The Python benchmarking suite
- This benchmarking suite is a work in progress and is subject to large changes.