trtllm-eval
===========
About
-----
The ``trtllm-eval`` command provides developers with a unified entry point for accuracy evaluation. It shares the core evaluation logic with the `accuracy test suite <https://github.com/NVIDIA/TensorRT-LLM/tree/main/tests/integration/defs/accuracy>`_ of TensorRT LLM.
``trtllm-eval`` is built on the offline API -- LLM API. Compared to the online ``trtllm-serve``, the offline API provides clearer error messages and simplifies the debugging workflow.
The following tasks are currently supported:
.. list-table::
   :header-rows: 1
   :widths: 20 25 15 15 15

   * - Dataset
     - Task
     - Metric
     - Default ISL
     - Default OSL
   * - CNN Dailymail
     - summarization
     - rouge
     - 924
     - 100
   * - MMLU
     - QA; multiple choice
     - accuracy
     - 4,094
     - 2
   * - GSM8K
     - QA; regex matching
     - accuracy
     - 4,096
     - 256
   * - GPQA
     - QA; multiple choice
     - accuracy
     - 32,768
     - 4,096
   * - JSON mode eval
     - structured generation
     - accuracy
     - 1,024
     - 512

.. note::
   ``trtllm-eval`` originates from the TensorRT LLM accuracy test suite and serves as a lightweight utility for verifying and debugging accuracy. At this time, ``trtllm-eval`` is intended solely for development and is not recommended for production use.

Usage and Examples
------------------
Some evaluation tasks (e.g., GSM8K and GPQA) depend on the ``lm_eval`` package. To run these tasks, you need to install ``lm_eval`` with:
.. code-block:: bash

   pip install -r requirements-dev.txt

Alternatively, you can install the ``lm_eval`` version specified in ``requirements-dev.txt``.
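If you manage dependencies manually, the pinned version can also be installed directly. The sketch below is hypothetical: ``<version>`` is a placeholder, so check ``requirements-dev.txt`` for the actual version specifier.

.. code-block:: bash

   # Hypothetical: substitute the version pinned in requirements-dev.txt
   pip install "lm_eval==<version>"
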
Here are some examples:
.. code-block:: bash

   # Evaluate Llama-3.1-8B-Instruct on MMLU
   trtllm-eval --model meta-llama/Llama-3.1-8B-Instruct mmlu

   # Evaluate Llama-3.1-8B-Instruct on GSM8K
   trtllm-eval --model meta-llama/Llama-3.1-8B-Instruct gsm8k

   # Evaluate Llama-3.3-70B-Instruct on GPQA Diamond
   trtllm-eval --model meta-llama/Llama-3.3-70B-Instruct gpqa_diamond

The ``--model`` argument accepts either a Hugging Face model ID or a local checkpoint path. By default, ``trtllm-eval`` runs the model with the PyTorch backend; you can pass ``--backend tensorrt`` to switch to the TensorRT backend.
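For example, backend selection looks like this (a sketch; the local checkpoint path is a placeholder):

.. code-block:: bash

   # Run MMLU from a local checkpoint with the default PyTorch backend (placeholder path)
   trtllm-eval --model /path/to/hf_checkpoint mmlu

   # Same evaluation with the TensorRT backend
   trtllm-eval --model /path/to/hf_checkpoint --backend tensorrt mmlu
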
Alternatively, the ``--model`` argument also accepts a local path to pre-built TensorRT engines. In this case, you should pass the Hugging Face tokenizer path to the ``--tokenizer`` argument.
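As a sketch of the pre-built-engine case (both paths are placeholders):

.. code-block:: bash

   # Evaluate pre-built TensorRT engines; the tokenizer is loaded separately
   # from the Hugging Face checkpoint directory
   trtllm-eval --model /path/to/engine_dir --tokenizer /path/to/hf_checkpoint gsm8k
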
For more details, see ``trtllm-eval --help`` and ``trtllm-eval <task> --help``.
Syntax
------
.. click:: tensorrt_llm.commands.eval:main
   :prog: trtllm-eval
   :nested: full