Doc: Update invalid hugging face URLs (#5683)

Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Linda 2025-07-03 09:37:01 +02:00 committed by GitHub
parent 2f9d0619c3
commit 14f938e510
30 changed files with 41 additions and 41 deletions
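
The change is mechanical: every Hugging Face dataset link that lacked an owner namespace now carries one. The sketch below is not part of this commit; it only illustrates how such a rewrite could be scripted. The two mappings are the ones visible in the hunks below, while the helper name `update_dataset_urls` and the regular expression are assumptions made for illustration.

```python
import re

# Legacy dataset id -> namespaced id, as seen in the URLs changed by this commit.
NAMESPACED_IDS = {
    "cnn_dailymail": "abisee/cnn_dailymail",
    "librispeech_asr": "openslr/librispeech_asr",
}

# Match a dataset URL whose path is a single id with no owner namespace.
# The negative lookahead rejects URLs that continue with "/" or more id
# characters, so already-namespaced URLs are left alone.
_LEGACY_URL = re.compile(
    r"https://huggingface\.co/datasets/(?P<name>[A-Za-z0-9_-]+)(?![A-Za-z0-9_/-])"
)


def update_dataset_urls(text: str) -> str:
    """Rewrite known legacy dataset URLs in `text`; leave everything else as is."""

    def _replace(match: re.Match) -> str:
        new_id = NAMESPACED_IDS.get(match.group("name"))
        if new_id is None:
            # Unknown dataset: keep the original URL untouched.
            return match.group(0)
        return f"https://huggingface.co/datasets/{new_id}"

    return _LEGACY_URL.sub(_replace, text)
```

Running the helper twice gives the same result as running it once, so it is safe to apply to files that were already updated.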

View File

@ -41,7 +41,7 @@ python3 prepare_dataset.py \
```
For datasets that don't have prompt key, set --dataset-prompt instead.
-Take [cnn_dailymail dataset](https://huggingface.co/datasets/cnn_dailymail) for example:
+Take [cnn_dailymail dataset](https://huggingface.co/datasets/abisee/cnn_dailymail) for example:
```
python3 prepare_dataset.py \
--tokenizer <path/to/tokenizer> \

View File

@ -30,7 +30,7 @@ The script accepts an argument named model_version, whose value should be `v1_7b
In addition, there are two shared files in the folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16

View File

@ -24,7 +24,7 @@ The TensorRT-LLM BLOOM implementation can be found in [tensorrt_llm/models/bloom
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16

View File

@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/contrib/ch
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix

View File

@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/contrib/ch
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix

View File

@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/contrib/ch
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix

View File

@ -32,7 +32,7 @@ The TensorRT-LLM Deepseek-v1 implementation can be found in [tensorrt_llm/models
In addition, there are three shared files in the parent folder [`examples`](../../../) can be used for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the model inference output by given an input text.
-* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
* [`../../../mmlu.py`](../../../mmlu.py) to running score script from https://github.com/declare-lab/instruct-eval to compare HF model and TensorRT-LLM model on the MMLU dataset.
## Support Matrix

View File

@ -34,7 +34,7 @@ The TensorRT-LLM Deepseek-v2 implementation can be found in [tensorrt_llm/models
In addition, there are three shared files in the parent folder [`examples`](../../../) can be used for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the model inference output by given an input text.
-* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
* [`../../../mmlu.py`](../../../mmlu.py) to running score script from https://github.com/declare-lab/instruct-eval to compare HF model and TensorRT-LLM model on the MMLU dataset.
## Support Matrix

View File

@ -25,7 +25,7 @@ The TensorRT-LLM Falcon implementation can be found in [tensorrt_llm/models/falc
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16
@ -193,7 +193,7 @@ If the engines are built successfully, you will see output like (falcon-rw-1b as
### 4. Run summarization task with the TensorRT engine(s)
The `../../../summarize.py` script can run the built engines to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
```bash
# falcon-rw-1b

View File

@ -26,7 +26,7 @@ code is located in [`examples/models/contrib/gptj`](./). There is one main file:
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16
@ -238,7 +238,7 @@ python3 ../../../run.py --max_output_len=50 --engine_dir=gptj_engine --tokenizer
## Summarization using the GPT-J model
The following section describes how to run a TensorRT-LLM GPT-J model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
The script can also perform the same summarization using the HF GPT-J model.

View File

@ -27,7 +27,7 @@ The TensorRT-LLM GPT-NeoX implementation can be found in [`tensorrt_llm/models/g
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16
@ -118,7 +118,7 @@ trtllm-build --checkpoint_dir ./gptneox/20B/trt_ckpt/int8_wo/2-gpu/ \
### 4. Summarization using the GPT-NeoX model
The following section describes how to run a TensorRT-LLM GPT-NeoX model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
The script can also perform the same summarization using the HF GPT-NeoX model.

View File

@ -29,7 +29,7 @@ The TensorRT-LLM Grok-1 implementation can be found in [tensorrt_llm/models/grok
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* INT8 Weight-Only

View File

@ -24,7 +24,7 @@ The TensorRT-LLM InternLM example code lies in [`examples/models/contrib/internl
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16 / BF16

View File

@ -23,7 +23,7 @@ The TensorRT-LLM support for Jais is based on the GPT model, the implementation
In addition, there are two shared files in the parent folder [`examples`](../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
The tested configurations are:

View File

@ -29,7 +29,7 @@ The TensorRT-LLM MPT implementation can be found in [`tensorrt_llm/models/mpt/mo
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16

View File

@ -25,7 +25,7 @@ The TensorRT-LLM OPT implementation can be found in [`tensorrt_llm/models/opt/mo
In addition, there are two shared files in the parent folder [`examples`](../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16
@ -127,7 +127,7 @@ trtllm-build --checkpoint_dir ./opt/66B/trt_ckpt/fp16/4-gpu/ \
### 4. Summarization using the OPT model
The following section describes how to run a TensorRT-LLM OPT model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
The script can also perform the same summarization using the HF OPT model.

View File

@ -12,7 +12,7 @@ The TensorRT-LLM Skywork example code lies in [`examples/models/contrib/skywork`
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16 & BF16
@ -78,7 +78,7 @@ trtllm-build --checkpoint_dir ./skywork-13b-base/trt_ckpt/bf16 \
### 4. Summarization using the Engines
-After building TRT engines, we can use them to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.
+After building TRT engines, we can use them to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.
```bash
# fp16

View File

@ -11,7 +11,7 @@ The TensorRT-LLM support for Smaug-72B-v0.1 is based on the LLaMA model, the imp
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
@ -43,7 +43,7 @@ trtllm-build --checkpoint_dir ./tllm_checkpoint_8gpu_tp8 \
### Run Summarization
-After building TRT engine, we can use it to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.
+After building TRT engine, we can use it to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.
```bash
mpirun -n 8 -allow-run-as-root python ../../../summarize.py \

View File

@ -26,7 +26,7 @@ The TensorRT-LLM Command-R example code is located in [`examples/models/core/com
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix

View File

@ -81,7 +81,7 @@ trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} \
We provide three examples to run inference `run.py`, `summarize.py` and `mmlu.py`. `run.py` only run inference with `input_text` and show the output.
-`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
+`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
`mmlu.py` runs MMLU to evaluate the model by accuracy.

View File

@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/core/glm-4
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix

View File

@ -44,7 +44,7 @@ The TensorRT-LLM GPT implementation can be found in [`tensorrt_llm/models/gpt/mo
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16
@ -222,7 +222,7 @@ Input [Text 0]: "Born in north-east France, Soyer trained as a"
Output [Text 0 Beam 0]: " chef before moving to London in the early"
```
-The [`summarize.py`](../../../summarize.py) script can run the built engines to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+The [`summarize.py`](../../../summarize.py) script can run the built engines to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
For each summary, the script can compute the
[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
By passing `--test_trt_llm` flag, the script will evaluate TensorRT-LLM engines. You may also pass `--test_hf` flag to evaluate the HF model.
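
For readers unfamiliar with the metric, `ROUGE-1` measures unigram overlap between a generated summary and a reference summary. The sketch below is a simplified illustration only: it assumes lowercase whitespace tokenization and reports the F1 variant, and the function name `rouge1_f1` is hypothetical. It is not the scoring code used by `summarize.py`, which this diff does not show.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    # Assumption: naive lowercase whitespace tokenization, no stemming.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each token counts at most as often as it occurs in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

Comparing the score of a TensorRT-LLM engine against the score of the HF model on the same articles is the kind of check the text above describes; close scores suggest the two implementations behave alike.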

View File

@ -14,7 +14,7 @@ The TensorRT-LLM InternLM2 example code lies in [`examples/models/core/internlm2
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16 / BF16

View File

@ -47,7 +47,7 @@ The TensorRT-LLM LLaMA implementation can be found in [tensorrt_llm/models/llama
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* BF16/FP16

View File

@ -20,7 +20,7 @@ The TensorRT-LLM Mamba implementation can be found in [`tensorrt_llm/models/mamb
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
@ -177,7 +177,7 @@ If `paged_state` is disabled, engine will be built with the contiguous stage cac
### 4. Run summarization task with the TensorRT engine(s)
The following section describes how to run a TensorRT-LLM Mamba model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
```bash

View File

@ -19,7 +19,7 @@ The TensorRT-LLM Nemotron implementation is based on the GPT model, which can be
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
* FP16/BF16
@ -157,7 +157,7 @@ trtllm-build --checkpoint_dir nemotron-3-8b/trt_ckpt/int4_awq/1-gpu \
### Run Inference
The `summarize.py` script can run the built engines to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
```bash
# single gpu

View File

@ -21,7 +21,7 @@ The TensorRT-LLM Phi implementation can be found in [`tensorrt_llm/models/phi/mo
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
@ -83,7 +83,7 @@ trtllm-build \
### 3. Summarization using the Phi model
-The following section describes how to run a TensorRT-LLM Phi model to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
+The following section describes how to run a TensorRT-LLM Phi model to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
The script can also perform the same summarization using the HF Phi model.
As previously explained, the first step is to build the TensorRT engine as described above using HF weights. You also have to install the requirements:

View File

@ -39,7 +39,7 @@ The TensorRT-LLM Qwen implementation can be found in [models/qwen](../../../../t
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
| Model Name | FP16/BF16 | FP8 | WO | AWQ | GPTQ | SQ | TP | PP | Arch |

View File

@ -11,7 +11,7 @@ The TensorRT-LLM RecurrentGemma implementation can be found in [`tensorrt_llm/mo
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:
* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
## Support Matrix
| Checkpoint type | FP16 | BF16 | FP8 | INT8 SQ | INT4 AWQ | TP |
@ -171,7 +171,7 @@ trtllm-build --checkpoint_dir ${UNIFIED_CKPT_2B_IT_FLAX_PATH} \
We provide three examples to run inference `run.py`, `summarize.py` and `mmlu.py`. `run.py` only run inference with `input_text` and show the output.
-`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
+`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
`mmlu.py` runs MMLU to evaluate the model by accuracy.

View File

@ -20,7 +20,7 @@ The TensorRT-LLM Whisper example code is located in [`examples/models/core/whisp
* [`convert_checkpoint.py`](./convert_checkpoint.py) to convert weights from OpenAI Whisper format to TRT-LLM format.
* `trtllm-build` to build the [TensorRT](https://developer.nvidia.com/tensorrt) engine(s) needed to run the Whisper model.
-* [`run.py`](./run.py) to run the inference on a single wav file, or [a HuggingFace dataset](https://huggingface.co/datasets/librispeech_asr) [\(Librispeech test clean\)](https://www.openslr.org/12).
+* [`run.py`](./run.py) to run the inference on a single wav file, or [a HuggingFace dataset](https://huggingface.co/datasets/openslr/librispeech_asr) [\(Librispeech test clean\)](https://www.openslr.org/12).
## Support Matrix
* FP16