Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-13 22:18:36 +08:00)
Doc: Update invalid Hugging Face URLs (#5683)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
parent 2f9d0619c3
commit 14f938e510
@@ -41,7 +41,7 @@ python3 prepare_dataset.py \
 ```

 For datasets that don't have prompt key, set --dataset-prompt instead.
-Take [cnn_dailymail dataset](https://huggingface.co/datasets/cnn_dailymail) for example:
+Take [cnn_dailymail dataset](https://huggingface.co/datasets/abisee/cnn_dailymail) for example:
 ```
 python3 prepare_dataset.py \
     --tokenizer <path/to/tokenizer> \

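The dataset itself is unchanged; only its canonical location on the Hugging Face Hub moved to the `abisee` namespace. As a quick check that the new URL resolves, the relocated dataset can be prefetched into the local cache. This is a workflow suggestion, not something the commit requires; it assumes a recent `huggingface_hub` that ships the `huggingface-cli download` command.

```bash
# Prefetch the relocated cnn_dailymail dataset into the local Hugging Face cache.
# Requires the CLI extra: pip install -U "huggingface_hub[cli]"
huggingface-cli download abisee/cnn_dailymail --repo-type dataset
```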
@@ -30,7 +30,7 @@ The script accepts an argument named model_version, whose value should be `v1_7b
 In addition, there are two shared files in the folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16

@@ -24,7 +24,7 @@ The TensorRT-LLM BLOOM implementation can be found in [tensorrt_llm/models/bloom
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16

@@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/contrib/ch
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix

@@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/contrib/ch
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix

@@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/contrib/ch
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix

@@ -32,7 +32,7 @@ The TensorRT-LLM Deepseek-v1 implementation can be found in [tensorrt_llm/models
 In addition, there are three shared files in the parent folder [`examples`](../../../) can be used for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the model inference output by given an input text.
-* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
 * [`../../../mmlu.py`](../../../mmlu.py) to running score script from https://github.com/declare-lab/instruct-eval to compare HF model and TensorRT-LLM model on the MMLU dataset.

 ## Support Matrix

@@ -34,7 +34,7 @@ The TensorRT-LLM Deepseek-v2 implementation can be found in [tensorrt_llm/models
 In addition, there are three shared files in the parent folder [`examples`](../../../) can be used for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the model inference output by given an input text.
-* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
 * [`../../../mmlu.py`](../../../mmlu.py) to running score script from https://github.com/declare-lab/instruct-eval to compare HF model and TensorRT-LLM model on the MMLU dataset.

 ## Support Matrix

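For context, the shared `mmlu.py` mentioned in the two Deepseek hunks above is typically driven as in the sketch below. This is not taken from the commit: the model and engine paths are placeholders, and the flags shown (`--test_trt_llm`, `--hf_model_dir`, `--engine_dir`) are the ones the shared example scripts commonly accept; the MMLU data itself has to be fetched separately, as the instruct-eval note in the hunk implies.

```bash
# Sketch: compare a built engine against the HF checkpoint on MMLU.
# Paths are placeholders; download the MMLU data first as described in the examples README.
python3 ../../../mmlu.py --test_trt_llm \
    --hf_model_dir ./deepseek_v2_hf \
    --engine_dir ./deepseek_v2/trt_engines/bf16/1-gpu
```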
@@ -25,7 +25,7 @@ The TensorRT-LLM Falcon implementation can be found in [tensorrt_llm/models/falc
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16

@@ -193,7 +193,7 @@ If the engines are built successfully, you will see output like (falcon-rw-1b as

 ### 4. Run summarization task with the TensorRT engine(s)
 The `../../../summarize.py` script can run the built engines to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ```bash
 # falcon-rw-1b

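The fenced command in the hunk above is cut off by the diff view. For reference, a single-GPU invocation of the shared `summarize.py` typically looks like the following sketch; the checkpoint and engine paths are placeholders, not values from this commit.

```bash
# Sketch: summarize cnn_dailymail articles with a built falcon-rw-1b engine.
# Paths are placeholders.
python3 ../../../summarize.py --test_trt_llm \
    --hf_model_dir ./falcon/rw-1b \
    --engine_dir ./falcon/rw-1b/trt_engines/fp16/1-gpu \
    --data_type fp16
```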
@@ -26,7 +26,7 @@ code is located in [`examples/models/contrib/gptj`](./). There is one main file:
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16

@@ -238,7 +238,7 @@ python3 ../../../run.py --max_output_len=50 --engine_dir=gptj_engine --tokenizer
 ## Summarization using the GPT-J model

 The following section describes how to run a TensorRT-LLM GPT-J model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
 [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
 The script can also perform the same summarization using the HF GPT-J model.

@@ -27,7 +27,7 @@ The TensorRT-LLM GPT-NeoX implementation can be found in [`tensorrt_llm/models/g
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16

@@ -118,7 +118,7 @@ trtllm-build --checkpoint_dir ./gptneox/20B/trt_ckpt/int8_wo/2-gpu/ \
 ### 4. Summarization using the GPT-NeoX model

 The following section describes how to run a TensorRT-LLM GPT-NeoX model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
 [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
 The script can also perform the same summarization using the HF GPT-NeoX model.

@@ -29,7 +29,7 @@ The TensorRT-LLM Grok-1 implementation can be found in [tensorrt_llm/models/grok
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * INT8 Weight-Only

@@ -24,7 +24,7 @@ The TensorRT-LLM InternLM example code lies in [`examples/models/contrib/internl
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16 / BF16

@@ -23,7 +23,7 @@ The TensorRT-LLM support for Jais is based on the GPT model, the implementation
 In addition, there are two shared files in the parent folder [`examples`](../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 The tested configurations are:

@@ -29,7 +29,7 @@ The TensorRT-LLM MPT implementation can be found in [`tensorrt_llm/models/mpt/mo
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16

@@ -25,7 +25,7 @@ The TensorRT-LLM OPT implementation can be found in [`tensorrt_llm/models/opt/mo
 In addition, there are two shared files in the parent folder [`examples`](../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16

@@ -127,7 +127,7 @@ trtllm-build --checkpoint_dir ./opt/66B/trt_ckpt/fp16/4-gpu/ \
 ### 4. Summarization using the OPT model

 The following section describes how to run a TensorRT-LLM OPT model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
 [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
 The script can also perform the same summarization using the HF OPT model.

@@ -12,7 +12,7 @@ The TensorRT-LLM Skywork example code lies in [`examples/models/contrib/skywork`
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16 & BF16

@@ -78,7 +78,7 @@ trtllm-build --checkpoint_dir ./skywork-13b-base/trt_ckpt/bf16 \

 ### 4. Summarization using the Engines

-After building TRT engines, we can use them to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.
+After building TRT engines, we can use them to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.

 ```bash
 # fp16

@@ -11,7 +11,7 @@ The TensorRT-LLM support for Smaug-72B-v0.1 is based on the LLaMA model, the imp
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix

@@ -43,7 +43,7 @@ trtllm-build --checkpoint_dir ./tllm_checkpoint_8gpu_tp8 \

 ### Run Summarization

-After building TRT engine, we can use it to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.
+After building TRT engine, we can use it to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.

 ```bash
 mpirun -n 8 -allow-run-as-root python ../../../summarize.py \

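The `mpirun` command in the hunk above is truncated by the diff view. A complete multi-GPU launch usually follows the pattern sketched below; the checkpoint and engine paths are placeholders and not part of this commit.

```bash
# Sketch: 8-GPU (TP=8) summarization run against the relocated dataset.
# Paths are placeholders.
mpirun -n 8 -allow-run-as-root python ../../../summarize.py \
    --test_trt_llm \
    --hf_model_dir ./Smaug-72B-v0.1 \
    --engine_dir ./smaug_72b/trt_engines/fp16/8-gpu \
    --data_type fp16
```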
@@ -26,7 +26,7 @@ The TensorRT-LLM Command-R example code is located in [`examples/models/core/com
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix

@@ -81,7 +81,7 @@ trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} \

 We provide three examples to run inference `run.py`, `summarize.py` and `mmlu.py`. `run.py` only run inference with `input_text` and show the output.

-`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
+`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.

 `mmlu.py` runs MMLU to evaluate the model by accuracy.

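For reference, the `run.py` path mentioned in the hunk above is usually driven as in the sketch below. The engine and tokenizer locations are placeholders; the prompt is the sample string that also appears later in this commit's GPT hunk.

```bash
# Sketch: plain text generation with a built engine.
# Engine and tokenizer paths are placeholders.
python3 ../../../run.py --max_output_len=50 \
    --engine_dir ./command_r/trt_engines/fp16/1-gpu \
    --tokenizer_dir ./command_r_hf \
    --input_text "Born in north-east France, Soyer trained as a"
```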
@@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/core/glm-4
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix

@@ -44,7 +44,7 @@ The TensorRT-LLM GPT implementation can be found in [`tensorrt_llm/models/gpt/mo
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16

@@ -222,7 +222,7 @@ Input [Text 0]: "Born in north-east France, Soyer trained as a"
 Output [Text 0 Beam 0]: " chef before moving to London in the early"
 ```

-The [`summarize.py`](../../../summarize.py) script can run the built engines to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+The [`summarize.py`](../../../summarize.py) script can run the built engines to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
 For each summary, the script can compute the
 [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
 By passing `--test_trt_llm` flag, the script will evaluate TensorRT-LLM engines. You may also pass `--test_hf` flag to evaluate the HF model.

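The `--test_trt_llm` and `--test_hf` flags mentioned in the hunk above can be passed together to score the engine and the HF baseline in one run. A sketch, with placeholder model and engine paths:

```bash
# Sketch: evaluate both the TensorRT-LLM engine and the HF model on cnn_dailymail.
# Paths are placeholders.
python3 ../../../summarize.py --test_trt_llm --test_hf \
    --hf_model_dir gpt2 \
    --engine_dir ./gpt2/trt_engines/fp16/1-gpu
```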
@@ -14,7 +14,7 @@ The TensorRT-LLM InternLM2 example code lies in [`examples/models/core/internlm2
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16 / BF16

@@ -47,7 +47,7 @@ The TensorRT-LLM LLaMA implementation can be found in [tensorrt_llm/models/llama
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * BF16/FP16

@@ -20,7 +20,7 @@ The TensorRT-LLM Mamba implementation can be found in [`tensorrt_llm/models/mamb
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.


 ## Support Matrix

@@ -177,7 +177,7 @@ If `paged_state` is disabled, engine will be built with the contiguous stage cac
 ### 4. Run summarization task with the TensorRT engine(s)

 The following section describes how to run a TensorRT-LLM Mamba model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
 [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.

 ```bash

@@ -19,7 +19,7 @@ The TensorRT-LLM Nemotron implementation is based on the GPT model, which can be
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 * FP16/BF16

@@ -157,7 +157,7 @@ trtllm-build --checkpoint_dir nemotron-3-8b/trt_ckpt/int4_awq/1-gpu \
 ### Run Inference

 The `summarize.py` script can run the built engines to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ```bash
 # single gpu

@@ -21,7 +21,7 @@ The TensorRT-LLM Phi implementation can be found in [`tensorrt_llm/models/phi/mo
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix

@@ -83,7 +83,7 @@ trtllm-build \

 ### 3. Summarization using the Phi model

-The following section describes how to run a TensorRT-LLM Phi model to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
+The following section describes how to run a TensorRT-LLM Phi model to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
 The script can also perform the same summarization using the HF Phi model.

 As previously explained, the first step is to build the TensorRT engine as described above using HF weights. You also have to install the requirements:

@@ -39,7 +39,7 @@ The TensorRT-LLM Qwen implementation can be found in [models/qwen](../../../../t
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 | Model Name | FP16/BF16 | FP8 | WO | AWQ | GPTQ | SQ | TP | PP | Arch |

@@ -11,7 +11,7 @@ The TensorRT-LLM RecurrentGemma implementation can be found in [`tensorrt_llm/mo
 In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

 * [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

 ## Support Matrix
 | Checkpoint type | FP16 | BF16 | FP8 | INT8 SQ | INT4 AWQ | TP |

@@ -171,7 +171,7 @@ trtllm-build --checkpoint_dir ${UNIFIED_CKPT_2B_IT_FLAX_PATH} \

 We provide three examples to run inference `run.py`, `summarize.py` and `mmlu.py`. `run.py` only run inference with `input_text` and show the output.

-`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
+`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.

 `mmlu.py` runs MMLU to evaluate the model by accuracy.

@@ -20,7 +20,7 @@ The TensorRT-LLM Whisper example code is located in [`examples/models/core/whisp

 * [`convert_checkpoint.py`](./convert_checkpoint.py) to convert weights from OpenAI Whisper format to TRT-LLM format.
 * `trtllm-build` to build the [TensorRT](https://developer.nvidia.com/tensorrt) engine(s) needed to run the Whisper model.
-* [`run.py`](./run.py) to run the inference on a single wav file, or [a HuggingFace dataset](https://huggingface.co/datasets/librispeech_asr) [\(Librispeech test clean\)](https://www.openslr.org/12).
+* [`run.py`](./run.py) to run the inference on a single wav file, or [a HuggingFace dataset](https://huggingface.co/datasets/openslr/librispeech_asr) [\(Librispeech test clean\)](https://www.openslr.org/12).

 ## Support Matrix
 * FP16
