Mirror of https://github.com/NVIDIA/TensorRT-LLM.git
[None][doc] Refine perf overview.md and correct the error link in per… (#8036)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
This commit is contained in:
parent 4d5465a575
commit 0c47925600
@@ -8,13 +8,13 @@ Expect breaking API changes.
 ```

 TensorRT LLM provides the `trtllm-bench` CLI, a packaged benchmarking utility that aims to make it
-easier for users to reproduce our officially published [performance overview](../performance/perf-overview.md#throughput-measurements). `trtllm-bench` provides the follows:
+easier for users to reproduce our officially published [performance overview](./perf-overview.md#throughput-measurements). `trtllm-bench` provides the following:

 - A streamlined way to build tuned engines for benchmarking for a variety of models and platforms.
 - An entirely Python workflow for benchmarking.
 - Ability to benchmark various flows and features within TensorRT LLM.

-`trtllm-bench` executes all benchmarks using [in-flight batching] -- for more information see
+`trtllm-bench` executes all benchmarks using `in-flight batching` -- for more information see
 the [in-flight batching section](../features/attention.md#inflight-batching) that describes the concept
 in further detail.
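As context for readers of this diff (not part of the commit): `trtllm-bench` is the CLI that the hunk above describes, and listing its subcommands is a quick way to see the benchmarking flows it covers. The sketch below assumes a working TensorRT LLM installation.

```shell
# Sketch, not from the diff: enumerate trtllm-bench's subcommands and the
# options of the throughput flow discussed later on this page.
trtllm-bench --help
trtllm-bench throughput --help
```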
@@ -67,7 +67,7 @@ sudo nvidia-smi boost-slider --vboost <max_boost_slider>

 While `trtllm-bench` should be able to run any network that TensorRT LLM supports, the following is the list
 of models that have been validated extensively; it is the same listing as seen on the
-[Performance Overview](../performance/perf-overview.md) page.
+[Performance Overview](./perf-overview.md) page.

 - [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
 - [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)
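The hunk header above quotes the page's GPU clock setup step. As an illustration (the value below is not from the diff, and the boost slider is only available on supported data-center GPUs), a concrete invocation looks like:

```shell
# Sketch: raise the voltage boost slider before benchmarking; 0 restores the default.
sudo nvidia-smi boost-slider --vboost 1
```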
@@ -238,15 +238,13 @@ RTX 6000 Pro Blackwell Server Edition
 | 20000/2000 | 1,804 | 1,351 |

-
-
-
 ## Reproducing Benchmarked Results

-> [!NOTE] The only models supported in this workflow are those listed in the table above.
+```{note}
+Only the models shown in the table above are supported by this workflow.
+```

-The following tables are references for commands that are used as part of the benchmarking process. For a more detailed
-description of this benchmarking workflow, see the [benchmarking suite documentation](https://nvidia.github.io/TensorRT-LLM/performance/perf-benchmarking.html).
+The following tables are references for commands that are used as part of the benchmarking process. For a more detailed description of this benchmarking workflow, see the [benchmarking suite documentation](./perf-benchmarking.md).

 ### Command Overview
@@ -274,7 +272,7 @@ Starting with v0.19, testing was performed using the PyTorch backend - this work

 ### Preparing a Dataset

-In order to prepare a dataset, you can use the provided [script](../../../benchmarks/cpp/prepare_dataset.py).
+In order to prepare a dataset, you can use the provided [script](source:benchmarks/cpp/prepare_dataset.py).
 To generate a synthetic dataset, run the following command:

 ```shell
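# The command body is cut off by this diff hunk. As a sketch (tokenizer,
# token counts, request count, and output path are illustrative, not taken
# from the diff), a synthetic-dataset invocation looks roughly like:
python benchmarks/cpp/prepare_dataset.py --stdout --tokenizer meta-llama/Llama-2-7b-hf \
  token-norm-dist --input-mean 128 --output-mean 128 --input-stdev 0 --output-stdev 0 \
  --num-requests 3000 > /tmp/synthetic_128_128.txt
```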
@@ -310,7 +308,7 @@ remain in the system longer and therefore require fewer requests to achieve stead

 To run the benchmark with the generated dataset, simply use the `trtllm-bench throughput` subcommand. The benchmarker will
 run an offline maximum throughput scenario such that all requests are queued in rapid succession. You simply need to provide
-a model name (HuggingFace reference or path to a local model), a [generated dataset](#preparing-a-dataset), and a file containing any desired extra options to the LLMApi (details in [tensorrt_llm/llmapi/llm_args.py:LlmArgs](../../../tensorrt_llm/llmapi/llm_args.py)).
+a model name (HuggingFace reference or path to a local model), a [generated dataset](#preparing-a-dataset), and a file containing any desired extra options to the LLM API (details in [tensorrt_llm/llmapi/llm_args.py:LlmArgs](source:tensorrt_llm/llmapi/llm_args.py)).

 For dense / non-MoE models:
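The command for dense models is cut off by the diff. As a sketch of what such an invocation looks like (the model name, dataset path, YAML file name, and option values below are illustrative, not taken from the diff):

```shell
# Sketch: extra LLM API options are passed as a small YAML file whose keys
# mirror the LlmArgs fields referenced above; this one caps KV-cache memory.
cat > extra_llm_api_options.yaml <<'EOF'
kv_cache_config:
  free_gpu_memory_fraction: 0.9
EOF

# Offline max-throughput run against the synthetic dataset prepared earlier.
trtllm-bench --model meta-llama/Llama-2-7b-hf throughput \
  --dataset /tmp/synthetic_128_128.txt \
  --extra_llm_api_options extra_llm_api_options.yaml
```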