[None][doc] Refine perf overview.md and correct the error link in per… (#8036)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Guoming Zhang 2025-09-28 16:14:31 +08:00 committed by GitHub
parent 4d5465a575
commit 0c47925600
2 changed files with 9 additions and 11 deletions


@@ -8,13 +8,13 @@ Expect breaking API changes.
```
TensorRT LLM provides the `trtllm-bench` CLI, a packaged benchmarking utility that aims to make it
easier for users to reproduce our officially published [performance overview](./perf-overview.md#throughput-measurements). `trtllm-bench` provides the following:
- A streamlined way to build tuned engines for benchmarking for a variety of models and platforms.
- An entirely Python workflow for benchmarking.
- Ability to benchmark various flows and features within TensorRT LLM.
`trtllm-bench` executes all benchmarks using `in-flight batching` -- for more information, see
the [in-flight batching section](../features/attention.md#inflight-batching), which describes the concept
in further detail.
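As a sketch of what a `trtllm-bench` run looks like in practice (the model name and dataset path below are placeholders, not values from this commit), the throughput workflow boils down to pointing the tool at a model and a prepared dataset:

```shell
# Sketch: assemble a typical trtllm-bench throughput invocation.
# MODEL and DATASET are placeholders -- substitute your own values.
MODEL="meta-llama/Llama-2-7b-hf"
DATASET="/tmp/synthetic_data.jsonl"

# The subcommand (throughput) follows the global --model option.
CMD="trtllm-bench --model $MODEL throughput --dataset $DATASET"
echo "$CMD"   # inspect the command; run it directly once the paths are real
```

Later sections of this page cover how the dataset file is generated and which extra options can be passed along.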
@@ -67,7 +67,7 @@ sudo nvidia-smi boost-slider --vboost <max_boost_slider>
While `trtllm-bench` should be able to run any network that TensorRT LLM supports, the following models
have been validated extensively; the list matches the one on the
[Performance Overview](./perf-overview.md) page.
- [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)


@@ -238,15 +238,13 @@ RTX 6000 Pro Blackwell Server Edition
| 20000/2000 | 1,804 | 1,351 |
## Reproducing Benchmarked Results
```{note}
Only the models shown in the table above are supported by this workflow.
```
The following tables are references for commands that are used as part of the benchmarking process. For a more detailed description of this benchmarking workflow, see the [benchmarking suite documentation](./perf-benchmarking.md).
### Command Overview
@@ -274,7 +272,7 @@ Starting with v0.19, testing was performed using the PyTorch backend - this work
### Preparing a Dataset
To prepare a dataset, use the provided [script](source:benchmarks/cpp/prepare_dataset.py).
To generate a synthetic dataset, run the following command:
```shell
@@ -310,7 +308,7 @@ remain in the system longer and therefore require less requests to achieve stead
To run the benchmark with the generated dataset, use the `trtllm-bench throughput` subcommand. The benchmarker
runs an offline maximum-throughput scenario in which all requests are queued in rapid succession. You need to provide
a model name (HuggingFace reference or path to a local model), a [generated dataset](#preparing-a-dataset), and a file containing any desired extra options for the LLM API (details in [tensorrt_llm/llmapi/llm_args.py:LlmArgs](source:tensorrt_llm/llmapi/llm_args.py)).
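For illustration, the extra-options file is plain YAML whose top-level keys mirror `LlmArgs` fields. The field names below are examples only and should be verified against `tensorrt_llm/llmapi/llm_args.py` for the release you are using:

```yaml
# Illustrative extra LLM API options; verify each field name against LlmArgs.
kv_cache_config:
  free_gpu_memory_fraction: 0.9
```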
For dense / non-MoE models: