[None][doc] Refine perf overview.md and correct the error link in per… (#8036)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Guoming Zhang 2025-09-28 16:14:31 +08:00 committed by GitHub
parent 4d5465a575
commit 0c47925600
2 changed files with 9 additions and 11 deletions


@@ -8,13 +8,13 @@ Expect breaking API changes.
```
TensorRT LLM provides the `trtllm-bench` CLI, a packaged benchmarking utility that aims to make it
easier for users to reproduce our officially published [performance overview](./perf-overview.md#throughput-measurements). `trtllm-bench` provides the following:
- A streamlined way to build tuned engines for benchmarking for a variety of models and platforms.
- An entirely Python workflow for benchmarking.
- Ability to benchmark various flows and features within TensorRT LLM.
`trtllm-bench` executes all benchmarks using `in-flight batching` -- for more information, see
the [in-flight batching section](../features/attention.md#inflight-batching), which describes the concept
in further detail.
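As a sketch of what a `trtllm-bench` run looks like in practice (the model name and dataset path below are placeholders, not values from this commit), the throughput workflow boils down to pointing the tool at a model and a prepared dataset:

```shell
# Sketch: assemble a typical trtllm-bench throughput invocation.
# MODEL and DATASET are placeholders -- substitute your own values.
MODEL="meta-llama/Llama-2-7b-hf"
DATASET="/tmp/synthetic_data.jsonl"

# The subcommand (throughput) follows the global --model option.
CMD="trtllm-bench --model $MODEL throughput --dataset $DATASET"
echo "$CMD"   # inspect the command; run it directly once the paths are real
```

Later sections of this page cover how the dataset file is generated and which extra options can be passed along.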
@@ -67,7 +67,7 @@ sudo nvidia-smi boost-slider --vboost <max_boost_slider>
While `trtllm-bench` should be able to run any network that TensorRT LLM supports, the following models
have been validated extensively; the list matches the one on the
[Performance Overview](./perf-overview.md) page.
- [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)


@@ -238,15 +238,13 @@ RTX 6000 Pro Blackwell Server Edition
| 20000/2000 | 1,804 | 1,351 |
## Reproducing Benchmarked Results
```{note}
Only the models shown in the table above are supported by this workflow.
```
The following tables are references for commands that are used as part of the benchmarking process. For a more detailed description of this benchmarking workflow, see the [benchmarking suite documentation](./perf-benchmarking.md).
### Command Overview
@@ -274,7 +272,7 @@ Starting with v0.19, testing was performed using the PyTorch backend - this work
### Preparing a Dataset
To prepare a dataset, use the provided [script](source:benchmarks/cpp/prepare_dataset.py).
To generate a synthetic dataset, run the following command:
```shell
@@ -310,7 +308,7 @@ remain in the system longer and therefore require less requests to achieve stead
To run the benchmark with the generated dataset, use the `trtllm-bench throughput` subcommand. The benchmarker
runs an offline maximum-throughput scenario in which all requests are queued in rapid succession. You need to provide
a model name (HuggingFace reference or path to a local model), a [generated dataset](#preparing-a-dataset), and a file containing any desired extra options for the LLM API (details in [tensorrt_llm/llmapi/llm_args.py:LlmArgs](source:tensorrt_llm/llmapi/llm_args.py)).
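For illustration, the extra-options file is plain YAML whose top-level keys mirror `LlmArgs` fields. The field names below are examples only and should be verified against `tensorrt_llm/llmapi/llm_args.py` for the release you are using:

```yaml
# Illustrative extra LLM API options; verify each field name against LlmArgs.
kv_cache_config:
  free_gpu_memory_fraction: 0.9
```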
For dense / non-MoE models: