From 0c479256006b4c9e274b12f38f1f96b0ae5ea543 Mon Sep 17 00:00:00 2001
From: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Date: Sun, 28 Sep 2025 16:14:31 +0800
Subject: [PATCH] =?UTF-8?q?[None][doc]=20Refine=20perf=20overview.md=20and?=
 =?UTF-8?q?=20correct=20the=20error=20link=20in=20per=E2=80=A6=20(#8036)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
---
 docs/source/developer-guide/perf-benchmarking.md |  6 +++---
 .../perf-overview.md                             | 14 ++++++--------
 2 files changed, 9 insertions(+), 11 deletions(-)
 rename docs/source/{legacy/performance => developer-guide}/perf-overview.md (97%)

diff --git a/docs/source/developer-guide/perf-benchmarking.md b/docs/source/developer-guide/perf-benchmarking.md
index 5b01c50601..6fcf8b64fe 100644
--- a/docs/source/developer-guide/perf-benchmarking.md
+++ b/docs/source/developer-guide/perf-benchmarking.md
@@ -8,13 +8,13 @@ Expect breaking API changes.
 ```
 
 TensorRT LLM provides the `trtllm-bench` CLI, a packaged benchmarking utility that aims to make it
-easier for users to reproduce our officially published [performance overview](../performance/perf-overview.md#throughput-measurements). `trtllm-bench` provides the follows:
+easier for users to reproduce our officially published [performance overview](./perf-overview.md#throughput-measurements). `trtllm-bench` provides the following:
 
 - A streamlined way to build tuned engines for benchmarking for a variety of models and platforms.
 - An entirely Python workflow for benchmarking.
 - Ability to benchmark various flows and features within TensorRT LLM.
 
-`trtllm-bench` executes all benchmarks using [in-flight batching] -- for more information see
+`trtllm-bench` executes all benchmarks using `in-flight batching` -- for more information see
 the [in-flight batching section](../features/attention.md#inflight-batching) that describes the
 concept in further detail.
 
@@ -67,7 +67,7 @@ sudo nvidia-smi boost-slider --vboost
 
 While `trtllm-bench` should be able to run any network that TensorRT LLM supports, the following are
 the list that have been validated extensively and is the same listing as seen on the
-[Performance Overview](../performance/perf-overview.md) page.
+[Performance Overview](./perf-overview.md) page.
 
 - [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
 - [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)
diff --git a/docs/source/legacy/performance/perf-overview.md b/docs/source/developer-guide/perf-overview.md
similarity index 97%
rename from docs/source/legacy/performance/perf-overview.md
rename to docs/source/developer-guide/perf-overview.md
index d354d869aa..0a144a58d4 100644
--- a/docs/source/legacy/performance/perf-overview.md
+++ b/docs/source/developer-guide/perf-overview.md
@@ -238,15 +238,13 @@ RTX 6000 Pro Blackwell Server Edition | 20000/2000 | 1,804 | 1,351 |
-
-
-
 
 ## Reproducing Benchmarked Results
 
-> [!NOTE] The only models supported in this workflow are those listed in the table above.
+```{note}
+Only the models shown in the table above are supported by this workflow.
+```
 
-The following tables are references for commands that are used as part of the benchmarking process. For a more detailed
-description of this benchmarking workflow, see the [benchmarking suite documentation](https://nvidia.github.io/TensorRT-LLM/performance/perf-benchmarking.html).
+The following tables are references for commands that are used as part of the benchmarking process.
+For a more detailed description of this benchmarking workflow, see the [benchmarking suite documentation](./perf-benchmarking.md).
 
 ### Command Overview
 
@@ -274,7 +272,7 @@ Starting with v0.19, testing was performed using the PyTorch backend - this work
 flow does not require an engine to be built.
 
 ### Preparing a Dataset
 
-In order to prepare a dataset, you can use the provided [script](../../../benchmarks/cpp/prepare_dataset.py).
+To prepare a dataset, use the provided [script](source:benchmarks/cpp/prepare_dataset.py).
 To generate a synthetic dataset, run the following command:
 
 ```shell
@@ -310,7 +308,7 @@ remain in the system longer and therefore require less requests to achieve stead
 
 To run the benchmark with the generated data set, simply use the `trtllm-bench throughput` subcommand. The benchmarker will
 run an offline maximum throughput scenario such that all requests are queued in rapid succession. You simply need to provide
-a model name (HuggingFace reference or path to a local model), a [generated dataset](#preparing-a-dataset), and a file containing any desired extra options to the LLMApi (details in [tensorrt_llm/llmapi/llm_args.py:LlmArgs](../../../tensorrt_llm/llmapi/llm_args.py)).
+a model name (HuggingFace reference or path to a local model), a [generated dataset](#preparing-a-dataset), and a file containing any desired extra options to the LLM API (details in [tensorrt_llm/llmapi/llm_args.py:LlmArgs](source:tensorrt_llm/llmapi/llm_args.py)).
 
 For dense / non-MoE models:
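For reviewers checking the documented workflow end to end, a minimal sketch of the two commands the patched pages describe might look like the following. This is illustrative only: the dataset-generation flags follow the `prepare_dataset.py` usage shown in the benchmarking docs, the model name and output path are placeholders, and a real run requires a supported GPU with TensorRT LLM installed; verify all flags against your installed version.

```shell
# Generate a synthetic dataset of 3000 requests with fixed 128-token
# inputs and 128-token outputs (stdev 0 means no length variation).
python benchmarks/cpp/prepare_dataset.py \
    --stdout \
    --tokenizer meta-llama/Llama-2-7b-hf \
    token-norm-dist \
    --input-mean 128 --output-mean 128 \
    --input-stdev 0 --output-stdev 0 \
    --num-requests 3000 > /tmp/synthetic_128_128.txt

# Run the offline max-throughput benchmark against the generated dataset;
# all requests are queued in rapid succession, as the docs describe.
trtllm-bench --model meta-llama/Llama-2-7b-hf throughput \
    --dataset /tmp/synthetic_128_128.txt
```

Extra LLM API options, when needed, go in a YAML file passed via `--extra_llm_api_options`, with the accepted fields defined by `LlmArgs` in `tensorrt_llm/llmapi/llm_args.py` as the patch notes.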