From 0c479256006b4c9e274b12f38f1f96b0ae5ea543 Mon Sep 17 00:00:00 2001
From: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Date: Sun, 28 Sep 2025 16:14:31 +0800
Subject: [PATCH] =?UTF-8?q?[None][doc]=20Refine=20perf=20overview.md=20and?=
 =?UTF-8?q?=20correct=20the=20error=20link=20in=20per=E2=80=A6=20(#8036)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
---
 docs/source/developer-guide/perf-benchmarking.md |  6 +++---
 .../perf-overview.md                             | 14 ++++++--------
 2 files changed, 9 insertions(+), 11 deletions(-)
 rename docs/source/{legacy/performance => developer-guide}/perf-overview.md (97%)

diff --git a/docs/source/developer-guide/perf-benchmarking.md b/docs/source/developer-guide/perf-benchmarking.md
index 5b01c50601..6fcf8b64fe 100644
--- a/docs/source/developer-guide/perf-benchmarking.md
+++ b/docs/source/developer-guide/perf-benchmarking.md
@@ -8,13 +8,13 @@ Expect breaking API changes.
 ```
 
 TensorRT LLM provides the `trtllm-bench` CLI, a packaged benchmarking utility that aims to make it
-easier for users to reproduce our officially published [performance overview](../performance/perf-overview.md#throughput-measurements). `trtllm-bench` provides the follows:
+easier for users to reproduce our officially published [performance overview](./perf-overview.md#throughput-measurements). `trtllm-bench` provides the following:
 
 - A streamlined way to build tuned engines for benchmarking for a variety of models and platforms.
 - An entirely Python workflow for benchmarking.
 - Ability to benchmark various flows and features within TensorRT LLM.
 
-`trtllm-bench` executes all benchmarks using [in-flight batching] -- for more information see
+`trtllm-bench` executes all benchmarks using `in-flight batching` -- for more information see
 the [in-flight batching section](../features/attention.md#inflight-batching) that describes the
 concept in further detail.
 
@@ -67,7 +67,7 @@ sudo nvidia-smi boost-slider --vboost
 
 While `trtllm-bench` should be able to run any network that TensorRT LLM supports, the following are
 the list that have been validated extensively and is the same listing as seen on the
-[Performance Overview](../performance/perf-overview.md) page.
+[Performance Overview](./perf-overview.md) page.
 
 - [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
 - [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)
diff --git a/docs/source/legacy/performance/perf-overview.md b/docs/source/developer-guide/perf-overview.md
similarity index 97%
rename from docs/source/legacy/performance/perf-overview.md
rename to docs/source/developer-guide/perf-overview.md
index d354d869aa..0a144a58d4 100644
--- a/docs/source/legacy/performance/perf-overview.md
+++ b/docs/source/developer-guide/perf-overview.md
@@ -238,15 +238,13 @@ RTX 6000 Pro Blackwell Server Edition | 20000/2000 | 1,804 | 1,351 |
-
-
-
 
 ## Reproducing Benchmarked Results
 
-> [!NOTE] The only models supported in this workflow are those listed in the table above.
+```{note}
+Only the models shown in the table above are supported by this workflow.
+```
 
-The following tables are references for commands that are used as part of the benchmarking process. For a more detailed
-description of this benchmarking workflow, see the [benchmarking suite documentation](https://nvidia.github.io/TensorRT-LLM/performance/perf-benchmarking.html).
+The following tables are references for commands that are used as part of the benchmarking process.
+For a more detailed description of this benchmarking workflow, see the [benchmarking suite documentation](./perf-benchmarking.md).
 
 ### Command Overview
 
@@ -274,7 +272,7 @@ Starting with v0.19, testing was performed using the PyTorch backend - this work
 flow does not require an engine to be built.
 
 ### Preparing a Dataset
 
-In order to prepare a dataset, you can use the provided [script](../../../benchmarks/cpp/prepare_dataset.py).
+To prepare a dataset, use the provided [script](source:benchmarks/cpp/prepare_dataset.py).
 To generate a synthetic dataset, run the following command:
 
 ```shell
@@ -310,7 +308,7 @@ remain in the system longer and therefore require less requests to achieve stead
 
 To run the benchmark with the generated data set, simply use the `trtllm-bench throughput` subcommand. The benchmarker will
 run an offline maximum throughput scenario such that all requests are queued in rapid succession. You simply need to provide
-a model name (HuggingFace reference or path to a local model), a [generated dataset](#preparing-a-dataset), and a file containing any desired extra options to the LLMApi (details in [tensorrt_llm/llmapi/llm_args.py:LlmArgs](../../../tensorrt_llm/llmapi/llm_args.py)).
+a model name (HuggingFace reference or path to a local model), a [generated dataset](#preparing-a-dataset), and a file containing any desired extra options to the LLM API (details in [tensorrt_llm/llmapi/llm_args.py:LlmArgs](source:tensorrt_llm/llmapi/llm_args.py)).
 
 For dense / non-MoE models:
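For reviewers checking the documented workflow end to end, a minimal sketch of the two commands the patched pages describe might look like the following. This is illustrative only: the dataset-generation flags follow the `prepare_dataset.py` usage shown in the benchmarking docs, the model name and output path are placeholders, and a real run requires a supported GPU with TensorRT LLM installed; verify all flags against your installed version.

```shell
# Generate a synthetic dataset of 3000 requests with fixed 128-token
# inputs and 128-token outputs (stdev 0 means no length variation).
python benchmarks/cpp/prepare_dataset.py \
    --stdout \
    --tokenizer meta-llama/Llama-2-7b-hf \
    token-norm-dist \
    --input-mean 128 --output-mean 128 \
    --input-stdev 0 --output-stdev 0 \
    --num-requests 3000 > /tmp/synthetic_128_128.txt

# Run the offline max-throughput benchmark against the generated dataset;
# all requests are queued in rapid succession, as the docs describe.
trtllm-bench --model meta-llama/Llama-2-7b-hf throughput \
    --dataset /tmp/synthetic_128_128.txt
```

Extra LLM API options, when needed, go in a YAML file passed via `--extra_llm_api_options`, with the accepted fields defined by `LlmArgs` in `tensorrt_llm/llmapi/llm_args.py` as the patch notes.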