[TRTLLM-4629][doc] Add B300 & GB300 in documents (#9663)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
parent 07f307d131 · commit 2756a0da60
@@ -123,12 +123,13 @@ The language component decides which quantization methods are supported by a given model.
| Model | NVFP4 | MXFP4 | FP8(per tensor)| FP8(block scaling) | FP8(rowwise) | FP8 KV Cache | NVFP4 KV Cache | W4A8 AWQ | W4A16 AWQ | W4A8 GPTQ | W4A16 GPTQ |
| :------------- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :-------: | :-------: | :--------: | :--------: |
| Blackwell(sm120) | Y | Y | Y | . | . | Y | . | . | . | . | . |
- | Blackwell(sm100) | Y | Y | Y | Y | . | Y | Y | . | . | . | . |
+ | Blackwell(sm100/103) | Y | Y | Y | Y | . | Y | Y | . | . | . | . |
| Hopper | . | . | Y | Y | Y | Y | . | Y | Y | Y | Y |
| Ada Lovelace | . | . | Y | . | . | Y | . | Y | Y | Y | Y |
| Ampere | . | . | . | . | . | Y | . | . | Y | . | Y |
```{note}
- FP8 block-wise scaling GEMM kernels for sm100 use the MXFP8 recipe (E4M3 act/weight and UE8M0 act/weight scale), which is slightly different from the SM90 FP8 recipe (E4M3 act/weight and FP32 act/weight scale).
+ FP8 block-wise scaling GEMM kernels for sm100/103 use the MXFP8 recipe (E4M3 act/weight and UE8M0 act/weight scale), which is slightly different from the SM90 FP8 recipe (E4M3 act/weight and FP32 act/weight scale).
```
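To make the recipe difference in that note concrete, here is a minimal sketch in plain Python (not TensorRT-LLM kernel code; `E4M3_MAX` and the round-up rounding choice are assumptions). The SM90 recipe keeps the per-block scale in FP32, while UE8M0 stores only an unsigned 8-bit exponent, so scales are constrained to powers of two:

```python
import math

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value (assumed constant)

def fp32_block_scale(block_absmax: float) -> float:
    """SM90 recipe: keep the per-block scale in full-precision FP32."""
    return block_absmax / E4M3_MAX

def ue8m0_block_scale(block_absmax: float) -> float:
    """MXFP8 recipe: UE8M0 stores only an exponent, so the scale is a
    power of two; rounding up avoids overflow when quantizing the block."""
    raw = block_absmax / E4M3_MAX
    return 2.0 ** math.ceil(math.log2(raw))

absmax = 300.0                      # example per-block max |x|
print(fp32_block_scale(absmax))    # ~0.6696: the exact FP32 ratio
print(ue8m0_block_scale(absmax))   # 1.0: the power of two just above it
```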
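For reference, a hedged sketch of selecting one column of the matrix above through TensorRT-LLM's LLM API. The `QuantConfig`/`QuantAlgo` names follow the public LLM API, but the enum members available vary by release, and the model name is a placeholder:

```python
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

# Pick a column from the matrix: NVFP4 weights/activations plus an FP8 KV
# cache. Per the table, NVFP4 needs a Blackwell (sm100/103/120) GPU, while
# FP8 KV cache is marked "Y" on every listed architecture.
quant_config = QuantConfig(
    quant_algo=QuantAlgo.NVFP4,
    kv_cache_quant_algo=QuantAlgo.FP8,
)

# Placeholder checkpoint; any supported model works the same way.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", quant_config=quant_config)
for output in llm.generate(["Hello, my name is"]):
    print(output.outputs[0].text)
```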
@@ -132,6 +132,7 @@ In addition, older architectures can have limitations for newer software releases.
- TensorRT-LLM requires Linux x86_64 or Linux aarch64.
* - GPU Model Architectures
-
+ - [NVIDIA GB300 NVL72](https://www.nvidia.com/en-us/data-center/gb300-nvl72/)
- [NVIDIA GB200 NVL72](https://www.nvidia.com/en-us/data-center/gb200-nvl72/)
- [NVIDIA Blackwell Architecture](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/)
@@ -158,7 +159,7 @@ The following table shows the supported software for TensorRT-LLM.
- [10.13](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html)
* - Precision
-
- - Blackwell (SM100/SM120) - FP32, FP16, BF16, FP8, FP4, INT8, INT4
+ - Blackwell (SM100/SM103/SM120) - FP32, FP16, BF16, FP8, FP4, INT8, INT4
- Hopper (SM90) - FP32, FP16, BF16, FP8, INT8, INT4
- Ada Lovelace (SM89) - FP32, FP16, BF16, FP8, INT8, INT4
- Ampere (SM80, SM86) - FP32, FP16, BF16, INT8, INT4[^smgte89]
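To map a machine onto one of the rows above, the compute capability can be queried at runtime. A small sketch using PyTorch (assumed available alongside TensorRT-LLM); the row labels mirror this table, and the lookup dict is illustrative, not an official API:

```python
import torch

# Compute capability maps onto the rows of the precision table:
# (12, 0) -> SM120, (10, 3) -> SM103, (10, 0) -> SM100, (9, 0) -> SM90, ...
ROWS = {
    (12, 0): "Blackwell (SM120)",
    (10, 3): "Blackwell (SM103)",
    (10, 0): "Blackwell (SM100)",
    (9, 0): "Hopper (SM90)",
    (8, 9): "Ada Lovelace (SM89)",
    (8, 6): "Ampere (SM86)",
    (8, 0): "Ampere (SM80)",
}

major, minor = torch.cuda.get_device_capability(0)
row = ROWS.get((major, minor), "not listed in the support matrix")
print(f"SM{major * 10 + minor}: {row}")
```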
@@ -42,8 +42,8 @@ Note: Support for other models may vary. Features marked "N/A" are not applicable.
| `Llama4ForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Untested | N/A | Yes | Yes |
| `GptOssForCausalLM` | Yes | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | N/A | Yes | Yes |
- [^1]: Chunked Prefill for MLA can only be enabled on SM100.
- [^2]: KV cache reuse for MLA can only be enabled on SM90/SM100 and in BF16/FP8 KV cache dtype.
+ [^1]: Chunked Prefill for MLA can only be enabled on SM100/SM103.
+ [^2]: KV cache reuse for MLA can only be enabled on SM90/SM100/SM103 and in BF16/FP8 KV cache dtype.
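Both footnotes correspond to LLM API knobs. A hedged sketch, assuming the public `enable_chunked_prefill` argument and `KvCacheConfig` fields (exact availability depends on the TensorRT-LLM release; the model name is only illustrative):

```python
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",     # an MLA model, for illustration
    enable_chunked_prefill=True,         # footnote 1: SM100/SM103 only
    kv_cache_config=KvCacheConfig(
        enable_block_reuse=True,         # footnote 2: SM90/SM100/SM103 only
        dtype="fp8",                     # footnote 2: BF16/FP8 KV cache dtype
    ),
)
```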
# Multimodal Feature Support Matrix (PyTorch Backend)
@@ -55,7 +55,6 @@ TensorRT LLM strives to support the most popular models on **Day 0**.
TensorRT LLM supports the full spectrum of NVIDIA GPU architectures:
- **NVIDIA Blackwell**: B200, GB200, B300, GB300, and RTX Pro 6000 SE with FP4 optimization
- **NVIDIA Hopper**: H100, H200, GH200 with FP8 acceleration
- **NVIDIA Ada Lovelace**: L40/L40S, RTX 40 series with FP8 acceleration
- **NVIDIA Ampere**: A100, RTX 30 series for production workloads