[TRTLLM-4629][doc] Add B300 & GB300 in documents (#9663)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
parent 07f307d131 · commit 2756a0da60
@@ -123,12 +123,13 @@ The language component decides which quantization methods are supported by a given model.
| Model | NVFP4 | MXFP4 | FP8(per tensor)| FP8(block scaling) | FP8(rowwise) | FP8 KV Cache | NVFP4 KV Cache | W4A8 AWQ | W4A16 AWQ | W4A8 GPTQ | W4A16 GPTQ |
| :------------- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :-------: | :-------: | :--------: | :--------: |
| Blackwell(sm120) | Y | Y | Y | . | . | Y | . | . | . | . | . |
- | Blackwell(sm100) | Y | Y | Y | Y | . | Y | Y | . | . | . | . |
+ | Blackwell(sm100/103) | Y | Y | Y | Y | . | Y | Y | . | . | . | . |
| Hopper | . | . | Y | Y | Y | Y | . | Y | Y | Y | Y |
| Ada Lovelace | . | . | Y | . | . | Y | . | Y | Y | Y | Y |
| Ampere | . | . | . | . | . | Y | . | . | Y | . | Y |
```{note}
- FP8 block-wise scaling GEMM kernels for sm100 use the MXFP8 recipe (E4M3 act/weight and UE8M0 act/weight scale), which is slightly different from the SM90 FP8 recipe (E4M3 act/weight and FP32 act/weight scale).
+ FP8 block-wise scaling GEMM kernels for sm100/103 use the MXFP8 recipe (E4M3 act/weight and UE8M0 act/weight scale), which is slightly different from the SM90 FP8 recipe (E4M3 act/weight and FP32 act/weight scale).
```
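To make the recipe difference in that note concrete, here is a minimal sketch in plain Python (not TensorRT-LLM kernel code; `E4M3_MAX` and the round-up rounding choice are assumptions). The SM90 recipe keeps the per-block scale in FP32, while UE8M0 stores only an unsigned 8-bit exponent, so scales are constrained to powers of two:

```python
import math

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value (assumed constant)

def fp32_block_scale(block_absmax: float) -> float:
    """SM90 recipe: keep the per-block scale in full-precision FP32."""
    return block_absmax / E4M3_MAX

def ue8m0_block_scale(block_absmax: float) -> float:
    """MXFP8 recipe: UE8M0 stores only an exponent, so the scale is a
    power of two; rounding up avoids overflow when quantizing the block."""
    raw = block_absmax / E4M3_MAX
    return 2.0 ** math.ceil(math.log2(raw))

absmax = 300.0                      # example per-block max |x|
print(fp32_block_scale(absmax))    # ~0.6696: the exact FP32 ratio
print(ue8m0_block_scale(absmax))   # 1.0: the power of two just above it
```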
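For reference, a hedged sketch of selecting one column of the matrix above through TensorRT-LLM's LLM API. The `QuantConfig`/`QuantAlgo` names follow the public LLM API, but the enum members available vary by release, and the model name is a placeholder:

```python
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

# Pick a column from the matrix: NVFP4 weights/activations plus an FP8 KV
# cache. Per the table, NVFP4 needs a Blackwell (sm100/103/120) GPU, while
# FP8 KV cache is marked "Y" on every listed architecture.
quant_config = QuantConfig(
    quant_algo=QuantAlgo.NVFP4,
    kv_cache_quant_algo=QuantAlgo.FP8,
)

# Placeholder checkpoint; any supported model works the same way.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", quant_config=quant_config)
for output in llm.generate(["Hello, my name is"]):
    print(output.outputs[0].text)
```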
@@ -132,6 +132,7 @@ In addition, older architectures can have limitations for newer software releases.
- TensorRT-LLM requires Linux x86_64 or Linux aarch64.
* - GPU Model Architectures
-
+ - [NVIDIA GB300 NVL72](https://www.nvidia.com/en-us/data-center/gb300-nvl72/)
- [NVIDIA GB200 NVL72](https://www.nvidia.com/en-us/data-center/gb200-nvl72/)
- [NVIDIA Blackwell Architecture](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/)
@@ -158,7 +159,7 @@ The following table shows the supported software for TensorRT-LLM.
- [10.13](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html)
* - Precision
-
- - Blackwell (SM100/SM120) - FP32, FP16, BF16, FP8, FP4, INT8, INT4
+ - Blackwell (SM100/SM103/SM120) - FP32, FP16, BF16, FP8, FP4, INT8, INT4
- Hopper (SM90) - FP32, FP16, BF16, FP8, INT8, INT4
- Ada Lovelace (SM89) - FP32, FP16, BF16, FP8, INT8, INT4
- Ampere (SM80, SM86) - FP32, FP16, BF16, INT8, INT4[^smgte89]
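To map a machine onto one of the rows above, the compute capability can be queried at runtime. A small sketch using PyTorch (assumed available alongside TensorRT-LLM); the row labels mirror this table, and the lookup dict is illustrative, not an official API:

```python
import torch

# Compute capability maps onto the rows of the precision table:
# (12, 0) -> SM120, (10, 3) -> SM103, (10, 0) -> SM100, (9, 0) -> SM90, ...
ROWS = {
    (12, 0): "Blackwell (SM120)",
    (10, 3): "Blackwell (SM103)",
    (10, 0): "Blackwell (SM100)",
    (9, 0): "Hopper (SM90)",
    (8, 9): "Ada Lovelace (SM89)",
    (8, 6): "Ampere (SM86)",
    (8, 0): "Ampere (SM80)",
}

major, minor = torch.cuda.get_device_capability(0)
row = ROWS.get((major, minor), "not listed in the support matrix")
print(f"SM{major * 10 + minor}: {row}")
```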
@@ -42,8 +42,8 @@ Note: Support for other models may vary. Features marked "N/A" are not applicable.
| `Llama4ForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Untested | N/A | Yes | Yes |
| `GptOssForCausalLM` | Yes | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | N/A | Yes | Yes |
- [^1]: Chunked Prefill for MLA can only be enabled on SM100.
- [^2]: KV cache reuse for MLA can only be enabled on SM90/SM100 and in BF16/FP8 KV cache dtype.
+ [^1]: Chunked Prefill for MLA can only be enabled on SM100/SM103.
+ [^2]: KV cache reuse for MLA can only be enabled on SM90/SM100/SM103 and in BF16/FP8 KV cache dtype.
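Both footnotes correspond to LLM API knobs. A hedged sketch, assuming the public `enable_chunked_prefill` argument and `KvCacheConfig` fields (exact availability depends on the TensorRT-LLM release; the model name is only illustrative):

```python
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",     # an MLA model, for illustration
    enable_chunked_prefill=True,         # footnote 1: SM100/SM103 only
    kv_cache_config=KvCacheConfig(
        enable_block_reuse=True,         # footnote 2: SM90/SM100/SM103 only
        dtype="fp8",                     # footnote 2: BF16/FP8 KV cache dtype
    ),
)
```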
# Multimodal Feature Support Matrix (PyTorch Backend)
@@ -55,7 +55,6 @@ TensorRT LLM strives to support the most popular models on **Day 0**.
TensorRT LLM supports the full spectrum of NVIDIA GPU architectures:
- **NVIDIA Blackwell**: B200, GB200, B300, GB300, and RTX Pro 6000 SE with FP4 optimization
- **NVIDIA Hopper**: H100, H200, GH200 with FP8 acceleration
- **NVIDIA Ada Lovelace**: L40/L40S, RTX 40 series with FP8 acceleration
- **NVIDIA Ampere**: A100, RTX 30 series for production workloads