[None][doc] Update media files path in Skip Softmax blog. (#11540)

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Author: Bo Li · 2026-02-17 22:37:18 +08:00 · committed by GitHub
parent 1c065fbb3e
commit ab941afa2e


@@ -29,7 +29,7 @@ In this way, we can indirectly control the sparsity via the threshold. The thres
The method is fully dynamic and can be applied to both prefilling and decoding. The algorithm of Skip Softmax Attention is described in the paper [BLASST: Dynamic Blocked Attention Sparsity via Softmax Thresholding](https://arxiv.org/pdf/2512.12087). We have also published a [Developer Blog](https://developer.nvidia.com/blog/accelerating-long-context-inference-with-skip-softmax-in-nvidia-tensorrt-llm/) explaining the method. Please refer to these resources for an in-depth dive into the algorithm details. Here we focus on applying Skip Softmax Attention in TensorRT-LLM to accelerate long-context inference.
<p align="center">
-<img src="../media/tech_blog16_blasst.jpg" alt="BLASST Illustration" style="width: 50%; min-width: 300px; display: block; margin: auto;" />
+<img src="https://github.com/NVIDIA/TensorRT-LLM/raw/main/docs/source/blogs/media/tech_blog16_blasst.jpg" alt="BLASST Illustration" style="width: 50%; min-width: 300px; display: block; margin: auto;" />
</p>
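The thresholding idea above can be sketched in a few lines. The following is a minimal NumPy illustration, not the TensorRT-LLM kernel: the function name, block size, and the use of a precomputed per-row score maximum are assumptions made for clarity. It skips any key/value block whose largest softmax weight, relative to the row maximum, falls below the threshold, so the skipped blocks are exactly those that would contribute near-zero probability mass.

```python
import numpy as np

def skip_softmax_attention(q, k, v, block, threshold):
    """Hypothetical sketch of blocked skip-softmax attention.

    For each query row, K/V are processed in blocks; a block is skipped
    when exp(block_max - row_max) < threshold, i.e. when even its largest
    softmax weight is negligible. threshold = 0 recovers exact attention.
    """
    n, d = k.shape
    out = np.zeros((q.shape[0], v.shape[1]))
    for i in range(q.shape[0]):
        scores = q[i] @ k.T / np.sqrt(d)
        row_max = scores.max()            # global row max (one pass, for clarity)
        acc = np.zeros(v.shape[1])
        denom = 0.0
        for s in range(0, n, block):
            blk = scores[s:s + block]
            if np.exp(blk.max() - row_max) < threshold:
                continue                  # skipped block: ~0 softmax mass
            w = np.exp(blk - row_max)
            acc += w @ v[s:s + block]
            denom += w.sum()
        out[i] = acc / denom
    return out
```

Because the block containing the row maximum always has weight exp(0) = 1, any threshold below 1 leaves at least one block per row, so the normalization denominator never vanishes; raising the threshold trades accuracy for more skipped blocks.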
## Example Usage
@@ -152,16 +152,16 @@ The following figures plot **speedup vs. achieved sparsity**, on top of the base
<td style="width: 50%; padding: 0 8px; vertical-align: top;">
<p align="center"><b>Hopper (H200)</b></p>
<p align="center"><b>Prefill</b></p>
-<img src="../media/tech_blog16_hopper_prefill.png" alt="Hopper prefill kernel" style="width: 100%; min-width: 280px; display: block; margin: auto;" />
+<img src="https://github.com/NVIDIA/TensorRT-LLM/raw/main/docs/source/blogs/media/tech_blog16_hopper_prefill.png" alt="Hopper prefill kernel" style="width: 100%; min-width: 280px; display: block; margin: auto;" />
<p align="center"><b>Decode</b></p>
-<img src="../media/tech_blog16_hopper_decode.png" alt="Hopper decode kernel" style="width: 100%; min-width: 280px; display: block; margin: auto;" />
+<img src="https://github.com/NVIDIA/TensorRT-LLM/raw/main/docs/source/blogs/media/tech_blog16_hopper_decode.png" alt="Hopper decode kernel" style="width: 100%; min-width: 280px; display: block; margin: auto;" />
</td>
<td style="width: 50%; padding: 0 8px; vertical-align: top;">
<p align="center"><b>Blackwell (B200)</b></p>
<p align="center"><b>Prefill</b></p>
-<img src="../media/tech_blog16_blackwell_prefill.png" alt="Blackwell prefill kernel" style="width: 100%; min-width: 280px; display: block; margin: auto;" />
+<img src="https://github.com/NVIDIA/TensorRT-LLM/raw/main/docs/source/blogs/media/tech_blog16_blackwell_prefill.png" alt="Blackwell prefill kernel" style="width: 100%; min-width: 280px; display: block; margin: auto;" />
<p align="center"><b>Decode</b></p>
-<img src="../media/tech_blog16_blackwell_decode.png" alt="Blackwell decode kernel" style="width: 100%; min-width: 280px; display: block; margin: auto;" />
+<img src="https://github.com/NVIDIA/TensorRT-LLM/raw/main/docs/source/blogs/media/tech_blog16_blackwell_decode.png" alt="Blackwell decode kernel" style="width: 100%; min-width: 280px; display: block; margin: auto;" />
</td>
</tr>
</table>