Add Latest News section (#365)

石晓伟 2023-11-13 20:56:22 +08:00 committed by GitHub
parent 24cf8de078
commit ec769d63f9
4 changed files with 1 additions and 2 deletions

@@ -33,8 +33,7 @@ For practical examples of H200's performance:
**Max Throughput TP8:**
an online chat agent scenario (ISL/OSL=80/200) with GPT3-175B on a full HGX (TP8) H200 is 1.6x more performant than H100.
-<img src="media/H200launch_Llama70B_tps.png" alt="max throughput llama TP1" width="250" height="auto">
-<img src="media/H200launch_GPT175B_tps.png" alt="max throughput GPT TP8" width="250" height="auto">
+<img src="media/H200launch_tps.png" alt="max throughput llama TP1" width="500" height="auto">
<sub>Preliminary measured performance, subject to change.
TensorRT-LLM v0.5.0, TensorRT v9.1.0.4. | Llama-70B: H100 FP8 BS 8, H200 FP8 BS 32 | GPT3-175B: H100 FP8 BS 64, H200 FP8 BS 128 </sub>

Binary file not shown. Before: 14 KiB
Binary file not shown. Before: 14 KiB
Binary file not shown. After: 22 KiB