Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-13 22:18:36 +08:00)
Add Latest News section (#365)
This commit is contained in:
parent 24cf8de078
commit ec769d63f9
@@ -33,8 +33,7 @@ For practical examples of H200's performance:

 **Max Throughput TP8:**
 an online chat agent scenario (ISL/OSL=80/200) with GPT3-175B on a full HGX (TP8) H200 is 1.6x more performant than H100.

-<img src="media/H200launch_Llama70B_tps.png" alt="max throughput llama TP1" width="250" height="auto">
-<img src="media/H200launch_GPT175B_tps.png" alt="max throughput GPT TP8" width="250" height="auto">
+<img src="media/H200launch_tps.png" alt="max throughput llama TP1" width="500" height="auto">

 <sub>Preliminary measured performance, subject to change.
 TensorRT-LLM v0.5.0, TensorRT v9.1.0.4. | Llama-70B: H100 FP8 BS 8, H200 FP8 BS 32 | GPT3-175B: H100 FP8 BS 64, H200 FP8 BS 128 </sub>
Binary file not shown. (Before: 14 KiB)
Binary file not shown. (Before: 14 KiB)
BIN docs/source/blogs/media/H200launch_tps.png (new file)
Binary file not shown. (After: 22 KiB)
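For context on the "1.6x more performant" figure in the diff above: it is a relative-throughput ratio between the two GPUs under the same scenario. A minimal sketch of the arithmetic, using hypothetical tokens/sec numbers (these are illustrative only, not measurements from the blog):

```python
def speedup(h200_tokens_per_sec: float, h100_tokens_per_sec: float) -> float:
    """Relative throughput: values > 1.0 mean H200 is faster."""
    return h200_tokens_per_sec / h100_tokens_per_sec

# Hypothetical measurements for the ISL/OSL=80/200 chat-agent scenario.
ratio = speedup(1600.0, 1000.0)
print(f"{ratio:.1f}x")  # prints "1.6x"
```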