Kaiyu Xie
5a5427f86e
blog: Scaling Expert Parallelism in TensorRT-LLM (Part 1: Design and Implementation of Large-scale EP) ( #4958 )
Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-05 22:24:04 +08:00
Tao Li @ NVIDIA
3b7120d60e
DeepSeek R1 throughput optimization tech blog for Blackwell GPUs ( #4791 )
Signed-off-by: Tao Li
2025-05-30 18:54:19 +08:00
Fanrong Li
862bde99b6
draft[doc]: add mtp tech blog ( #4580 )
* add mtp tech blog.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* update figure size.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* update the figure caption style and add some code/pr links.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix figure captions.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix figure size and perf data.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix based on comments
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
* fix figure links.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Co-authored-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-05-23 13:54:21 +08:00
Kefeng-Duan
f5b6d453aa
doc: DS r1 min latency blog ( #4386 )
* add best perf practice on DSR1
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* add ds-r1 min latency tech blog
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* rm redundant doc
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* refine table content
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* refine table content
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* relative path for images
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* refine precommit
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* pr4280 is merged
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
---------
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-05-16 20:20:28 +08:00
石晓伟
850b6fa1e7
Update TensorRT-LLM ( #1358 )
Co-authored-by: Kaiyu <26294424+kaiyux@users.noreply.github.com>
2024-03-26 20:47:14 +08:00
juney-nvidia
a40dbae30d
Doc update 20240130 ( #1009 )
* doc updates
2024-01-31 03:40:22 +08:00
石晓伟
e093e48459
Update latest news ( #549 )
2023-12-04 22:04:00 +08:00
石晓伟
ec769d63f9
Add Latest News section ( #365 )
2023-11-13 20:56:22 +08:00
石晓伟
24cf8de078
Add Latest News section ( #362 )
Co-authored-by: Shi Xiaowei <xiaoweis@nvidia.com>
2023-11-13 15:17:23 +08:00
石晓伟
cd6bbab0b3
Add Latest News section ( #315 )
2023-11-08 15:04:33 +08:00