Kefeng-Duan
67949f7c39
Update README and add benchmarking blog for DeepSeek-R1 (#3232)
- Added a new entry in the README for the published benchmarking best practices for DeepSeek-R1.
- Introduced a new blog post detailing performance benchmarking configurations and procedures for DeepSeek-R1 in TensorRT-LLM, including installation, dataset preparation, and benchmarking steps for both B200 and H200 GPUs.
Signed-off-by: taoli <litaotju@users.noreply.github.com>
Co-authored-by: taoli <litaotju@users.noreply.github.com>
2025-04-10 17:00:49 +08:00
Gabriel Wu
f1655afb0d
feat: enable DeepGEMM by default (#3341)
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-08 13:58:57 +08:00
Gabriel Wu
376731013d
feat: use NVRTC for DeepGEMM JIT compilation (#3239)
* feat: use NVRTC for DeepGEMM JIT compilation
Signed-off-by: Zihua Wu
* fix: add license
Signed-off-by: Zihua Wu
* feat: store NVRTC JIT results in memory by default
Signed-off-by: Zihua Wu
* feat: refinement
Signed-off-by: Zihua Wu
* feat: refinement
Signed-off-by: Zihua Wu
* test: set timeout to 7200
Signed-off-by: Zihua Wu
---------
Signed-off-by: Zihua Wu
2025-04-07 20:29:23 +08:00
Kaiyu Xie
385a01055c
doc: Add serving section for DS V3 document (#3262)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-03 21:57:48 +08:00
Fanrong Li
11624a8e96
fix deepseek-v3 mtp doc. (#3272)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
2025-04-03 21:12:17 +08:00
musvaage
88e1c90fd0
doc: use alert formatting (#3153)
Signed-off-by: musvaage <musvaage@users.noreply.github.com>
Co-authored-by: musvaage <musvaage@users.noreply.github.com>
2025-03-31 07:30:52 +08:00
Fanrong Li
644a01cbbe
test: Add gpqa tests for DeepSeek models (#3063)
* Add gpqa accuracy test script
* Add gpqa accuracy tests
* Update DeepSeek-v3 doc
* Update qa test list
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-03-27 19:47:06 +08:00
Xiaowei Wang
d9acce72bb
doc: Update DeepSeekV3 doc (#3052)
* Update DeepGEMM and flashMLA related content
* Add single-node command for deepgemm
* Fix spelling
---------
Signed-off-by: xiaoweiw-nv <100599594+xiaoweiw-nv@users.noreply.github.com>
2025-03-25 18:17:26 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820)
2025-02-25 21:21:49 +08:00