Anurag Mukkara
|
d998339855
|
Raise error for PP + MTP (#3244)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
|
2025-04-03 04:45:31 +08:00 |
|
QI JUN
|
abcb0486dc
|
fix deepseek failure with pipeline parallelism (#3225)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-04-02 22:56:39 +08:00 |
|
Zongfei Jing
|
c7548ad72c
|
perf: Add optimizations for deepseek in min latency mode (#3093)
* Add optimizations for deepseek min latency
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix compile error
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Update internal cutlass kernel libs
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Format code
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Resolve conflicts
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
---------
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
|
2025-04-02 09:05:24 +08:00 |
|
Jinyang Yuan
|
992d513bc6
|
feat: Optionally split MoE inputs into chunks to reduce GPU memory usage (#3104)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
|
2025-04-01 16:07:02 +08:00 |
|
QI JUN
|
9560fcd5ec
|
Chore: waive tests and fix multi-GPU tests (#3157)
* waive tests
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* update
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* clean up
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-03-31 16:05:45 +08:00 |
|
Mike Iovine
|
5416966ddb
|
Add initial EAGLE-3 implementation (#3035)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
|
2025-03-29 22:31:24 +08:00 |
|
Aurelien Chartier
|
3de82c41cd
|
Pytorch PP + attention DP support (#3044)
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
|
2025-03-28 00:11:19 +08:00 |
|
Fanrong Li
|
ec03159e60
|
fix: Waive twoshot to fix acc issue (#3066)
* waive twoshot to fix acc issue
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-03-27 21:38:52 +08:00 |
|
yuxianq
|
268933b5cc
|
Refactor imports inside tensorrt_llm._torch. (#3015)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-03-26 11:01:07 +08:00 |
|
Kaiyu Xie
|
2631f21089
|
Update (#2978)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-03-23 16:39:35 +08:00 |
|
Kaiyu Xie
|
3aa6b11d13
|
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
|
2025-03-18 21:25:19 +08:00 |
|
Kaiyu Xie
|
9b931c0f63
|
Update TensorRT-LLM (#2873)
|
2025-03-11 21:13:42 +08:00 |
|
Kaiyu Xie
|
77d7fe1eb2
|
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
|
2025-03-04 18:44:00 +08:00 |
|
Kaiyu Xie
|
ab5b19e027
|
Update TensorRT-LLM (#2820)
|
2025-02-25 21:21:49 +08:00 |
|
Kaiyu Xie
|
2ea17cdad2
|
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
|
2025-02-18 21:27:39 +08:00 |
|
Kaiyu Xie
|
e88da961c5
|
Update TensorRT-LLM (#2783)
|
2025-02-13 18:40:22 +08:00 |
|
Dan Blanaru
|
16d2467ea8
|
Update TensorRT-LLM (#2755)
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
|
2025-02-11 03:01:00 +00:00 |
|