Void
|
316e5c3be3
|
feat: fix and improve allreduce and fusion kernels (#3064)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
|
2025-04-08 19:33:52 +08:00 |
|
Yibin Li
|
32ae1564bd
|
update FP4 quantize layout (#3045)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
|
2025-04-03 13:13:54 -04:00 |
|
Zongfei Jing
|
c7548ad72c
|
perf: Add optimizations for deepseek in min latency mode (#3093)
* Add optimizations for deepseek min latency
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix compile error
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Update internal cutlass kernel libs
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Format code
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Resolve conflicts
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
---------
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
|
2025-04-02 09:05:24 +08:00 |
|
Kaiyu Xie
|
9b931c0f63
|
Update TensorRT-LLM (#2873)
|
2025-03-11 21:13:42 +08:00 |
|