TensorRT-LLMs/cpp/tensorrt_llm/plugins/mixtureOfExperts
Zongfei Jing c7548ad72c
perf: Add optimizations for deepseek in min latency mode (#3093)
* Add optimizations for deepseek min latency
* Fix compile error
* Update internal cutlass kernel libs
* Format code
* Resolve conflicts

Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-02 09:05:24 +08:00
CMakeLists.txt Update TensorRT-LLM (#524) 2023-12-01 22:27:51 +08:00
mixtureOfExpertsPlugin.cpp perf: Add optimizations for deepseek in min latency mode (#3093) 2025-04-02 09:05:24 +08:00
mixtureOfExpertsPlugin.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00