TensorRT-LLMs/cpp/tensorrt_llm/pybind
dongxuy04 21aff2e313
feat: large-scale EP(part 2: MoE Load Balancer - core utilities) (#4384)
* first commit of cpp moe loadbalance code

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add python bindings for moe load balance

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add python wrapper, ut and bug fixes

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add binding for layerId and update binding test

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add host tensor sharing and ut

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

---------

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-05-20 17:53:48 +08:00
batch_manager refactor: Copy sequence lengths once in decoder setup (#4102) 2025-05-16 22:03:55 +08:00
common fix: Move all casters to customCasters. (#3945) 2025-05-02 19:08:28 +08:00
executor Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979) 2025-05-12 22:32:29 +02:00
runtime feat: large-scale EP(part 2: MoE Load Balancer - core utilities) (#4384) 2025-05-20 17:53:48 +08:00
testing refactor: Move ModelSpec to core library (#3980) 2025-05-04 01:39:09 +08:00
userbuffers fix: Move all casters to customCasters. (#3945) 2025-05-02 19:08:28 +08:00
bindings.cpp fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. (#4399) 2025-05-19 14:25:36 -07:00
CMakeLists.txt feat: large-scale EP(part 2: MoE Load Balancer - core utilities) (#4384) 2025-05-20 17:53:48 +08:00