Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Dongxu Yang <dongxuy@nvidia.com>
Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
* add MNNVL memory mapping support
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add more MPI environment for trtllm-llmapi-launch
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add MoE communication and prepare kernels
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add MNNVL AlltoAll support for DeepSeekV3
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add output dump for throughput benchmark
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* support dynamic kernel launch grid
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* address review comments
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* address review comments #2
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
---------
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>