* Why?
Prior to this commit, we only supported a single multimodal input for
E/P/D disaggregated serving.
* What?
This commit does a minor refactor of the multimodal embedding handles
that cross process boundaries to enable this.
Existing unit tests are updated accordingly to test this.
The `RequestOutput` has its `mm_embedding_handle` replaced in favor of
`disaggregated_params`, addressing a previous TODO.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
* support return logprob in llmapi
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
update and add test
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
stability test
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
* revert removal of old flag
Signed-off-by: Erin Ho <erinh@nvidia.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
---------
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Erin Ho <erinh@nvidia.com>