Commit Graph

6 Commits

Author SHA1 Message Date
Yechan Kim
c6e2111f4e
feat: enhance trtllm serve multimodal (#3757)
* feat: enhance trtllm serve multimodal

1. made the load_image and load_video asynchronous
2. add image_encoded input support to be compatible with genai-perf
3. support text-only on multimodal mdoels(currently, Qwen2-VL & Qwen2.5-VL)

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix bandit

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* trimming uils

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* trimming for test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* genai perf command fix

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* command fix

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* refactor chat_utils

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* stress test genai-perf command

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-05-15 16:16:31 -07:00
dominicshanshan
404fbe9b32
[https://nvbugs/5277113][fix]genai-perf API change stress test (#4300)
* fix bug 5277113.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* fix bug 5277113 and 5278517.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-15 14:12:34 +08:00
pansicheng
e84dc6b3c7
feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354)
* add deepseek-r1 reasoning parser

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>

* fix test

Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

---------

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-06 08:13:04 +08:00
Yechan Kim
5460d18b10
feat: trtllm-serve multimodal support (#3590)
* feat: trtllm-serve multimodal support

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable argument

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add and separate tests and move the doc

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove block_resue arg from serve.py

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-19 05:01:28 +08:00
Pengyun Lin
1899e71364
doc: add genai-perf benchmark & slurm multi-node for trtllm-serve doc (#3407)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-04-16 00:11:58 +08:00
Pengyun Lin
f25c7cefb4
doc: refactor trtllm-serve examples and doc (#3187)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-04 11:40:43 +08:00