TensorRT-LLMs/tensorrt_llm/serve
Yechan Kim 5460d18b10
feat: trtllm-serve multimodal support (#3590)
* feat: trtllm-serve multimodal support

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable argument

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add and separate tests and move the doc

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove block_reuse arg from serve.py

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-19 05:01:28 +08:00
..
__init__.py              Update TensorRT-LLM (#2820)                         2025-02-25 21:21:49 +08:00
chat_utils.py            feat: trtllm-serve multimodal support (#3590)       2025-04-19 05:01:28 +08:00
openai_disagg_server.py  feat: Disaggregated router class (#3584)            2025-04-19 00:34:12 +08:00
openai_protocol.py       feat: Add support of chat completion in PD (#2985)  2025-04-11 17:53:28 +08:00
openai_server.py         feat: trtllm-serve multimodal support (#3590)       2025-04-19 05:01:28 +08:00
postprocess_handlers.py  chore: Unify Python NVTX call (#3450)               2025-04-15 23:25:36 +08:00
router.py                feat: Disaggregated router class (#3584)            2025-04-19 00:34:12 +08:00