TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Yechan Kim 5460d18b10	2025-04-19 05:01:28 +08:00
feat: trtllm-serve multimodal support (#3590)

* feat: trtllm-serve multimodal support
* remove disable argument
* remove disable
* add and separate tests and move the doc
* remove block_resue arg from serve.py

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>

__init__.py	Update TensorRT-LLM (#2820)	2025-02-25 21:21:49 +08:00
chat_utils.py	feat: trtllm-serve multimodal support (#3590)	2025-04-19 05:01:28 +08:00
openai_disagg_server.py	feat: Disaggregated router class (#3584)	2025-04-19 00:34:12 +08:00
openai_protocol.py	feat: Add support of chat completion in PD (#2985)	2025-04-11 17:53:28 +08:00
openai_server.py	feat: trtllm-serve multimodal support (#3590)	2025-04-19 05:01:28 +08:00
postprocess_handlers.py	chore: Unify Python NVTX call (#3450)	2025-04-15 23:25:36 +08:00
router.py	feat: Disaggregated router class (#3584)	2025-04-19 00:34:12 +08:00