# Online Serving Examples with trtllm-serve

We provide a CLI command, `trtllm-serve`, that launches a FastAPI server compatible with the OpenAI API. This directory contains example clients for querying the server:

- `curl_chat_client.sh`: query the Chat Completions API with `curl`
- `curl_chat_client_for_multimodal.sh`: send a multimodal Chat Completions request with `curl`
- `curl_completion_client.sh`: query the Completions API with `curl`
- `genai_perf_client.sh`: exercise the server with GenAI-Perf
- `openai_chat_client.py`: query the Chat Completions API with the OpenAI Python client
- `openai_chat_client_for_multimodal.py`: send a multimodal Chat Completions request with the OpenAI Python client
- `openai_completion_client.py`: query the Completions API with the OpenAI Python client
- `requirements.txt`: Python dependencies for the example clients

You can check the source code of these clients, or refer to the `trtllm-serve` command documentation and examples for detailed information and usage guidelines.
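Below is a minimal end-to-end sketch of the workflow. The model id `TinyLlama/TinyLlama-1.1B-Chat-v1.0` and port 8000 are illustrative assumptions (8000 is the server's usual default); substitute any model supported by TensorRT-LLM and whatever host/port you launch with:

```bash
# Launch the OpenAI-compatible server. The model id is illustrative;
# any HuggingFace model id supported by TensorRT-LLM should work here.
trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0

# From another shell, query the Chat Completions endpoint
# (port 8000 is assumed as the server default).
curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "messages": [{"role": "user", "content": "Where is New York?"}],
        "max_tokens": 16
    }'
```

The Python clients in this directory use the same endpoints through the OpenAI SDK instead of `curl`; see their source for equivalent requests.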