# Online Serving Examples with trtllm-serve

This directory contains the following example clients and scripts:

- `curl_chat_client.sh`
- `curl_chat_client_for_multimodal.sh`
- `curl_completion_client.sh`
- `deepseek_r1_reasoning_parser.sh`
- `genai_perf_client.sh`
- `genai_perf_client_for_multimodal.sh`
- `openai_chat_client.py`
- `openai_chat_client_for_multimodal.py`
- `openai_completion_client.py`
- `requirements.txt`
We provide a CLI command, `trtllm-serve`, that launches a FastAPI server compatible with the OpenAI API. The client examples in this directory show how to query that server; you can inspect their source code directly, or refer to the `trtllm-serve` command documentation for detailed information and usage guidelines.
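As a minimal sketch of what these clients do, the snippet below builds an OpenAI-style chat completion request and sends it with only the Python standard library. The base URL `http://localhost:8000` and the model name are assumptions — substitute the address and model your `trtllm-serve` instance actually serves; the request shape follows the OpenAI Chat Completions API that the server is compatible with.

```python
import json
from urllib import request


def build_chat_payload(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI Chat Completions-style request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_chat_request(base_url: str, payload: dict) -> dict:
    """POST the payload to the server's /v1/chat/completions endpoint."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Assumed model name and server address -- adjust for your deployment.
    payload = build_chat_payload("TinyLlama-1.1B-Chat-v1.0", "Where is New York?")
    reply = send_chat_request("http://localhost:8000", payload)
    print(reply["choices"][0]["message"]["content"])
```

The `openai_chat_client.py` example in this directory accomplishes the same thing via the official `openai` client library, which is generally the more convenient route.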