mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-13 22:18:36 +08:00
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> |
||
|---|---|---|
| .. | ||
| compatibility | ||
| curl_chat_client_for_multimodal.sh | ||
| curl_chat_client.sh | ||
| curl_completion_client.sh | ||
| deepseek_r1_reasoning_parser.sh | ||
| genai_perf_client_for_multimodal.sh | ||
| genai_perf_client.sh | ||
| openai_chat_client_for_multimodal.py | ||
| openai_chat_client.py | ||
| openai_completion_client_for_lora.py | ||
| openai_completion_client_json_schema.py | ||
| openai_completion_client.py | ||
| README.md | ||
| requirements.txt | ||
Online Serving Examples with trtllm-serve
We provide a CLI command, trtllm-serve, to launch a FastAPI server compatible with OpenAI APIs, here are some client examples to query the server, you can check the source code here or refer to the command documentation and examples for detailed information and usage guidelines.