TensorRT-LLMs/openai_completion_client.py at 90a28b917fd3e9df101a431bc49af5fc2fc715bd

kanshan/TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Pengyun Lin f25c7cefb4

doc: refactor trtllm-serve examples and doc (#3187)

Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

2025-04-04 11:40:43 +08:00

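### OpenAI Completion Client
from openai import OpenAI

# Point the client at the local OpenAI-compatible endpoint served by trtllm-serve.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="tensorrt_llm",  # the OpenAI client requires some key to be set
)

# Request a plain text completion (not a chat completion) of at most 20 tokens.
response = client.completions.create(
    model="TinyLlama-1.1B-Chat-v1.0",
    prompt="Where is New York?",
    max_tokens=20,
)

# Print the full response object, including choices and token usage.
print(response)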
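
For readers who want to see what the client sends over the wire, the sketch below rebuilds the same request with only the Python standard library. It assumes the server follows the usual OpenAI completions convention: a JSON body posted to `/v1/completions` with a bearer token, and a response whose `choices` entries carry a `text` field. The helper names `build_completion_request`, `extract_texts`, and `complete` are hypothetical and not part of the original example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # same server address as the example above
API_KEY = "tensorrt_llm"               # placeholder key taken from the example


def build_completion_request(prompt, model="TinyLlama-1.1B-Chat-v1.0", max_tokens=20):
    """Build (without sending) the HTTP request for POST /v1/completions."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def extract_texts(response_body):
    """Return the generated text of every choice in a completions response dict."""
    # Assumption: the body follows the OpenAI completions schema.
    return [choice["text"] for choice in response_body.get("choices", [])]


def complete(prompt, timeout=30.0):
    """Send the request and return the generated texts (needs a running server)."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as http_response:
        return extract_texts(json.load(http_response))
```

With a server listening on `localhost:8000`, `complete("Where is New York?")` should return a list containing one generated string. Keeping request construction and response parsing in separate functions lets both be checked without a running server.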