TensorRT-LLMs/tensorrt_llm/serve
mpikulski 533add5056
[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 17:47:35 -08:00
scripts [https://nvbugs/5523315][fix] Fix serve benchmark test (#8255) 2025-11-03 00:30:13 -08:00
tool_parser [TRTLLM-8214][feat] Support Qwen3 tool parser (#8216) 2025-10-29 15:48:29 +08:00
__init__.py Update TensorRT-LLM (#2820) 2025-02-25 21:21:49 +08:00
chat_utils.py [TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951) 2025-11-07 17:47:35 -08:00
cluster_storage.py [TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602) 2025-10-28 17:04:53 -07:00
disagg_auto_scaling.py [TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602) 2025-10-28 17:04:53 -07:00
harmony_adapter.py [https://nvbugs/5521799][fix] Trim incorrectly generated harmony messages (#7849) 2025-09-24 16:38:43 +08:00
metadata_server.py feat: Add integration of etcd (#3738) 2025-06-03 20:01:44 +08:00
openai_disagg_server.py [None][feat] Add opentelemetry tracing (#5897) 2025-10-27 18:51:07 +08:00
openai_protocol.py [None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127) 2025-10-27 13:12:31 -04:00
openai_server.py [TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951) 2025-11-07 17:47:35 -08:00
postprocess_handlers.py [TRTLLM-8214][feat] Support Qwen3 tool parser (#8216) 2025-10-29 15:48:29 +08:00
responses_utils.py [None][feat] perf_metrics endpoint functionality improvement (#8005) 2025-10-02 17:43:25 -07:00
router.py [TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602) 2025-10-28 17:04:53 -07:00