TensorRT-LLM/triton_backend/all_models
Latest commit: 34212e2e36 (2025-06-30 21:34:42 -07:00)
[TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend (#5554)
Author: Vivian Chen
Signed-off-by: Vivian Chen <140748220+xuanzic@users.noreply.github.com>
Directory               Latest commit                                                                          Date
disaggregated_serving   [nvbugs/5309940] Add support for input output token counts (#5445)                     2025-06-28 04:39:39 +08:00
gpt                     Move Triton backend to TRT-LLM main (#3549)                                            2025-05-16 07:15:23 +08:00
inflight_batcher_llm    [nvbugs/5309940] Add support for input output token counts (#5445)                     2025-06-28 04:39:39 +08:00
llmapi/tensorrt_llm     [TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend (#5554)          2025-06-30 21:34:42 -07:00
multimodal              chore: Change the type annotations of input_ids and position_ids to int32. (#4632)     2025-06-07 16:10:47 +08:00
tests                   [TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend (#5554)          2025-06-30 21:34:42 -07:00
whisper/whisper_bls     Move Triton backend to TRT-LLM main (#3549)                                            2025-05-16 07:15:23 +08:00