TensorRT-LLM/tensorrt_llm/bench
Latest commit edab7532dd by danielafrimi (2025-07-15 13:13:49 -07:00):
feat/add latency support for trtllm bench (#3730)

Signed-off-by: Ubuntu <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Daniel Afrimi <dafrimi@nvidia.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
benchmark/     feat/add latency support for trtllm bench (#3730)                                                      2025-07-15 13:13:49 -07:00
build/         [TRTLLM-5838][fix] fix max batch size and max tokens in kv cache estimations for Nemotron-H (#5371)    2025-07-09 11:30:15 +03:00
dataclasses/   [enhance] Add the ability to write a request timeline. (#5258)                                          2025-07-10 17:15:30 -07:00
utils/         Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130)  2025-06-15 18:54:04 +03:00
__init__.py    Update TensorRT-LLM                                                                                     2024-08-20 18:55:15 +08:00
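
The commit shown above (#3730) adds a latency mode to the `trtllm-bench` CLI that this package backs. Below is a minimal sketch of driving that mode from Python; the subcommand name `latency` comes from the commit message, while the `--model` and `--dataset` flags and the example values are assumptions modeled on the existing throughput flow, not confirmed by this listing.

```python
# Minimal sketch: shell out to trtllm-bench's latency mode.
# Assumed (not confirmed by this listing): the `latency` subcommand
# takes --model and --dataset, mirroring the throughput flow.
import subprocess


def run_latency_bench(model: str, dataset_path: str) -> None:
    """Run trtllm-bench in latency mode and raise if it fails."""
    cmd = [
        "trtllm-bench",
        "--model", model,           # HF model name or local checkpoint path
        "latency",                  # subcommand added in PR #3730
        "--dataset", dataset_path,  # JSONL of prompts (assumed flag)
    ]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical example values for illustration only.
    run_latency_bench("meta-llama/Llama-3.1-8B", "token-norm-dist.jsonl")
```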