Mirror of https://github.com/NVIDIA/TensorRT-LLM.git
min_latency_mode is only set to False during the warmup phase. Thus, when it becomes True during inference, all tactics fall back to the default one, causing a perf regression. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
| Name |
|---|
| __init__.py |
| cpp_custom_ops.py |
| flashinfer_custom_ops.py |
| torch_custom_ops.py |
| userbuffers_custom_ops.py |