mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-22 11:42:41 +08:00
Shorten AD graph optimization by 30% (measured on Nemotron-6): A bug in the transformation interface marked all passes as not clean, regardless of what the transformation reported. Fix how the optimization passes report the results of their actions: many passes reported that the graph was not clean even when they did not participate in the optimization, and each graph cleaning invocation can take several seconds. Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> |
||
|---|---|---|
| compile | ||
| config | ||
| custom_ops | ||
| distributed | ||
| export | ||
| models | ||
| shim | ||
| transform | ||
| utils | ||
| __init__.py | ||
| llm_args.py | ||
| llm.py | ||
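
The commit message above describes passes reporting whether they left the graph clean, with cleanup run only when needed. A minimal sketch of that idea follows, assuming a toy graph (a list of op names) and hypothetical names (`PassResult`, `run_passes`, `fuse_add_mul`, `cleanup`) that are illustrative only and are not taken from this repository's transformation interface:

```python
from dataclasses import dataclass
from typing import Callable, List

Graph = List[str]  # stand-in for a real graph: just a list of op names


@dataclass
class PassResult:
    """Hypothetical per-pass report (not the project's actual type)."""
    num_matches: int  # how many rewrites the pass applied
    is_clean: bool    # True when the graph needs no cleanup afterwards


def noop_pass(graph: Graph) -> PassResult:
    # Finds nothing to rewrite, so the graph stays clean.
    return PassResult(num_matches=0, is_clean=True)


def fuse_add_mul(graph: Graph) -> PassResult:
    # Replace each adjacent ("add", "mul") pair with a single "fma" op.
    matches, i = 0, 0
    while i < len(graph) - 1:
        if graph[i] == "add" and graph[i + 1] == "mul":
            graph[i:i + 2] = ["fma"]
            matches += 1
        i += 1
    # Only report a dirty graph when something was actually rewritten.
    return PassResult(num_matches=matches, is_clean=matches == 0)


def cleanup(graph: Graph) -> None:
    # Placeholder for an expensive cleanup step (here: drop "dead" ops).
    graph[:] = [op for op in graph if op != "dead"]


def run_passes(
    graph: Graph,
    passes: List[Callable[[Graph], PassResult]],
    cleanup_fn: Callable[[Graph], None],
) -> int:
    """Run passes in order; clean up only after passes that report a dirty graph."""
    cleanups = 0
    for p in passes:
        result = p(graph)
        # Trust the pass's own report instead of assuming the graph is dirty.
        if not result.is_clean:
            cleanup_fn(graph)
            cleanups += 1
    return cleanups


graph = ["add", "mul", "dead", "relu"]
n_cleanups = run_passes(graph, [noop_pass, fuse_add_mul, noop_pass], cleanup)
```

In this sketch only one of the three passes triggers cleanup. If the runner ignored `is_clean` and treated every pass as dirty, cleanup would run three times, which is the kind of overhead the commit message says it removes.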