TensorRT-LLM/tensorrt_llm/_torch/auto_deploy
Neta Zmora 53491ffdb1
[#9023][feat] reduce AD graph optimization time for non-participating passes (#9024)
Shorten AD graph optimization time by 30% (measured on Nemotron-6):

A bug in the transformation interface marked every pass as leaving the graph not clean, regardless of what the transformation actually reported.
Fix how the optimization passes report the results of their actions: many passes reported the graph as not clean even when they did not participate in the optimization, and each graph-cleaning invocation can take several seconds.

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-12 09:05:53 -08:00
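The commit above hinges on each pass reporting honestly whether it modified the graph, so the orchestrator can skip the expensive cleanup step for non-participating passes. Below is a minimal sketch of that reporting contract, not the actual TensorRT-LLM auto_deploy interface; the names TransformInfo, is_clean, run_passes, and noop_pass are hypothetical.

```python
# Minimal sketch (hypothetical names, not the TensorRT-LLM interface):
# each pass reports whether it left the FX graph clean, and the orchestrator
# only runs graph cleanup for passes that actually rewrote something.
from dataclasses import dataclass
from typing import Callable, List

import torch
from torch.fx import GraphModule, symbolic_trace


@dataclass
class TransformInfo:
    """Result reported by a single optimization pass (hypothetical)."""
    num_matches: int = 0   # number of graph sites the pass rewrote
    is_clean: bool = True  # True if no cleanup is needed after this pass


def run_passes(gm: GraphModule,
               passes: List[Callable[[GraphModule], TransformInfo]]) -> GraphModule:
    for p in passes:
        info = p(gm)
        # Clean the graph only when the pass reports changes; forcing cleanup
        # after every pass costs seconds per invocation on large graphs.
        if not info.is_clean:
            gm.graph.eliminate_dead_code()
            gm.graph.lint()
            gm.recompile()
    return gm


def noop_pass(gm: GraphModule) -> TransformInfo:
    # A non-participating pass: it found nothing to rewrite, so it reports
    # the graph as still clean and cleanup is skipped.
    return TransformInfo(num_matches=0, is_clean=True)


if __name__ == "__main__":
    class Tiny(torch.nn.Module):
        def forward(self, x):
            return torch.relu(x) + 1

    gm = symbolic_trace(Tiny())
    run_passes(gm, [noop_pass])
    print(gm.graph)
```

Under this contract, a pass with zero matches returns is_clean=True, so eliminate_dead_code/recompile runs only for passes that rewrote nodes, which is the source of the reported ~30% reduction in graph optimization time.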
compile [https://nvbugs/5606166][fix] AutoDeploy: use tuples for cudagraph shape lookup (#8658) 2025-10-28 10:52:43 -07:00
config [None][feat] AutoDeploy: Perf improvement for mamba layers (#8991) 2025-11-11 08:27:07 -08:00
custom_ops [None][feat] AutoDeploy: Perf improvement for mamba layers (#8991) 2025-11-11 08:27:07 -08:00
distributed [None][fix] Switch AD AllReduce strategy to NCCL (#8979) 2025-11-07 06:49:44 +02:00
export [#8924][fix] Fix AutoDeploy pattern matcher for torch 2.9 (#8920) 2025-11-05 13:29:20 -08:00
models [None][fix] AutoDeploy: update nano3 accuracy test (#9061) 2025-11-11 12:26:31 -08:00
shim [TRTLLM-9065][chore] remove PyTorchConfig completely (#8856) 2025-11-06 22:37:03 -08:00
transform [#9023][feat] reduce AD graph optimization time for non-participating passes (#9024) 2025-11-12 09:05:53 -08:00
utils [https://nvbugs/5625972][fix] Add context manager to fix FakeTensorProp (#9047) 2025-11-10 16:25:58 -08:00
__init__.py [AutoDeploy] merge feat/ad-2025-07-07 (#6196) 2025-07-23 05:11:04 +08:00
llm_args.py [TRTLLM-9065][chore] remove PyTorchConfig completely (#8856) 2025-11-06 22:37:03 -08:00
llm.py [TRTLLM-9065][chore] remove PyTorchConfig completely (#8856) 2025-11-06 22:37:03 -08:00