Lucas Liebenwein
743fb0a159
[AutoDeploy] _AutoDeployLlmArgs as primary config object (#4891)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-05 17:20:55 +08:00
Lucas Liebenwein
491a09b0c6
[AutoDeploy] Increased Model Coverage Mass Migration Week 2 (#4817)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-06-01 14:40:29 +08:00
Lucas Liebenwein
5cdd6bb10f
[AutoDeploy] Increased Model Coverage Mass Migration Week 1 (#4468)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-27 16:43:15 +08:00
Lucas Liebenwein
de409e8468
[AutoDeploy] HF factory improvements (#4371)
* [AutoDeploy] HF factory improvements
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* improve monkey-patches and add unit tests
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-19 20:13:43 -07:00
Lucas Liebenwein
8e4320ede5
[AutoDeploy] configurable cache resize (#4372)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-16 10:07:09 -04:00
Suyog Gupta
b0f7522c82
[AutoDeploy]feat: Add an AutoDeploy compile backend that only calls torch.compile (#4240)
* add a torch-compile backend
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* readme changes
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* plumb torch-compile through build_and_run_ad.py
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* plumb torch-compile through build_and_run_ad.py
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* plumb torch-compile through build_and_run_ad.py
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* add torch-cudagraph backend
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* update readme
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* update readme
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* further enhanced compiler backends
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* further enhance readme
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* better specified defaults in simple_config.py
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* fix typo in simple_config.py
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* updated deepseek-v3 support
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* revert accidental deletion in AD Readme
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-16 08:38:15 +08:00
Lucas Liebenwein
be916b19e0
feat: [AutoDeploy] unfusing attention for native support (#3668)
* [AutoDeploy] unfused streamlined attention + caching
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* improved unit testing
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* reviewer feedback
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* some updates to attn_mask handling
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* updated manual benchmarking and cudagraph capture
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-02 09:06:49 +08:00
sugunav14
84fc07b011
feat: [TRTLLM-3510] DeepseekV3 support in AutoDeploy (#3281)
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
2025-04-08 21:47:57 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820)
2025-02-25 21:21:49 +08:00