Yan Chunwei
c84e41fd9d
fix: build_config in TorchLlmArgs and avoid arbitrary args ( #4972 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-15 17:51:56 -07:00
Lucas Liebenwein
7ddc4d6282
[AutoDeploy] Merge Feature Branch Week 3 ( #5054 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-06-11 00:20:43 +08:00
Lucas Liebenwein
743fb0a159
[AutoDeploy] _AutoDeployLlmArgs as primary config object ( #4891 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-05 17:20:55 +08:00
Lucas Liebenwein
491a09b0c6
[AutoDeploy] Increased Model Coverage Mass Migration Week 2 ( #4817 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-06-01 14:40:29 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
Lucas Liebenwein
5cdd6bb10f
[AutoDeploy] Increased Model Coverage Mass Migration Week 1 ( #4468 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-27 16:43:15 +08:00
Lucas Liebenwein
8e4320ede5
[AutoDeploy] configurable cache resize ( #4372 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-16 10:07:09 -04:00
Suyog Gupta
b0f7522c82
[AutoDeploy]feat: Add an AutoDeploy compile backend that only calls torch.compile ( #4240 )
...
* add a torch-compile backend
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* readme changes
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* plumb torch-compile through build_and_run_ad.py
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* plumb torch-compile through build_and_run_ad.py
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* plumb torch-compile through build_and_run_ad.py
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* add torch-cudagraph backend
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* update readme
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* update readme
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* further enhanced compiler backends
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* further enhance readme
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* better specified defaults in simple_config.py
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* fix typo in simple_config.py
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* updated deepseek-v3 support
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* revert accidental deletion in AD Readme
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-16 08:38:15 +08:00
Lucas Liebenwein
be916b19e0
feat: [AutoDeploy] unfusing attention for native support ( #3668 )
...
* [AutoDeploy] unfused streamlined attention + caching
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* improved unit testing
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* reviewer feedback
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* some updates to attn_mask handling
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* updated manual benchmarking and cudagraph capture
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-02 09:06:49 +08:00
Lucas Liebenwein
06b914e0f9
feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs ( #3589 )
...
* generalizing cudagraph to multiple dynamic inputs
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* fix for failing test
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-04-23 03:38:51 +08:00
sugunav14
84fc07b011
feat: [TRTLLM-3510] DeepseekV3 support in AutoDeploy ( #3281 )
...
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
2025-04-08 21:47:57 +08:00
yuxianq
7b03350527
Add thread leak check and fix thread/memory leak issues. ( #3270 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-08 19:03:18 +08:00
tburt-nv
7a659885e3
chore: remove usernames from comments ( #3291 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-04-05 13:44:28 +08:00
Suyog Gupta
047f2b234d
perf: [AutoDeploy] Enable AutoDeploy as a backend in trtllm-bench ( #3041 )
...
* Enable AutoDeploy as a backend in trtllm-bench
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* update how caches are resized
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* fix: files permission from 100755 to 100644
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* some comments
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Fix function name
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* refactor
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Remove spurious change
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Add cursor generated doc strings
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* re-enable ad test
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* some perf cleanup
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* debug ci
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* ensure that overlap scheduler is enabled
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Reorder the tests
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
---------
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-26 14:33:14 -07:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00