Fridah-nv
0f947c64cb
[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder ( #7233 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-26 10:47:57 -07:00
Grzegorz Kwasniewski
2101d46d68
[TRTLLM-6342][feat] TP Sharding read from the model config ( #6972 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-25 15:41:27 -07:00
Lucas Liebenwein
97d550b4ba
[None] [AutoDeploy] canonicalize_graph before shape prop for consistent state_dict ( #7223 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-08-25 16:59:57 -04:00
ajrasane
068056677f
[None][chore] Enable auto deploy accuracy test in CI ( #7179 )
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-24 08:42:30 -07:00
Suyog Gupta
e3de5758a3
[ #7136 ][feat] trtllm-serve + autodeploy integration ( #7141 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
Fridah-nv
e18dacc931
[ #4403 ][refactor] Move fusion, kvcache, and compile to modular inference optimizer ( #7057 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-08-21 10:30:36 -07:00
Fridah-nv
647a52698a
[ https://nvbugs/5443039 ][fix] Fix AutoDeploy pattern matcher for torch 2.8 ( #7076 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-21 01:14:51 -04:00
Fridah-nv
c02592d051
[None][autodeploy] Add group attention pattern for solar-pro-preview ( #7054 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-19 18:57:09 -04:00
Shunkangz
54ec2c1af1
[None][opt] Add batch wait timeout in fetching requests ( #6923 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-19 03:50:08 -04:00
ajrasane
4162d2d746
[None][test] Add accuracy evaluation for AutoDeploy ( #6764 )
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-15 13:46:09 -04:00
nvchenghaoz
81f0ded1c4
[None][feat] Add GPT OSS support for AutoDeploy ( #6641 )
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-08-12 14:03:22 -04:00
Fridah-nv
0dc4b4e699
[ #4403 ][autodeploy] Refactor: Move more transformations to new inf optimizer, Add quantization_source to factory interface ( #6760 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-08-11 22:02:46 -07:00
Gal Hubara-Agam
3c5aec19c2
[ #5048 ][enhance] AutoDeploy: Optimize prepare_inputs ( #6634 )
...
Optimize prepare_inputs routine in AutoDeploy, as part of the effort to reduce the performance gap compared to the default backend.
This PR includes two major fixes, and some other minor tweaks:
1. Avoid back and forth data copies
2. Optimize position ids update by separating the implementation for generation mode and context mode.
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-10 13:55:04 +03:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss ( #6645 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
yunruis
3ff4f503ad
[None][opt] ADP schedule balance optimization ( #6061 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-06 09:38:02 +08:00
Lucas Liebenwein
5247df6ae2
[AutoDeploy] merge feat/ad-2025-07-22 ( #6520 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Gal Agam <ghubaraagam@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: haoguo <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Gal Agam <ghubaraagam@cw-dfw-h100-004-328-012.cm.cluster>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-08-01 08:51:08 -07:00
Lucas Liebenwein
41fb8aa8b1
[AutoDeploy] merge feat/ad-2025-07-07 ( #6196 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-07-23 05:11:04 +08:00
amitz-nv
98428f330e
[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction ( #5616 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-07-20 08:00:14 +03:00
wili
2e3cf42e03
[refactor] Simplification of Speculative decoding configs ( #5639 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-10 11:37:30 -04:00
Yan Chunwei
dfce61f4b9
[TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sampler to enable_mixed_sampler ( #5751 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-07 17:05:14 +08:00
Stefan Niebler
d1112aac37
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow ( #5333 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-05 01:35:13 +09:00
Lucas Liebenwein
24ac9b5f69
[AutoDeploy] merge feat/ad-2025-06-29 ( #5737 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-07-04 10:21:18 +09:00
liji-nv
c345f5876c
[feat] Support torch compile for attention dp ( #5086 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-01 13:48:52 -04:00
Netanel Haber
6ee94c7ac8
Reintroduce with perf fixes: feature: unify new_tokens format sample state to trtllm samper tokens format ( #5513 )
...
58a8a8f - these changes were previously merged to main here.
6aef149 - the changes were temporarily reverted in main, due to a significant perf regression in models using the TorchSampler (observed by @byshiue).
This PR is meant to re-merge these changes along with a fix to prevent the regression.
The first commit of this PR is actually just the reverted revert - filter it out of the changes to see previously unmerged changes.
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-06-30 11:58:59 -07:00
Wei-Ming Chen
f28cd3056e
feat: AutoDeploy fp8 quantization support for bmm ( #3849 )
...
Signed-off-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>
2025-06-30 12:36:34 -04:00
nv-guomingz
6e48ac25a6
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. ( #5585 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 12:23:14 -04:00
nv-guomingz
578430e64c
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) ( #5014 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 11:05:40 +08:00
Lucas Liebenwein
619709fc33
[AutoDeploy] merge feat/ad-2025-06-13 ( #5556 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-29 03:52:14 +08:00
Daniel Stokes
83a1f60556
feat: Expose bias and FP8_MXFP4 MOE CUTLASS backend features to pytorch ( #5410 )
...
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
2025-06-27 12:29:34 +08:00
Netanel Haber
6aef14943c
Revert "feature: unify new_tokens format sample state to trtllm samper new_tokens format ( #4401 )" ( #5474 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-06-25 20:56:04 -07:00
Lucas Liebenwein
5cffb7e0ec
[AutoDeploy] Merge feat/ad_2025_06_13 feature branch ( #5454 )
...
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-06-25 09:30:13 +08:00
QI JUN
d93a5e04b5
Chore: remove unused variables ( #5314 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-24 22:27:32 +08:00
Netanel Haber
58a8a8fd37
feature: unify new_tokens format sample state to trtllm sampler new_tokens format ( #4401 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-06-23 10:38:37 -07:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default ( #5312 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
Robin Kobus
38547b92f3
refactor: Introduce ResourceManagerType enum for resource management ( #5246 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-18 09:55:59 +02:00
Enwei Zhu
4b82b8b4c7
[TRTLLM-5330] perf: Optimize MoE supplementary kernels for large-scale EP ( #5215 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-17 15:23:24 +08:00
Tracin
ef3fdc8051
feat: Add w4a8_mxfp4_fp8 quantization recipe. ( #4867 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-06-16 11:30:57 +08:00
Yan Chunwei
c84e41fd9d
fix: build_config in TorchLlmArgs and avoid arbitrary args ( #4972 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-15 17:51:56 -07:00
Lucas Liebenwein
49d7268acc
[nvbugs/5331013] fix AutoDeploy for PyTorch 25.05 dependency upgrade ( #5106 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-12 13:07:27 +08:00
HuiGao-NV
43192379af
Use backend to replace macro to control enablement of MNNVL all reduce ( #4635 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 11:22:49 +08:00
Lucas Liebenwein
7ddc4d6282
[AutoDeploy] Merge Feature Branch Week 3 ( #5054 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-06-11 00:20:43 +08:00
Bo Li
f414a079ad
chore: Change the type annotations of input_ids and position_ids to int32. ( #4632 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-07 16:10:47 +08:00
Lucas Liebenwein
743fb0a159
[AutoDeploy] _AutoDeployLlmArgs as primary config object ( #4891 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-05 17:20:55 +08:00
hlu1
320195dc0d
[Architecture] Refactor FusedMoE ( #4790 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-06-03 14:02:19 +08:00
Lucas Liebenwein
491a09b0c6
[AutoDeploy] Increased Model Coverage Mass Migration Week 2 ( #4817 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-06-01 14:40:29 +08:00
Daniel Cámpora
69c7fe8905
[TRTLLM-4987][feat] Partial support of context logits in TRTLLMSampler ( #4538 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-06-01 03:32:43 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
Lucas Liebenwein
5cdd6bb10f
[AutoDeploy] Increased Model Coverage Mass Migration Week 1 ( #4468 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-27 16:43:15 +08:00
Lucas Liebenwein
de409e8468
[AutoDeploy] HF factory improvements ( #4371 )
...
* [AutoDeploy] HF factory improvements
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* improve monkey-patches and add unit tests
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-19 20:13:43 -07:00
Jinyang Yuan
b618e1f55b
perf: Eliminate the need for attention DP padding when possible ( #3439 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
2025-05-17 13:30:55 +08:00