350 Commits

Author SHA1 Message Date
akii96 4200f62147 [ROCm][GPT-OSS] Fuse RoPE + static Q FP8 quant on fused RoPE+KV path (#42832)
Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-06-05 16:22:19 -05:00
Harry Mellor ef0df7dbd6 [CI] Bump mypy version 1.19.1 -> 1.20.2 (#44647)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-06-05 14:56:27 +00:00
Jiahan Chang (Cyrus) d0975a4b50 [perf] Add gemma RMS AR fusion (#42646)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2026-06-04 01:33:59 -07:00
Oxana Korzh b4b4aaa70e [Inductor] Fast-path Inductor fallback for vllm::*/vllm_aiter::* custom ops (#42129)
Signed-off-by: Oxana Korzh <okorzh@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-04 00:03:52 -05:00
JartX 48c0d13e65 [ROCm][CI] Skip unbacked dynamic shapes tests on PyTorch < 2.11 (#44256)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-06-01 19:09:01 -05:00
rasmith 9769e2df2a [AMD][CI][BugFix] Fix Distributed Compile Unit Tests (2xH100-2xMI300) group (#43120)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-05-28 14:39:01 -07:00
Angela Yi 0fa3114ae1 Fix test_aot_compile for torch 2.12 (#43695)
Signed-off-by: Angela Yi <yiangela7@gmail.com>
2026-05-26 23:12:49 -04:00
Wentao Ye 33d7cbe02c [Model Runner v2] Force v1 runner for tests (#43233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-23 16:37:24 -07:00
TJian 46f95b2ec2 [ROCm][Critical] Fix the GDN import bug (#43486)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-05-23 21:12:58 +00:00
Charlie Fu 4cfcc0866f [CI][ROCm] Remove unsupported cases in test_fusion.py (#38680)
Signed-off-by: charlifu <charlifu@amd.com>
2026-05-14 17:37:18 -04:00
Tres f887aa1a53 [Aiter][ROCm] RMSNormGated+GroupedQuantFP8 fusion (#40710)
Signed-off-by: Tres Popp <tres.popp@amd.com>
Signed-off-by: Tres Popp <trespopp@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 15:37:09 -04:00
Michael Goin 8efd508204 [Quantization] Rework quantization_config to use QuantKey and allow for activation override (#41566)
2026-05-13 16:58:32 -04:00
frida-andersson a721315488 [ROCm][Perf] Fix RMSNorm+Quant fusion for gfx950 (non-fnuz) (#41825)
Signed-off-by: Frida Andersson <fanderss@amd.com>
Signed-off-by: Chuan Li <chuali@amd.com>
Co-authored-by: Markus Hartikainen <markus.hartikainen@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Chuan Li <chuali@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Frida Andersson <frida-andersson@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-05-11 15:00:51 -04:00
Rohan Potdar a51376b3f0 [Performance][DSR1]: Fused RoPE+KVCache+q_concat for MLA (#40392)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
2026-05-11 14:10:50 +00:00
Mohammad Miadh Angkad efd0e7789d Fix mypy failure on main (#42197)
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
2026-05-10 07:55:57 +00:00
Wang Xingran 0b272a6e01 [Bugfix] Fix SP pass for multimodal models and PP+SP residual handling (#33322)
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com>
Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com>
2026-05-09 19:44:16 -07:00
baonudesifeizhai bc5fdc1e6a Add NVFP4 all-gather GEMM fusion for AsyncTP (#41882)
Signed-off-by: roG0d <baonudesifeizhai@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-10 01:13:22 +00:00
haosdent e934e459e6 [CI][Bugfix] Make test_gpt2_cache_hit observable across V1 EngineCore (#42037)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-05-09 11:53:15 +08:00
haosdent 5f6a02812a [CI][Bugfix] Fix failure CI step "PyTorch Fullgraph Smoke Test" (#41953)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-05-07 19:41:56 -07:00
Lucas Kabela 213f10bfdd [Bugfix] Fix codegen for unqualified names (#40726)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-05-06 01:11:37 -07:00
Luka Govedič d58c42e19c [vLLM IR] 2/N fused_add_rms_norm and maybe_inplace overload (#36823)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-05-01 23:41:15 -04:00
vllmellm 529c671e80 [ROCm][FEAT] AITER Fused Allreduce + RMSNorm (#37646)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>
Signed-off-by: junkang1991 <junkangchow@gmail.com>
Co-authored-by: Rita Brugarolas <Rita.BrugarolasBrufau@amd.com>
Co-authored-by: junkang1991 <junkangchow@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-05-01 23:07:18 +08:00
baonudesifeizhai c3868bbbe4 [compile] Add FlashInfer FP8 async TP fusion and preserve allreduce fusion ordering #27893 (#39505)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: roG0d <baonudesifeizhai@gmail.com>
2026-05-01 05:08:34 +00:00
Laith Sakka 6f20f81cbf Replace shape_invariants with simpler approach in dynamic_arg_dims utilizing shape_id property. (#36194)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2026-04-29 18:32:15 +00:00
Wei Zhao 8b49cf3a37 [Bugfix] Fix max_num_batched_token not captured in cuda graph (#40734)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Co-authored-by: Wei Zhao (Engrg-Hardware 1) <weizha@login-bia02.bia.clusters.nvidia.com>
2026-04-28 21:33:06 -07:00
Nick Hill e68fa1b90a [Core] Account for num_gpu_blocks_override in max_model_len checks (#41069)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-04-28 15:44:09 -07:00
Angela Yi 03aeed802f [Test] Fix test_dynamic_shapes_compilation for torch 2.12 (#40743)
Signed-off-by: Angela Yi <angelayi@meta.com>
2026-04-27 17:51:15 -07:00
Yifan Qiao 4d51588e23 [Feat] DeepSeek V4 Rebased (#40860)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: qizixi <zixi@inferact.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <yongye@inferact.ai>
Co-authored-by: Simon Mo <simon@inferact.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: qizixi <zixi@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
2026-04-26 18:31:08 -07:00
Xinan Miao 32e45636e3 [torch.compile]: Disable Sequence Parallelism (SP) for piecewise compilation (#38373)
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Wang Xingran <72983099+wangxingran222@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-26 17:44:42 +00:00
Andreas Karatzas e54894fc85 [ROCm][CI] Fix TestSiluMulGroupFp8QuantModel after W8A8 block linear refactor (#39799)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-04-25 11:20:59 +09:00
Lucas Kabela ce6a199ecc [BE][Bugfix] Respect TORCH_COMPILE_DISABLE env var at the vLLM config level for torch 2.12 (#40715)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-04-24 16:25:03 -07:00
Zhang Jian 8825608205 [Bugfix][CI] Fix wrong residual shape in TestFusedAddRMSNorm.example_inputs that causes flaky test (#40629)
Signed-off-by: Zhang Jian <jianmusings@gmail.com>
2026-04-24 16:40:07 -04:00
Richard Zou 424033f4fc [Bugfix] Include inductor and functorch configs in compilation cache key (#40627)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-04-23 09:52:59 -04:00
BadrBasowid 2196bac135 [Compilation] Refactor SiluMul activation+quant Fusion Pass (#39684)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
2026-04-23 09:10:36 -04:00
Lucas Kabela b8401a9bf4 [Bugfix] Fix RMS norm + quant fusion on DeepGEMM UE8M0 path for B200 (#40552)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-04-22 22:04:42 +00:00
Angela Yi eb6661d522 Fix test_startup.py for torch 2.12 (#40636)
Signed-off-by: Angela Yi <yiangela7@gmail.com>
2026-04-22 19:31:41 +00:00
Carl Y 4254aeb56f [fix] flaky test_mla_attn_quant_fusion.py (#40530)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
2026-04-22 06:29:58 +00:00
Carl Y 4506319a28 [compile] mla + group fp8 fusion (#38877)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-21 23:16:58 -04:00
Rita Brugarolas fb5635d3f9 [ROCm] Add MLA dual RMS norm fusion (Q, KV) pass for DeepSeek/Kimi-K2 (#39242)
Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>
2026-04-20 14:56:27 +00:00
Shinichi Hemmi 4c47710bf7 [CI/Build] Apply ruff formatter to pass pre-commit (#40078)
Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp>
2026-04-17 08:54:32 +08:00
BadrBasowid 29057d3bee [Compilation] Add Unit Tests for VllmFusionPatternMatcherPass (#39692)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
2026-04-16 22:57:16 +00:00
Chauncey db8d4a4a06 [BugFix][Graph] fix: handle empty sym_shape_indices in PiecewiseBackend. (#39395)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-04-15 09:28:09 -04:00
wliao2 3abf858443 [Test] Refactor hard coded device string in test files under compile/quantization/models/model_executor folders (#38901)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
2026-04-15 11:02:35 +08:00
Animesh Jain f00c5539d7 [compile] Bug fix for _decompose_size_nodes (#38360)
Signed-off-by: Animesh Jain <anijain@umich.edu>
2026-04-12 20:20:24 +00:00
yzong-rh e816a8811f [Bugfix] Fix FlashInfer crash with kv_cache_dtype_skip_layers (#39002)
Signed-off-by: Yifan Zong <yzong@redhat.com>
2026-04-10 18:50:47 +00:00
Richard Zou f44afef6d6 [compile] Allow strings in custom ops without regressing compilation times (#38123)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-04-10 07:26:37 +00:00
Maral 2e9034c998 [W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892)
Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: Maral <maralbahari.98@gmail.com>
2026-04-09 08:50:39 +08:00
Lucas Wilkinson 70406eb1dc [Attention][V0 Deprecation] Deprecate accept output buffer (#39125)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-04-07 17:14:58 -04:00
shunting314 8b141ed8c3 full cudagraph for flex-attn (#36298)
Signed-off-by: shunting314 <shunting@meta.com>
2026-04-02 21:15:01 -07:00
Carl Y 1f5ec2889c [mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 21:16:11 -04:00