akii96
|
4200f62147
|
[ROCm][GPT-OSS] Fuse RoPE + static Q FP8 quant on fused RoPE+KV path (#42832)
Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-05 16:22:19 -05:00 |
|
Harry Mellor
|
ef0df7dbd6
|
[CI] Bump mypy version 1.19.1 -> 1.20.2 (#44647)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-05 14:56:27 +00:00 |
|
Jiahan Chang (Cyrus)
|
d0975a4b50
|
[perf] Add gemma RMS AR fusion (#42646)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2026-06-04 01:33:59 -07:00 |
|
Oxana Korzh
|
b4b4aaa70e
|
[Inductor] Fast-path Inductor fallback for vllm::*/vllm_aiter::* custom ops (#42129)
Signed-off-by: Oxana Korzh <okorzh@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-06-04 00:03:52 -05:00 |
|
JartX
|
48c0d13e65
|
[ROCm][CI] Skip unbacked dynamic shapes tests on PyTorch < 2.11 (#44256)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2026-06-01 19:09:01 -05:00 |
|
rasmith
|
9769e2df2a
|
[AMD][CI][BugFix] Fix Distributed Compile Unit Tests (2xH100-2xMI300) group (#43120)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-05-28 14:39:01 -07:00 |
|
Angela Yi
|
0fa3114ae1
|
Fix test_aot_compile for torch 2.12 (#43695)
Signed-off-by: Angela Yi <yiangela7@gmail.com>
|
2026-05-26 23:12:49 -04:00 |
|
Wentao Ye
|
33d7cbe02c
|
[Model Runner v2] Force v1 runner for tests (#43233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-05-23 16:37:24 -07:00 |
|
TJian
|
46f95b2ec2
|
[ROCm][Critical] Fix the GDN import bug (#43486)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-05-23 21:12:58 +00:00 |
|
Charlie Fu
|
4cfcc0866f
|
[CI][ROCm] Remove unsupported cases in test_fusion.py (#38680)
Signed-off-by: charlifu <charlifu@amd.com>
|
2026-05-14 17:37:18 -04:00 |
|
Tres
|
f887aa1a53
|
[Aiter][ROCm] RMSNormGated+GroupedQuantFP8 fusion (#40710)
Signed-off-by: Tres Popp <tres.popp@amd.com>
Signed-off-by: Tres Popp <trespopp@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-14 15:37:09 -04:00 |
|
Michael Goin
|
8efd508204
|
[Quantization] Rework quantization_config to use QuantKey and allow for activation override (#41566)
|
2026-05-13 16:58:32 -04:00 |
|
frida-andersson
|
a721315488
|
[ROCm][Perf] Fix RMSNorm+Quant fusion for gfx950 (non-fnuz) (#41825)
Signed-off-by: Frida Andersson <fanderss@amd.com>
Signed-off-by: Chuan Li <chuali@amd.com>
Co-authored-by: Markus Hartikainen <markus.hartikainen@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Chuan Li <chuali@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Frida Andersson <frida-andersson@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-05-11 15:00:51 -04:00 |
|
Rohan Potdar
|
a51376b3f0
|
[Performance][DSR1]: Fused RoPE+KVCache+q_concat for MLA (#40392)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
|
2026-05-11 14:10:50 +00:00 |
|
Mohammad Miadh Angkad
|
efd0e7789d
|
Fix mypy failure on main (#42197)
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
|
2026-05-10 07:55:57 +00:00 |
|
Wang Xingran
|
0b272a6e01
|
[Bugfix] Fix SP pass for multimodal models and PP+SP residual handling (#33322)
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com>
Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com>
|
2026-05-09 19:44:16 -07:00 |
|
baonudesifeizhai
|
bc5fdc1e6a
|
Add NVFP4 all-gather GEMM fusion for AsyncTP (#41882)
Signed-off-by: roG0d <baonudesifeizhai@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-10 01:13:22 +00:00 |
|
haosdent
|
e934e459e6
|
[CI][Bugfix] Make test_gpt2_cache_hit observable across V1 EngineCore (#42037)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-05-09 11:53:15 +08:00 |
|
haosdent
|
5f6a02812a
|
[CI][Bugfix] Fix failure CI step "PyTorch Fullgraph Smoke Test" (#41953)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-05-07 19:41:56 -07:00 |
|
Lucas Kabela
|
213f10bfdd
|
[Bugfix] Fix codegen for unqualified names (#40726)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-05-06 01:11:37 -07:00 |
|
Luka Govedič
|
d58c42e19c
|
[vLLM IR] 2/N fused_add_rms_norm and maybe_inplace overload (#36823)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-05-01 23:41:15 -04:00 |
|
vllmellm
|
529c671e80
|
[ROCm][FEAT] AITER Fused Allreduce + RMSNorm (#37646)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>
Signed-off-by: junkang1991 <junkangchow@gmail.com>
Co-authored-by: Rita Brugarolas <Rita.BrugarolasBrufau@amd.com>
Co-authored-by: junkang1991 <junkangchow@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-05-01 23:07:18 +08:00 |
|
baonudesifeizhai
|
c3868bbbe4
|
[compile] Add FlashInfer FP8 async TP fusion and preserve allreduce fusion ordering #27893 (#39505)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: roG0d <baonudesifeizhai@gmail.com>
|
2026-05-01 05:08:34 +00:00 |
|
Laith Sakka
|
6f20f81cbf
|
Replace shape_invariants with simpler approach in dynamic_arg_dims utilizing shape_id property. (#36194)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2026-04-29 18:32:15 +00:00 |
|
Wei Zhao
|
8b49cf3a37
|
[Bugfix] Fix max_num_batched_token not captured in cuda graph (#40734)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Co-authored-by: Wei Zhao (Engrg-Hardware 1) <weizha@login-bia02.bia.clusters.nvidia.com>
|
2026-04-28 21:33:06 -07:00 |
|
Nick Hill
|
e68fa1b90a
|
[Core] Account for num_gpu_blocks_override in max_model_len checks (#41069)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-04-28 15:44:09 -07:00 |
|
Angela Yi
|
03aeed802f
|
[Test] Fix test_dynamic_shapes_compilation for torch 2.12 (#40743)
Signed-off-by: Angela Yi <angelayi@meta.com>
|
2026-04-27 17:51:15 -07:00 |
|
Yifan Qiao
|
4d51588e23
|
[Feat] DeepSeek V4 Rebased (#40860)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: qizixi <zixi@inferact.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <yongye@inferact.ai>
Co-authored-by: Simon Mo <simon@inferact.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: qizixi <zixi@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
|
2026-04-26 18:31:08 -07:00 |
|
Xinan Miao
|
32e45636e3
|
[torch.compile]: Disable Sequence Parallelism (SP) for piecewise compilation (#38373)
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Wang Xingran <72983099+wangxingran222@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-04-26 17:44:42 +00:00 |
|
Andreas Karatzas
|
e54894fc85
|
[ROCm][CI] Fix TestSiluMulGroupFp8QuantModel after W8A8 block linear refactor (#39799)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-25 11:20:59 +09:00 |
|
Lucas Kabela
|
ce6a199ecc
|
[BE][Bugfix] Respect TORCH_COMPILE_DISABLE env var at the vLLM config level for torch 2.12 (#40715)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-04-24 16:25:03 -07:00 |
|
Zhang Jian
|
8825608205
|
[Bugfix][CI] Fix wrong residual shape in TestFusedAddRMSNorm.example_inputs that causes flaky test (#40629)
Signed-off-by: Zhang Jian <jianmusings@gmail.com>
|
2026-04-24 16:40:07 -04:00 |
|
Richard Zou
|
424033f4fc
|
[Bugfix] Include inductor and functorch configs in compilation cache key (#40627)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-04-23 09:52:59 -04:00 |
|
BadrBasowid
|
2196bac135
|
[Compilation] Refactor SiluMul activation+quant Fusion Pass (#39684)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
|
2026-04-23 09:10:36 -04:00 |
|
Lucas Kabela
|
b8401a9bf4
|
[Bugfix] Fix RMS norm + quant fusion on DeepGEMM UE8M0 path for B200 (#40552)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-04-22 22:04:42 +00:00 |
|
Angela Yi
|
eb6661d522
|
Fix test_startup.py for torch 2.12 (#40636)
Signed-off-by: Angela Yi <yiangela7@gmail.com>
|
2026-04-22 19:31:41 +00:00 |
|
Carl Y
|
4254aeb56f
|
[fix] flaky test_mla_attn_quant_fusion.py (#40530)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
|
2026-04-22 06:29:58 +00:00 |
|
Carl Y
|
4506319a28
|
[compile] mla + group fp8 fusion (#38877)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-04-21 23:16:58 -04:00 |
|
Rita Brugarolas
|
fb5635d3f9
|
[ROCm] Add MLA dual RMS norm fusion (Q, KV) pass for DeepSeek/Kimi-K2 (#39242)
Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>
|
2026-04-20 14:56:27 +00:00 |
|
Shinichi Hemmi
|
4c47710bf7
|
[CI/Build] Apply ruff formatter to pass pre-commit (#40078)
Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp>
|
2026-04-17 08:54:32 +08:00 |
|
BadrBasowid
|
29057d3bee
|
[Compilation] Add Unit Tests for VllmFusionPatternMatcherPass (#39692)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
|
2026-04-16 22:57:16 +00:00 |
|
Chauncey
|
db8d4a4a06
|
[BugFix][Graph] fix: handle empty sym_shape_indices in PiecewiseBackend. (#39395)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-04-15 09:28:09 -04:00 |
|
wliao2
|
3abf858443
|
[Test] Refactor hard coded device string in test files under compile/quantization/models/model_executor folders (#38901)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
|
2026-04-15 11:02:35 +08:00 |
|
Animesh Jain
|
f00c5539d7
|
[compile] Bug fix for _decompose_size_nodes (#38360)
Signed-off-by: Animesh Jain <anijain@umich.edu>
|
2026-04-12 20:20:24 +00:00 |
|
yzong-rh
|
e816a8811f
|
[Bugfix] Fix FlashInfer crash with kv_cache_dtype_skip_layers (#39002)
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-04-10 18:50:47 +00:00 |
|
Richard Zou
|
f44afef6d6
|
[compile] Allow strings in custom ops without regressing compilation times (#38123)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-04-10 07:26:37 +00:00 |
|
Maral
|
2e9034c998
|
[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892)
Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: Maral <maralbahari.98@gmail.com>
|
2026-04-09 08:50:39 +08:00 |
|
Lucas Wilkinson
|
70406eb1dc
|
[Attention][V0 Deprecation] Deprecate accept output buffer (#39125)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-07 17:14:58 -04:00 |
|
shunting314
|
8b141ed8c3
|
full cudagraph for flex-attn (#36298)
Signed-off-by: shunting314 <shunting@meta.com>
|
2026-04-02 21:15:01 -07:00 |
|
Carl Y
|
1f5ec2889c
|
[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-02 21:16:11 -04:00 |
|