Andreas Karatzas
87954eb50e
[ROCm][CI] Optimize ROCm Docker build: registry cache, DeepEP, and ci-bake script (#36949)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-02 23:43:07 -07:00

Charlie Fu
71df063c49
Enable perf_token_group_quant/_C_stable_libtorch for ROCm (#42758)
Signed-off-by: charlifu <charlifu@amd.com>
2026-06-02 23:23:28 -07:00

Matthew Bonanni
ea0d045a05
[FlashAttention] Sync FA with upstream (#44065)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-06-02 07:15:37 -07:00

Rukhaiya2004
689b0eeb9e
[HARDWARE][POWER] Enable SHM communicator support for PowerPC (#43754)
Signed-off-by: Rukhaiya <rukhaiya@c643n08aix1-lp1.pok.stglabs.ibm.com>
Signed-off-by: Rukhaiya <bibirukhaiya123@gmail.com>
Co-authored-by: Rukhaiya <rukhaiya@c643n08aix1-lp1.pok.stglabs.ibm.com>
Co-authored-by: Akash kaothalkar <61960177+Akashcodes732@users.noreply.github.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-06-02 18:06:32 +08:00

Fadi Arafeh
0b25cf4419
[CPU][Perf] Enable fused kernels for GDN's gated delta rules (#43534)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-06-02 08:00:48 +00:00

wcy
98f1279815
[CPU][RISC-V] Add missing RVV cpu_types helpers for WNA16 (#42730)
Signed-off-by: wcy <233313160abc@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-06-01 14:56:41 +08:00

Jee Jee Li
559d6710bf
[PERF]MiniMax-M2 gate kernel (#38445)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: qianlihuang <91178480+qianlihuang@users.noreply.github.com>
Co-authored-by: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com>
2026-05-29 18:28:34 -07:00

haosdent
a377631d21
[CI] Fix AMD docker build tests (#43329)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-05-22 14:06:24 +00:00

Michael Goin
5774aad9c5
[Perf][gpt-oss] Downgrade triton_kernels to v3.5.1 (#43135)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-05-20 14:13:12 -07:00

lyd1992
f351455f0f
[CPU][RISC-V] Add RVV-optimized attention kernels for RISC-V Vector Extension (#40119)
Signed-off-by: liuyudong <liuyudong@iscas.ac.cn>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-15 12:08:23 +08:00

Matthew Bonanni
dcacdf9a88
[Attention] Sync FA with upstream (#41052)
2026-05-12 23:34:18 -04:00

Michael Goin
184577ae46
[Build] DeepGEMM: trim comments, add integration notes + TODOs (#42429)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-05-12 15:57:58 -07:00

Michael Goin
d077622d60
[Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython (#41516)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 10:27:29 -04:00

Richard Barnes
d6563d693c
Require C++20 for compatibility with PyTorch (#40380)
Signed-off-by: Richard Barnes <rbarnes@meta.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-08 22:04:43 -07:00

Li, Jiang
b3945cc316
[CPU] Bump up to the latest CPU kernels (#41924)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-05-07 05:45:59 -07:00

Tianmu Li
e87e09a50a
[Feat] dnnl build for AVX2 W8A8 Int8 (#41318)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-05-06 15:28:02 +08:00

lyd1992
aee190ac37
[Build] Fall back to system libgomp when torch has no vendored copy (#40575)
Signed-off-by: liuyudong <liuyudong@iscas.ac.cn>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-06 11:42:03 +08:00

Michael Goin
0a9362d6ab
Revert "[Build] Make bundled DeepGEMM wheel portable across Python versions" (#41512)
2026-05-02 09:42:41 -07:00

Michael Goin
0c99629ede
[Build] Make bundled DeepGEMM wheel portable across Python versions (#41476)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-05-01 14:45:03 -07:00

Yifan Qiao
4d51588e23
[Feat] DeepSeek V4 Rebased (#40860)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: qizixi <zixi@inferact.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <yongye@inferact.ai>
Co-authored-by: Simon Mo <simon@inferact.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: qizixi <zixi@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
2026-04-26 18:31:08 -07:00

velonica0
ec7aafc02a
[CPU][RISC-V] Support multiple RVV VLEN targets via compile-time dispatch (#39478)
Signed-off-by: velonica0 <like@mail.nankai.edu.cn>
2026-04-20 14:36:59 +08:00

Li, Jiang
d02421a7db
[CPU] Refactor CPU affinity and memory management (#39781)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-04-17 21:01:08 +08:00

Fadi Arafeh
445b7093fd
[perf][cpu] Accelerate BF16 GELU with LUT impl on Arm CPUs (#37469)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-15 22:26:17 -07:00

Ganesh R
445a2a4d1a
feat(cpu): add CPU support for draft model speculative decoding (#32662)
Signed-off-by: R <Ganesh.R@amd.com>
2026-04-10 11:49:52 +08:00

Michael Goin
eb4205fee5
[UX] Integrate DeepGEMM into vLLM wheel via CMake (#37980)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-08 18:56:32 -07:00

Lain
e24e0a43a4
[Attention] relax the head dim 512 and paged kv for sm90+FA4 (#38835)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-04-08 18:23:18 +00:00

Matthew Bonanni
308cec5864
[FlashAttention] Symlink FA4 instead of copying when using VLLM_FLASH_ATTN_SRC_DIR (#38814)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-04-08 12:04:34 +00:00

Lucas Wilkinson
cb3935a8fc
[FA4] Update flash-attention to latest upstream FA4 (#38690)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-04-02 17:02:37 +00:00

Yintong Lu
f09daea261
[CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
2026-03-31 15:27:37 +08:00

Johnny
97d19197bc
[NVIDIA] Fix DGX Spark logic (#38126)
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Signed-off-by: Sathish Sanjeevi <sathish.krishnan.p.s@gmail.com>
Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com>
Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Sathish Sanjeevi <SKPsanjeevi@users.noreply.github.com>
Co-authored-by: Guillaume Guy <guillaume.c.guy@gmail.com>
Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-03-27 15:26:07 -07:00

RobTand
4a76ad12e0
[Bugfix] Preserve CUDA arch suffix (a/f) for SM12x — fixes NVFP4 NaN on desktop Blackwell (#37725)
Signed-off-by: Rob Tand <robert.tand@icloud.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2026-03-25 08:18:25 -07:00

Jonas M. Kübler
77d2a5f17b
pick up tuned prefill configs for FP8 FA3 (#36265)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
2026-03-17 07:00:26 -07:00

Li, Jiang
092ace9e3a
[UX] Improve UX of CPU backend (#36968)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-14 09:27:29 +08:00

Matthew Bonanni
f444c05c32
[Attention] Use FA4 for MLA prefill (#34732)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-12 12:10:17 -04:00

typer-J
4184653775
feat: add RISC-V support for CPU backend (v2) (#36578)
Signed-off-by: typer-J <2236066784@qq.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-03-10 21:51:39 -07:00

Nikhil Gupta
0a49676fb0
cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul (#36147)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
2026-03-06 03:48:59 +00:00

Lucas Wilkinson
f44d1ddc8c
[BugFix] Fix cmake based incremental install (wrong vllm install dir) (#35773)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-03-02 21:58:16 -08:00

Lucas Wilkinson
8b5014d3dd
[Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-03-01 23:44:57 +00:00

Ma Jian
90805ff464
[CI/Build] CPU release supports both of AVX2 and AVX512 (#35466)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: jiang1.li <jiang1.li@intel.com>
2026-02-28 04:35:21 +00:00

Lucas Wilkinson
bb85929aa6
[BugFix] Fix Python 3.13 FlashMLA import error (#34548)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-15 20:09:18 -08:00

Maryam Tahhan
f07a128413
[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… (#33079)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-02-15 06:33:08 -08:00

Lucas Wilkinson
c7914d30f9
Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-02-11 07:07:56 -08:00

Andrey Talman
f97ca67176
[Release 2.10] Update to Torch 2.10 - final release (#30525)
2026-02-08 13:51:09 -08:00

Luka Govedič
e3bf79ffa0
Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" (#33841)
2026-02-04 19:54:27 -08:00

R3hankhan
4dffc5e044
[CPU] Split attention dispatch by head_dim alignment (#32161)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-02-03 19:37:15 -08:00

Lucas Wilkinson
2267cb1cfd
[Attention][FA3] Update FA3 to include new swizzle optimization (#23465)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-02-03 08:08:47 -08:00

Maryam Tahhan
203d0bc0c2
[CPU] Improve CPU Docker build (#30953)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-01-24 17:08:24 +00:00

Fadi Arafeh
744ef30484
[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-22 18:55:23 +00:00

Lucas Wilkinson
889722f3bf
[FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 22:02:39 -07:00

Lucas Wilkinson
b4f64e5b02
Update FlashMLA (#32491)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 13:03:37 +08:00