17325 Commits

Author SHA1 Message Date
zofia 6314de8bad [XPU] [Bug] remove xpuw4a16 output size check (#44168)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-06-02 22:26:20 +08:00
IdoAtadTD c91a87f01a [BugFix] [GDN] Read linear_key_head_dim from hf_text_config for multimodal models (#43978)
Signed-off-by: IdoAtadTD <ido.atad@twodelta.com>
2026-06-02 17:17:55 +03:00
Matthew Bonanni ea0d045a05 [FlashAttention] Sync FA with upstream (#44065)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-06-02 07:15:37 -07:00
王金旭 0bdfd5eb84 [Bugfix] Vendor MiniCPMV/MiniCPMO processors to unblock Transformers v5 (#44282)
Signed-off-by: guanwei-wu <b08901019@ntu.edu.tw>
Signed-off-by: wjinxu <1299461899@qq.com>
Co-authored-by: guanwei-wu <b08901019@ntu.edu.tw>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-02 07:14:38 -07:00
TomerBN-Nvidia 0cbc48c4f9 Support ModelOpt MXFP8 non-gated MoE (#42958)
Signed-off-by: tbarnatan <tbarnatan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-06-02 13:56:03 +00:00
Luciano Martins 2fd0e52252 [Bugfix] Fix Gemma4 startup crash with recent transformers multimodal processor (#44232)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2026-06-02 13:42:40 +00:00
gruner 654bd2bca4 [Bugfix] Sync block_size from EngineCore to frontend for hybrid Mamba… (#42967)
Signed-off-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2026-06-02 13:41:00 +00:00
wang.yuqi b623f7ea95 [Frontend] Consolidate dev entrypoints. (#44170)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-02 06:30:21 -07:00
Shreyas Kulkarni 0eeba5eec1 Fix DFlash prefix cache corruption due to missing lookahead block (#42971)
Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>
2026-06-02 12:06:33 +00:00
Marceli Fylcek f69ede495b [XPU][Mamba] Triton-based selective scan forward op for XPU (#43421)
Signed-off-by: Marceli Fylcek <marceli.fylcek@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-06-02 03:50:26 -07:00
Ronen Schaffer 2a2b5ca791 [KV Offload] Add on_schedule_end() hook to separate step lifecycle from event draining (#44206)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
2026-06-02 13:42:52 +03:00
Rukhaiya2004 689b0eeb9e [HARDWARE][POWER] Enable SHM communicator support for PowerPC (#43754)
Signed-off-by: Rukhaiya <rukhaiya@c643n08aix1-lp1.pok.stglabs.ibm.com>
Signed-off-by: Rukhaiya <bibirukhaiya123@gmail.com>
Co-authored-by: Rukhaiya <rukhaiya@c643n08aix1-lp1.pok.stglabs.ibm.com>
Co-authored-by: Akash kaothalkar <61960177+Akashcodes732@users.noreply.github.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-06-02 18:06:32 +08:00
Isotr0py f8e9c56d15 [Multimodal] Automatically select registered video loader for VLM (#44126)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-06-02 09:09:47 +00:00
alberto e30313220c [Parser] Migrate ResponsesParser to unified Parser interface (#42977)
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
2026-06-02 08:50:05 +00:00
omerpaz95 d247a9dc13 [EC Connector] Non blocking EC Connector lookup (#41627)
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-06-02 08:48:25 +00:00
Yifan Qiao 7c37096620 [Core][Refactor]: thread scheduler_block_size into KVCacheManager and KVCacheCoordinator (#44165)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
2026-06-02 01:14:44 -07:00
Maria Guevara b817b23f7b [Rust Frontend] add --enable-request-id-headers flag support. (#43883)
Signed-off-by: Maria Guevara <kawaiiplush14@gmail.com>
2026-06-02 16:08:37 +08:00
Ronen Schaffer 93da882e73 [kv_offload] Add @override decorators to subclass method implementations (#44177)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
2026-06-02 08:07:47 +00:00
Fadi Arafeh 0b25cf4419 [CPU][Perf] Enable fused kernels for GDN's gated delta rules (#43534)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-06-02 08:00:48 +00:00
Jiangyun Zhu dcdfe66bfa [Perf] use triton moe backend on hopper by default (#44220)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-06-02 15:52:30 +08:00
Flora Feng 68dafcca75 [Refactor] Unify reasoning + tool-call parsing behind Parser.parse() (#44267)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-06-02 15:11:42 +08:00
zhrrr 1edfd09ffd [Model Runner V2] Use actual batch max_seq_len for attn metadata (#43991)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2026-06-02 06:07:56 +00:00
zhrrr 8a9eb40808 [Model Runner V2] Support zeroing freshly allocated KV blocks for hybrid + fp8 KVCache (#43990)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2026-06-02 05:56:53 +00:00
Isotr0py f91fb2fcf3 [Bugfix] Convert Gemma4-MM ViT linear layers to vllm native impl (#43798)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
2026-06-01 21:41:16 -07:00
JooHo Lee a045c7425f [MM][CG] Profile encoder CUDA graph pool memory (#41714)
Signed-off-by: JooHo Lee <jooho414@gmail.com>
2026-06-02 12:27:34 +08:00
Chaojun Zhang a3a5a5ece5 [XPU][Bugfix] Fix per_token_group_fp8_quant missing dummy args on XPU (#43930)
Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-06-02 03:09:21 +00:00
Or Ozeri 480fadab1b [BugFix][kv_offload]: Prevent offloading stale sliding window blocks (#42959)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-06-02 05:59:48 +03:00
Krishna Chaitanya 279d25f5cb [BugFix] Fix TypeError in MiniCPM-O audio feature unpadding (#38053)
Signed-off-by: Krishna Chaitanya Balusu <krishnabkc15@gmail.com>
Signed-off-by: wjinxu <1299461899@qq.com>
Signed-off-by: Kc Balusu <kcbalusu@users.noreply.github.com>
Co-authored-by: wjinxu <1299461899@qq.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Kc Balusu <kcbalusu@users.noreply.github.com>
2026-06-01 19:57:28 -07:00
Andreas Karatzas 54d0c36fff [CI] Stabilize OpenAI schema fuzzing for malformed structural tags (#44131)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-01 19:56:15 -07:00
Flora Feng 9affc17a05 [Refactor] Move unstreamed tool-arg flush from serving layer to parser (#44017)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-06-02 10:37:43 +08:00
Alec 816cc73a9b [Bugfix][CI] Normalize NIXL connector CUDA wheel installs (#44266)
Signed-off-by: Alec Flowers <aflowers@nvidia.com>
2026-06-01 19:34:05 -07:00
Micah Williamson 2588ec4f0a [ROCm] Upgrade AITER to v0.1.13.post1 (#44265)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-06-02 01:48:59 +00:00
Dao007forever d68f0b220e [Bugfix][Mooncake] Release GPU pin on failed store in MooncakeStoreConnector (#43742)
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-06-01 18:29:18 -07:00
Woosuk Kwon 517e74a964 [DSV4] Refactor RoPE initialization (#44262)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-06-02 01:26:58 +00:00
JartX 48c0d13e65 [ROCm][CI] Skip unbacked dynamic shapes tests on PyTorch < 2.11 (#44256)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-06-01 19:09:01 -05:00
Woosuk Kwon 8c3cc98cff [DSV4] Remove unnecessary classes & functions (#44246)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-06-01 14:43:00 -07:00
Nick Hill e4cbc4385d [Test][BugFix] Fix double-BOS in PD+specdec acceptance test (#44234)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-06-01 14:31:12 -07:00
Nick Hill 6f8b40a23f [BugFix][CI] Fix added _has_module tests (#44248)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-06-01 14:23:12 -07:00
Siddharth Bedekar 266b9d9c64 [Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
2026-06-01 15:37:30 -04:00
Xunzhuo 182c67daf1 [Rust Frontend] Support streaming generate endpoint (#43779)
Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
2026-06-01 19:30:55 +00:00
Andreas Karatzas fd9e91d7e4 [ROCm][CI] Fix and stabilize EAGLE3 acceptance tests (#41294)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2026-06-01 12:40:01 -05:00
Yongye Zhu 035733515f [Kernel][DSv4] Optimize sparse FP8 compressor kernels (#44161)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2026-06-02 00:18:32 +08:00
Madeesh Kannan 023808c23d [Feature] Add support for JetBrains' Mellum v2 code generation model (#43992)
Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-06-01 10:11:35 -04:00
Wentao Ye 985c97a6a8 [Perf] Optimize cutlass fp8 scaled mm bypassing padding, 20% kernel performance improvement (#43706)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-06-01 09:05:21 -04:00
Chaojun Zhang bd0aecdc08 [XPU][CI] Fix test_audio_in_video flake by using module-scoped server fixture (#44146)
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
2026-06-01 11:21:36 +00:00
zzt 8796838910 [Bugfix] fix wrong partial_rotary_factor calculation for bailing_moe model. (#43770)
Signed-off-by: zzt <zengzetang.zzt@antgroup.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2026-06-01 02:42:49 -07:00
Will.hou de21863419 [Rust Frontend] Add InternLM2 tool parser (#43481)
Signed-off-by: Will.hou <1205157517@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
2026-06-01 08:58:46 +00:00
wang.yuqi 0910f7e0e1 [Frontend] Resettle generative scoring entrypoint. (#44153)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-01 07:54:59 +00:00
Uranus 1f6048abe5 fix: glm5.1 pp model loading (#42944)
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
2026-06-01 15:14:47 +08:00
wcy 98f1279815 [CPU][RISC-V] Add missing RVV cpu_types helpers for WNA16 (#42730)
Signed-off-by: wcy <233313160abc@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-06-01 14:56:41 +08:00