17277 Commits

Author SHA1 Message Date
Tyler Michael Smith 4cc78c9d5d [Core] Freeze garbage collector in workers after model initialization (#44363)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-06-04 08:39:04 -07:00
tc-mb 3dbb4e0ace [Bugfix] MiniCPM-V-4.6 video inference crash: placeholder count mismatches visual embedding count (#44509)
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
2026-06-04 08:22:30 -07:00
Zvi Kons b21443e23c Add model support for granite speech plus (#43519)
Signed-off-by: Zvi Kons[WSL] <zvi@il.ibm.com>
Signed-off-by: Zvi Kons (BlueVela) <zvi@il.ibm.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 14:47:48 +00:00
Michael Goin 06ee2d8433 [Quant] Support compressed-tensors WNA8O8Int linears and WNInt embeddings (#44340)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-06-04 07:40:33 -07:00
Yongye Zhu b5235fca2e [DSv4] Adding TRTLLM gen attention kernel (#43827)
2026-06-04 07:35:09 -07:00
Andreas Karatzas 3e77036768 [ROCm][CI] Specifying time outs for the lm eval models (#44255)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-04 22:35:00 +08:00
Andreas Karatzas 6f68ca3e91 [ROCm][CI] Stabilize memory-release in the Hybrid model generation tests (#44046)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-04 22:34:24 +08:00
Turner Jabbour 0c96dd64fb [ROCm] Bump fastsafetensors to v0.3.2 from PyPI, remove git source build (#43625)
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>
2026-06-04 07:30:57 -07:00
Nicolò Lucchesi 68f5e565c9 [PD][Nixl] Mamba prefix caching mode support (#42554)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-06-04 06:41:46 -07:00
QiliangCui2023 9354fb1ba5 [Bugfix][Compile] Guard per_token_group_fp8_quant lookup on non-CUDA platforms (#44476)
2026-06-04 09:31:50 -04:00
Harry Mellor f35b557239 Add GH token to docs build pre run check (#44534)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-06-04 05:43:49 -07:00
Dipika Sikka e68988a248 Refactor CT NVFP4 linear to use a single class (#42443)
2026-06-04 08:25:08 -04:00
Rui "Garry" Gao 4b87b3e845 [Bugfix] fix EVS for qwen3-vl (#44205)
Signed-off-by: Rui "Garry" Gao <garrygaogg@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-06-04 11:06:51 +00:00
wangxiyuan 90619351e3 [Attention] Mamba attention module refactor - LINEAR (#43556)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-06-04 18:45:29 +08:00
Jiahan Chang (Cyrus) d0975a4b50 [perf] Add gemma RMS AR fusion (#42646)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2026-06-04 01:33:59 -07:00
Kevin_Xiong 1bdc60ed53 Fix Kimi-K2.5 FlashInfer ViT metadata (#44493)
Signed-off-by: Kevin-XiongC <kevin_xiong1997@outlook.com>
2026-06-04 08:14:35 +00:00
Wei Zhao a6183563b6 [Prefix Caching] DeepSeekv4 - Support selective prefix-cache retention for sliding-window KV cache (#43447)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: Yifan Qiao <yifanqiao@inferact.ai>
2026-06-04 00:48:31 -07:00
Andreas Karatzas 22c2e87555 [CI] Reverted gitignore changes (#44497)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-04 00:37:44 -07:00
wang.yuqi d01d0b4646 [Frontend] Consolidate online serving utils. (#44479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-04 06:49:31 +00:00
Oxana Korzh b4b4aaa70e [Inductor] Fast-path Inductor fallback for vllm::*/vllm_aiter::* custom ops (#42129)
Signed-off-by: Oxana Korzh <okorzh@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-04 00:03:52 -05:00
Andreas Karatzas 5e2af28838 [CI] Resolve release V2 docker build after ROCm CI wheels change (#44463)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-03 21:35:40 -07:00
Ilya Markov 4f423bd5bc [EPLB] Nixl communicator optimization. Zero-copy transfers (#41633)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-06-04 03:40:34 +00:00
Jie Fang f0cd590d62 optimize the compressor 128 split cutedsl kernel (#44230)
Signed-off-by: Jie Fang <jief@nvidia.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2026-06-03 20:22:57 -07:00
Wentao Ye e6018c644a [Refactor] Remove dead code in tests and parallel_state (#41471)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-06-03 19:32:39 -07:00
Oğuzhan KIR f25952e59b [MM][Perf][CG] Support ViT full CUDA graph for InternVL (#41759)
Signed-off-by: oguz <oguzhankir17@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-06-04 10:24:25 +08:00
maobaolong b58e082d95 [KV Connector] Update lmcache kv_offloading_backend to use LMCacheMPConnector (#42865)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
2026-06-03 19:23:55 -07:00
Ted Mostly 0c1e6f63f5 [Bugfix] Fix VLLMNotFoundError when using LoRA adapter name in poolin… (#44410)
Signed-off-by: Ted Mostly <wanghenshui@qq.com>
2026-06-04 02:22:03 +00:00
Giancarlo Delfin ceb0111a90 [Model Runner V2][Spec Decode] Add Gemma4 MTP support (#43241)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
2026-06-04 00:51:06 +00:00
Yan Ma 0414d75410 [XPU] skip unapplied UT in test_gpu_model_runner.py (#44289)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-06-04 08:48:17 +08:00
Dima 128adabfe0 [Bugfix] Fix Gemma4 MTP block_table batch_size mismatch under concurrent load (#43982)
Signed-off-by: Dmytro Kuntso <dkuntso@amazon.co.uk>
Co-authored-by: Dmytro Kuntso <dkuntso@amazon.co.uk>
2026-06-03 17:11:10 -07:00
dependabot[bot] bdbf08fc02 Bump actions/stale from 10.1.1 to 10.2.0 (#35078)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-03 14:14:41 -07:00
Woosuk Kwon 6bad553f4e [Minor] Remove FlashInfer version check in topk_topp_sampler (#44442)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-06-03 21:06:00 +00:00
Giancarlo Delfin 91945b6e4a [Bug Fix][Model Runner V2][Spec Decode] Warmup & capture with different attention states for speculator prefill (#44253)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
2026-06-03 13:32:40 -07:00
hoobnn 2b237c7a41 [Bugfix] Honor tool_choice="none" in Chat Completions streaming (#42752)
Signed-off-by: hoobnn <111053672+hoobnn@users.noreply.github.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: sfeng33 <4florafeng@gmail.com>
2026-06-03 13:27:45 -07:00
Wentao Ye dad95e34d8 [Feature] Support batch invariant rms norm with residual (#42453)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-06-03 15:22:01 -04:00
Luciano Martins a248b45d05 [Model] Add Gemma4 Unified (encoder-free) support (#44429)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2026-06-03 12:01:39 -07:00
linitra24 271328e256 [LoRA] Fix dedup for post-replacement module aliases (#44413)
Signed-off-by: bk-201 <joy25810@foxmail.com>
2026-06-03 18:23:23 +00:00
Wentao Ye 2b91012650 [Refactor] Remove dead code fp quant (#44122)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-06-03 14:22:23 -04:00
JartX 5b2a2beade [ROCm][CI] Move Model Executor test step from MI250 to MI300 (gfx942) (#44370)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-06-03 12:23:51 -05:00
Chris Leonard 59d0236193 [10b/n] Migrate custom all-reduce, DeepSeek V4 fused MLA, MiniMax reduce-RMS, and MXFP8 MoE to libtorch stable ABI (#44365)
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
2026-06-04 00:29:46 +08:00
pschlan-amd 0a5cbf633e Handle spinloop ext load failure gracefully (#43659)
Signed-off-by: Patrick Schlangen <pschlan@amd.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
2026-06-03 16:09:52 +00:00
Willow Lopez 51e0c579b0 fix(config): validate max_num_scheduled_tokens >= 0 on all paths (#44207)
Signed-off-by: Oxygen56 <1391083091@qq.com>
2026-06-03 16:06:45 +00:00
Mengqing Cao 0c6631f02a [KVCache] Support Pluggable KVCacheSpec (#37505)
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-06-03 09:05:16 -07:00
Nicolò Lucchesi df7252c343 [CI] Align PD tests to HMA on by default (#44174)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-06-04 00:04:30 +08:00
Jee Jee Li 4d1fd13613 [CI/Build] Fix LoRA testing (#44425)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-06-03 08:58:06 -07:00
Nick Hill ec8d60bea8 [Model Runner V2] Use FlashInfer sampler (#42472)
2026-06-03 07:59:31 -07:00
Chauncey 27f1d34a23 [Frontend][Responses API] Move developer-to-system conversion into HF renderer (#43590)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: kdcyberdude <kdsingh.cyberdude@gmail.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
2026-06-03 14:52:24 +00:00
Flora Feng e3e132d2dd [Refactor] Suppress SyntaxWarning from ast.literal_eval in tool parsers (#44346)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-06-03 10:42:19 -04:00
Xiaochang Wu e5232679a3 [XPU] Add XPU block-scaled W8A8 fp8 path (#39968)
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Yuxiang <yuxiang.liang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-06-03 20:16:19 +08:00
Xunzhuo 309385a359 [Rust Frontend] Add /server_info to Rust frontend (#43942)
Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
2026-06-03 04:30:47 -07:00