5799 Commits

Author SHA1 Message Date
Flora Feng e67063826b [CI] Add missing vllm/parser/ CI trigger and fix test_parse.py (#44352)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-06-02 21:05:19 -07:00
Andreas Karatzas 53b88d1dfc [CI] Reject out-of-vocabulary before they reach the GPU logprob path (#44042)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-02 22:27:52 -05:00
JartX 7b476c8f14 [ROCm][CI] Skip fp8 reload tests on gfx90a (MI250) (#44369)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-06-02 22:27:14 -05:00
JartX 4454a18695 [ROCm][CI] Fix stale wvSplitK GEMM fallback test for N=5 (#44368)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-06-02 22:00:25 -05:00
Siddharth Bedekar 0917a009d3 Fix sparse NCCL weight transfer test construction (#44345)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
2026-06-02 21:51:21 +00:00
SeongJun Lee 3099de3617 [Kernel][MoE] Add GELU_TANH to CPU, CUTLASS, and WNA16 MoE backends (#42027)
Signed-off-by: lesj0610 <lesj0610@users.noreply.github.com>
Co-authored-by: lesj0610 <lesj0610@users.noreply.github.com>
2026-06-02 17:12:08 -04:00
Nick Hill e15f20258b [ModelRunnerV2] Avoid pipeline parallel bubbles (#42187)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-06-02 14:02:01 -07:00
Yifan Qiao e9e08c49b9 [Bugfix] Cache the EAGLE/MTP lookahead block in the SWA prefix-cache mask (#44082)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
2026-06-02 12:21:07 -07:00
Nick Hill da107a59e5 [MRV2] Also enable MRV2 for Llama and Mistral dense models (#43458)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
2026-06-02 11:18:46 -07:00
Chauncey ed9a7526b6 [Anthropic] Support system role messages inside messages array (#44283)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
2026-06-02 18:13:54 +00:00
Flora Feng 478b49ddec [Refactor] Remove dead code from parser infrastructure (#44279)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-06-02 12:08:27 -04:00
Nick Hill cab5c9a2a9 [Core] Move max_concurrent_batches to VllmConfig (#44274)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-06-02 08:57:25 -07:00
XiaoZ 53fa09d085 [Misc] Support local image encoding in benchmarks (#43843)
Signed-off-by: xiaoz <Sukra1@outlook.com>
2026-06-02 15:15:06 +00:00
王金旭 0bdfd5eb84 [Bugfix] Vendor MiniCPMV/MiniCPMO processors to unblock Transformers v5 (#44282)
Signed-off-by: guanwei-wu <b08901019@ntu.edu.tw>
Signed-off-by: wjinxu <1299461899@qq.com>
Co-authored-by: guanwei-wu <b08901019@ntu.edu.tw>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-02 07:14:38 -07:00
gruner 654bd2bca4 [Bugfix] Sync block_size from EngineCore to frontend for hybrid Mamba… (#42967)
Signed-off-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2026-06-02 13:41:00 +00:00
wang.yuqi b623f7ea95 [Frontend] Consolidate dev entrypoints. (#44170)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-02 06:30:21 -07:00
Shreyas Kulkarni 0eeba5eec1 Fix DFlash prefix cache corruption due to missing lookahead block (#42971)
Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>
2026-06-02 12:06:33 +00:00
Ronen Schaffer 2a2b5ca791 [KV Offload] Add on_schedule_end() hook to separate step lifecycle from event draining (#44206)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
2026-06-02 13:42:52 +03:00
Isotr0py f8e9c56d15 [Multimodal] Automatically select registered video loader for VLM (#44126)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-06-02 09:09:47 +00:00
alberto e30313220c [Parser] Migrate ResponsesParser to unified Parser interface (#42977)
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
2026-06-02 08:50:05 +00:00
omerpaz95 d247a9dc13 [EC Connector] Non blocking EC Connector lookup (#41627)
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-06-02 08:48:25 +00:00
Yifan Qiao 7c37096620 [Core][Refactor]: thread scheduler_block_size into KVCacheManager and KVCacheCoordinator (#44165)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
2026-06-02 01:14:44 -07:00
Fadi Arafeh 0b25cf4419 [CPU][Perf] Enable fused kernels for GDN's gated delta rules (#43534)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-06-02 08:00:48 +00:00
Flora Feng 68dafcca75 [Refactor] Unify reasoning + tool-call parsing behind Parser.parse() (#44267)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-06-02 15:11:42 +08:00
JooHo Lee a045c7425f [MM][CG] Profile encoder CUDA graph pool memory (#41714)
Signed-off-by: JooHo Lee <jooho414@gmail.com>
2026-06-02 12:27:34 +08:00
Or Ozeri 480fadab1b [BugFix][kv_offload]: Prevent offloading stale sliding window blocks (#42959)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-06-02 05:59:48 +03:00
Andreas Karatzas 54d0c36fff [CI] Stabilize OpenAI schema fuzzing for malformed structural tags (#44131)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-01 19:56:15 -07:00
Flora Feng 9affc17a05 [Refactor] Move unstreamed tool-arg flush from serving layer to parser (#44017)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-06-02 10:37:43 +08:00
Dao007forever d68f0b220e [Bugfix][Mooncake] Release GPU pin on failed store in MooncakeStoreConnector (#43742)
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-06-01 18:29:18 -07:00
JartX 48c0d13e65 [ROCm][CI] Skip unbacked dynamic shapes tests on PyTorch < 2.11 (#44256)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-06-01 19:09:01 -05:00
Nick Hill e4cbc4385d [Test][BugFix] Fix double-BOS in PD+specdec acceptance test (#44234)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-06-01 14:31:12 -07:00
Nick Hill 6f8b40a23f [BugFix][CI] Fix added _has_module tests (#44248)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-06-01 14:23:12 -07:00
Siddharth Bedekar 266b9d9c64 [Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
2026-06-01 15:37:30 -04:00
Andreas Karatzas fd9e91d7e4 [ROCm][CI] Fix and stabilize EAGLE3 acceptance tests (#41294)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2026-06-01 12:40:01 -05:00
Madeesh Kannan 023808c23d [Feature] Add support for JetBrains' Mellum v2 code generation model (#43992)
Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-06-01 10:11:35 -04:00
Chaojun Zhang bd0aecdc08 [XPU][CI] Fix test_audio_in_video flake by using module-scoped server fixture (#44146)
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
2026-06-01 11:21:36 +00:00
wang.yuqi 0910f7e0e1 [Frontend] Resettle generative scoring entrypoint. (#44153)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-01 07:54:59 +00:00
Jeffrey Wang 29d69332aa [BugFix] Fix _has_module to verify native deps via trial import (#44035)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-05-31 22:06:33 -07:00
Umut Polat f46e6be169 [Misc] Use VLLMValidationError consistently in chat completion and completion protocol validators (#36254)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
2026-06-01 04:04:11 +00:00
Jee Jee Li 6bdabbad5b [CI/Build] Enable Step3p7ForConditionalGeneration testing (#43956)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-31 05:16:12 +00:00
Liangliang Ma e9499996df [BugFix][Platform] Fix import vllm.platforms.rocm error on non-CUDA test_gpt_oss.py (#43571)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-05-29 23:16:49 -07:00
Andreas Karatzas ef8840adc7 [ROCm][CI] Fix failure in the Phi3V pooling test (#44028)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-30 12:14:37 +08:00
Jee Jee Li 559d6710bf [PERF]MiniMax-M2 gate kernel (#38445)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: qianlihuang <91178480+qianlihuang@users.noreply.github.com>
Co-authored-by: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com>
2026-05-29 18:28:34 -07:00
Flora Feng 8c6daf6e2f [CI] Remove duplicate Harmony test coverage (#44023)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-05-29 22:52:46 +00:00
bnellnm 7b98f498cd [MoE Refactor] Remove supports_expert_map (#43108)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-05-29 17:26:56 -04:00
Wentao Ye 5dbf1605a0 [Feature] SSL support for dp supervisor (#43688)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-29 19:28:12 +00:00
Flora Feng 6de08e8b46 [CI] Remove redundant test_chat_with_tool_reasoning.py (#44011)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-05-29 19:23:56 +00:00
Ilya Markov 4aaba00f92 [EPLB] Make async EPLB default (#43219)
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-05-29 18:07:16 +00:00
Taneem Ibrahim 5502c3b52d [Misc] added unit tests for the core pooling methods (#43818)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-05-29 14:40:31 +00:00
Lucain 11dfa3169d Add vLLM library info to Hugging Face Hub requests (#43857)
Signed-off-by: Wauplin <lucainp@gmail.com>
Signed-off-by: Lucain Pouget <lucain@huggingface.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-29 14:04:58 +00:00