17325 Commits

Author SHA1 Message Date
Isotr0py 1fd8bd02a4 [Docs] Replace broken video url in examples (#44159)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-06-01 06:01:10 +00:00
Jeffrey Wang 29d69332aa [BugFix] Fix _has_module to verify native deps via trial import (#44035)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-05-31 22:06:33 -07:00
Lucas Wilkinson 4721bb3aa4 [MRV2] Remove Eagle's dedicated CUDA graph pool (#44078)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-05-31 22:00:33 -07:00
Umut Polat f46e6be169 [Misc] Use VLLMValidationError consistently in chat completion and completion protocol validators (#36254)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
2026-06-01 04:04:11 +00:00
nightcityblade 8b8546da1c docs: fix MLA attention docstring examples (#44118)
Co-authored-by: nightcityblade <nightcityblade@gmail.com>
2026-05-31 12:28:38 -07:00
Jee Jee Li 6bdabbad5b [CI/Build] Enable Step3p7ForConditionalGeneration testing (#43956)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-31 05:16:12 +00:00
Aakar Dwivedi 3fd9d2d357 [CPU][Zen] Route W8A8 and W4A16 linear inference through zentorch on AMD Zen CPUs (#41813)
Signed-off-by: R <Ganesh.R@amd.com>
Signed-off-by: Harshal Adhav <harshal.adhav@amd.com>
Signed-off-by: Aakar Dwivedi <aadwived@amd.com>
Co-authored-by: R <Ganesh.R@amd.com>
Co-authored-by: Harshal Adhav <harshal.adhav@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-05-30 14:17:21 -05:00
Woosuk Kwon 27fa5aa3b9 [MRV2] Support breakable CUDA graph (#44050)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-05-30 09:40:52 -07:00
Wentao Ye e1105064b2 [Bug] Fix gemma4 MTP IMA issue when TP>1, CUDA error: an illegal memory access was encountered (#43909)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-30 10:34:33 -04:00
Bugen Zhao 50c80d7923 [Governance] Add @BugenZhao as Rust frontend code owner (#44047)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
2026-05-30 22:23:54 +08:00
Xiaoran 3becc5db40 [ROCm] Add attention sink support to AITer flash attention backend (#43817)
Signed-off-by: Xiaoran Chen <xiaoran@fb.com>
Co-authored-by: Xiaoran Chen <xiaoran@fb.com>
2026-05-30 18:13:18 +08:00
Lanze Liu 124fac10cb [Bugfix] Fix RMSNorm kernels to multiply in weight's native dtype (#42379)
Signed-off-by: Lanze Liu <lanzetech@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-29 23:16:53 -07:00
Liangliang Ma e9499996df [BugFix][Platform] Fix import vllm.platforms.rocm error on non-CUDA test_gpt_oss.py (#43571)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-05-29 23:16:49 -07:00
nemanjaudovic c0056b19bf [ROCm] cmake: support PYTORCH_FOUND_HIP for torch 2.13 native HIP language support (#43881)
Signed-off-by: nemanjaudovic <nudovic@amd.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
2026-05-29 22:16:57 -07:00
Andreas Karatzas ef8840adc7 [ROCm][CI] Fix failure in the Phi3V pooling test (#44028)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-30 12:14:37 +08:00
Flora Feng 1a096d8208 [Refactor] Remove dead current_tool_name_sent assignments from tool parsers (#43997)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-05-29 21:45:15 -04:00
Gagan Dhakrey 1e2ce5d11a offload prompt_embeds decode in render_prompts_async to avoid blocking (#43792)
Signed-off-by: Gagan Dhakrey <gagandhakrey@gmail.com>
2026-05-30 01:36:34 +00:00
Jee Jee Li 559d6710bf [PERF]MiniMax-M2 gate kernel (#38445)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: qianlihuang <91178480+qianlihuang@users.noreply.github.com>
Co-authored-by: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com>
2026-05-29 18:28:34 -07:00
bnellnm 187457a952 Revert "[MoE Refactor] Migrate MoeWNA16Method quantization to MK orac… (#44033)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-05-29 16:45:29 -07:00
Kevin H. Luu 8fad266507 [CI] Fix smoke test step key to bypass block gate (#43974)
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-29 16:28:32 -07:00
Flora Feng 8c6daf6e2f [CI] Remove duplicate Harmony test coverage (#44023)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-05-29 22:52:46 +00:00
bnellnm 7b98f498cd [MoE Refactor] Remove supports_expert_map (#43108)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-05-29 17:26:56 -04:00
bnellnm 106aa92f04 [MoE Refactor] Migrate MoeWNA16Method quantization to MK oracle (#42647)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-29 17:19:31 -04:00
yzong-rh 46409fd2a1 [Frontend] Clean up stop_token_ids override for Harmony (#44009)
Signed-off-by: Yifan Zong <yzong@redhat.com>
2026-05-29 13:28:06 -07:00
Tyler Michael Smith 38b864d81d [Metrics] Exclude KV transfer tokens from iteration_tokens_total (#43346)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-29 19:56:44 +00:00
Wentao Ye 5dbf1605a0 [Feature] SSL support for dp supervisor (#43688)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-29 19:28:12 +00:00
Kevin H. Luu acbc203340 Add @khluu to CODEOWNERS (#44019)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-05-29 12:24:29 -07:00
Flora Feng 6de08e8b46 [CI] Remove redundant test_chat_with_tool_reasoning.py (#44011)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-05-29 19:23:56 +00:00
Kevin H. Luu 6aabe221a5 [CI] Make Model Executor test hangs fail fast with a traceback (#43971)
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
(tag: v0.22.1rc0)
2026-05-29 11:58:25 -07:00
Wentao Ye 739096a028 [Bug] Fix torch device issue for MOE permute (#44005)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-29 18:55:00 +00:00
czhu-cohere 8b9deeec4b [Bugfix] Fix Ray placement group allocation with grouped nodes (#43998)
Signed-off-by: <conway.zhu@cohere.com>
Signed-off-by: root <conway.zhu@cohere.com>
2026-05-29 12:51:05 -06:00
qizixi d07ad0693b [Bugfix] Use storage_block_size in KV cache reshape for compressed specs (DeepSeek V4) (#43988)
Signed-off-by: zixi-qi <zixi@inferact.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:14:25 -07:00
Ilya Markov 4aaba00f92 [EPLB] Make async EPLB default (#43219)
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-05-29 18:07:16 +00:00
bnellnm 84b2a8a7e7 [MoE Refactor] WNA16 MoE backend selection into oracle module (#42553)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-29 13:11:17 -04:00
qizixi 4ff865c38e [Bugfix] Disable allreduce_rms_fusion when pipeline_parallel_size > 1 (#43616)
Signed-off-by: zixi-qi <zixi@inferact.ai>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-29 22:57:43 +08:00
Taneem Ibrahim 5502c3b52d [Misc] added unit tests for the core pooling methods (#43818)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-05-29 14:40:31 +00:00
Chunyang Wen f191d5630e docs: clarify ITL acronym in optimization docs (#43922)
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
2026-05-29 07:40:05 -07:00
Lucain 11dfa3169d Add vLLM library info to Hugging Face Hub requests (#43857)
Signed-off-by: Wauplin <lucainp@gmail.com>
Signed-off-by: Lucain Pouget <lucain@huggingface.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-29 14:04:58 +00:00
Li, Jiang 3f6f508e14 [Bugfix][CPU] Remove invalid extra deps (#43977)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-05-29 22:02:09 +08:00
Harry Mellor 0585b5ba2e Skip docs build if PR doesn't affect docs (#43972)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-29 12:09:52 +00:00
Thien Tran d2889722ff [Bugfix] Corrupted MLA + linear attention (#43961)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2026-05-29 05:00:51 -07:00
frida-andersson 0b56815a24 [ROCm][Perf] DSv3.2 MI355X TP4 decode-step orchestration cleanup (3 micro-opts) (#42982)
Signed-off-by: Frida Andersson <fanderss@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-29 04:26:57 -07:00
MHYangAMD ab12aab127 [Bugfix] [ROCm] [DSV4] Fix AITER MXFP4 MoE weight loading and shuffle… (#42595)
Co-authored-by: MHYangAMD <MHYangAMD@users.noreply.github.com>
2026-05-29 04:08:33 -07:00
JartX 0cff0741ff [Kernel][ROCm] Native W4A16 kernel for AMD RDNA3 (gfx1100) — fp16 + bf16 (#41394)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-05-29 11:04:40 +00:00
Joaquín Mondéjar 60a7a2214f [Bugfix] Fix Step3 pipeline parallel KeyError for residual tensor (#37622)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-05-29 03:04:02 -07:00
Nicolò Lucchesi 7ebc0ec104 [CI] Nixl+SimpleCPUOffloadingConnector unit tests (#43871)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-05-29 02:40:42 -07:00
Qiming Zhang e8b5199973 [XPU] support MTP of gdn attention (#43565)
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-05-29 17:10:24 +08:00
Simon Danielsson b7fb747d8d [CI][ROCm] Don't skip MoRI-IO Connector tests (#43703)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
2026-05-29 17:06:23 +08:00
Kunshang Ji 30c6289b8e [XPU] fix xpu install document triton-xpu version (#43947)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-05-29 02:05:12 -07:00
Andreas Karatzas ff990d0d32 [ROCm][CI] Fix AITER unified attention for encoder-decoder cross-attention (#43945)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-29 16:43:39 +08:00