Commit Graph

16499 Commits

Author SHA1 Message Date
Mohammad Miadh Angkad ad7125a431 [Bugfix] Fix DeepSeek V4 MTP HC state handling (#42320)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
(cherry picked from commit f1cc7aad3c)
v0.21.0
2026-05-14 21:28:34 -07:00
Yongye Zhu 9da56fd18b [Bugfix] Add swiglu limits to deepgemm fp8 methods (#41986)
Cherry-picked from https://github.com/vllm-project/vllm/pull/41986

Plumb SwiGLU clamp limit through DeepGemm FP8/W4A8 MoE quant configs
and experts. Extend silu_mul_per_token_group_quant_fp8_colmajor with
clamp support and forward the limit on all FP8/MXFP8/MXFP4 paths.

Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
2026-05-14 12:38:36 -07:00
Yongye Zhu 800604bf53 [MLA Attention Backend] Add TOKENSPEED_MLA backend for DSR1/Kimi K25 prefill + decode on Blackwell (#41778)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
(cherry picked from commit 0d2732dd91)
v0.21.0rc3
2026-05-14 00:59:51 -07:00
khluu 75a7914326 pin cutlass-dsl to 4.4.2
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-14 00:59:01 -07:00
ovidiusm 3b581add43 [PD] Fix broken NIXL EP installation (#42542)
Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>
(cherry picked from commit cca32d55a2)
2026-05-13 15:15:07 -07:00
Kevin H. Luu 342cec8812 [CI] Use uv with Python 3.12 for PyPI wheel upload (#42470)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit f6e868fbdf)
2026-05-13 02:12:34 -07:00
Jiangyun Zhu 135453b715 [Bugfix] Install nvidia-cutlass-dsl[cu13] extra on CUDA 13 platforms (#42438)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
(cherry picked from commit 140dc2ec30)
v0.21.0rc2
2026-05-13 02:03:17 -07:00
sychen52 a707288c1e Patch SlidingWindowSpec.real_page_size_bytes for nvfp4 kv (#42464)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
(cherry picked from commit a8c13d2837)
2026-05-13 02:03:07 -07:00
Alec 638f8fa979 [PD] Bump NIXL connector dependency to 1.x (#42364)
Signed-off-by: Alec Flowers <aflowers@nvidia.com>
(cherry picked from commit 07534b8782)
2026-05-13 02:02:55 -07:00
Chao Lei cbaa80fede [KV Transfer] Add MooncakeStoreConnector for KV cache offloading via Mooncake distributed store (#40900)
Signed-off-by: leichao.lc <leichao.lc@antgroup.com>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: leichao.lc <leichao.lc@antgroup.com>
Co-authored-by: ivanium <yifanqiao@inferact.ai>
Co-authored-by: aoshen524 <aoshen@inferact.ai>
Co-authored-by: Dao007forever <daole@inferact.ai>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
Co-authored-by: foraxe <1055696449@qq.com>
Co-authored-by: Skywalker-EP <173423846@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: jianzs <zheng.shoujian@outlook.com>
Co-authored-by: baxingpiaochong <771405853@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
(cherry picked from commit ebeb09d822)
2026-05-13 02:02:44 -07:00
Kevin H. Luu 84a1066ccc [CI] Inline build artifact annotations in release pipeline (#42357)
Signed-off-by: khluu <khluu000@gmail.com>
(cherry picked from commit 8c4fc4202a)
2026-05-13 02:02:30 -07:00
Michael Goin d801ae8c26 [Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython (#41516)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit d077622d60)
v0.21.0rc1
2026-05-12 14:57:17 -07:00
Jiahan Chang (Cyrus) 65df49eba3 [Perf] Use 2D-grid to eliminate divmod in W8W8 group quant (#42153)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
(cherry picked from commit dd6b3a5ef5)
2026-05-12 14:57:06 -07:00
Kevin H. Luu 2a2ac21d3d [CI] Move DockerHub and PyPI publish steps to end of release pipeline (#42355)
Signed-off-by: khluu <khluu000@gmail.com>
(cherry picked from commit e1c8776e90)
2026-05-12 14:56:46 -07:00
Jee Jee Li c6fc95806b [Bugfix] Fix DSV4 swiglu_limit on marlin backend (#42287)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
(cherry picked from commit 53181384e0)
2026-05-12 14:56:29 -07:00
汪志鹏 581b5e9afc [Frontend] Return rendered prompt text in chat completion response (#42052)
Signed-off-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com>
Co-authored-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com>
Co-authored-by: Cursor <cursor@cursor.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-05-11 13:53:39 +08:00
wangxiyuan 5536fc0c01 [Misc] Replace mamba_type string literals with MambaAttentionBackendEnum (#41188)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-05-11 03:59:36 +00:00
vllmellm 7f95e66a11 [ROCm][Bugfix]: dynamically align BLOCK_DMODEL with Lv in MLA decode kernel (#41119)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-05-11 11:14:19 +08:00
yzong-rh b1687527b8 [Bugfix] Gemma 4 chat template crash with missing tool name and tool id (#42188)
Signed-off-by: Yifan <yzong@redhat.com>
2026-05-11 03:07:45 +00:00
gnovack 171019ab19 add fused mhc_post_pre kernel (#41536)
Signed-off-by: george <george@inferact.ai>
Co-authored-by: george <george@inferact.ai>
2026-05-10 19:56:52 -07:00
Haoqing Wang 879a8c3180 Fix Molmo2 image token metadata (#42162)
Signed-off-by: Haoqi Wang <78337154+hqhq1025@users.noreply.github.com>
2026-05-11 01:19:21 +00:00
bnellnm 1b57eb41f2 [MoE] Move various experts classes to fused_moe/experts/ (#41979)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Jackmin801 <ongjackm@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
2026-05-11 07:54:33 +08:00
Mohammad Miadh Angkad 21943d4c25 [Performance] Make safetensors checkpoint prefetch settings configurable (#41499)
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
2026-05-10 15:55:15 +00:00
Isotr0py f396bee56f [DSV4] Add PP support for deepseek-v4 (#41694)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>
2026-05-10 15:47:26 +00:00
Vensen 215e2f7990 [Bugfix][Mamba] IMA in causal_conv1d kernel for long sequences (#41617)
Signed-off-by: vensen <vensenmu@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-10 12:38:28 +00:00
Ronen Schaffer e175192d33 [KV Offload] Pass ReqContext to touch(), complete_load(), and complete_store() (#41366)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
2026-05-10 15:09:25 +03:00
Jonathan Mamou a54f0d1049 [CPU] Fix spec decode kernel signatures for synthetic mode compatibility (#41932)
Signed-off-by: jmamou <jonathan.mamou@intel.com>
Signed-off-by: Jonathan Mamou <jonathan.mamou@intel.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2026-05-10 12:07:15 +00:00
Isotr0py 48698b1b9b [Bugfix] Fuse Qwen3.5 in_qkvz_proj forwarding with LoRA enabled (#37912)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
2026-05-10 10:59:02 +00:00
Andreas Karatzas 0a309b5ee9 [ROCm] Cap Triton paged attention block size to fix ROCm shared memory OOM (#38502)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-10 10:03:00 +00:00
Jee Jee Li 84f7a55340 [CI] Trigger LoRA test when changing MoE code. (#42196)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-10 01:26:09 -07:00
Ethan Feng a2c9d548d7 [Docs] Fix broken local links (#42160)
Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>
2026-05-10 01:15:38 -07:00
Yongye Zhu 301305c093 Add @zyongye to CODEOWNERS (#42200)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2026-05-10 16:07:32 +08:00
Mohammad Miadh Angkad efd0e7789d Fix mypy failure on main (#42197)
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
2026-05-10 07:55:57 +00:00
abdulrahman-cohere a5d0a5afba [Frontend][Bugfix] Abort ASR engine requests on cancellation (#41266)
Signed-off-by: abdulrahman-cohere <abdulrahman.abdulrazzag@cohere.com>
Co-authored-by: Cursor Agent <cursor-agent@cursor.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-05-09 23:51:11 -07:00
Andreas Karatzas f2840120f6 [ROCm][CI] Fix NIXL spec-decode acceptance startup and diagnostics (#41313)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-10 14:50:16 +08:00
Dao007forever 3f5bd482f5 [Bugfix][KV Transfer][NIXL] Notify P node on pre-admission rejection to free stranded KV blocks (#41269)
2026-05-09 22:52:09 -07:00
Andreas Karatzas fb1ac806c5 [ROCm][CI] Stabilize ROCm shutdown and distributed compile CI (#41573)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-10 03:47:40 +00:00
Wei Zhao 986edc858a [Bugfix] Fix DeepSeek v4 topk numerical issue for unaligned max-model-len (#42169)
2026-05-09 20:30:08 -07:00
Abhishek Gupta 27d3bac272 docs: clarify Gemma 4 assistant speculative decoding (#42180)
Signed-off-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
Co-authored-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
2026-05-09 20:08:44 -07:00
Itay Etelis 00b0618a03 Use CU_MEMCPY_SRC_ACCESS_ORDER_ANY for batch KV cache swaps (#39306)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Signed-off-by: Itay Etelis <etelis2019@gmail.com>
Signed-off-by: Itay Etelis <92247226+Etelis@users.noreply.github.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Itay Etelis <etelis2019@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-10 05:57:09 +03:00
Christian Van 0d382ecde8 Handle optional bool-or-string CLI args in get_kwargs (#40951)
Signed-off-by: Christian Van <cvan20191@gmail.com>
Co-authored-by: Christian Van <cvan20191@gmail.com>
2026-05-09 19:47:21 -07:00
Isotr0py 1029e5ef28 [CI/Build] Use modelscope's international site for regression test (#42176)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-05-09 19:47:09 -07:00
Wang Xingran 0b272a6e01 [Bugfix] Fix SP pass for multimodal models and PP+SP residual handling (#33322)
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com>
Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com>
2026-05-09 19:44:16 -07:00
Nave Assaf dcb3135af7 Fix: Nemotron 3 rescue whitespace-only final_content, not just None (#41846)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-10 02:07:58 +00:00
baonudesifeizhai bc5fdc1e6a Add NVFP4 all-gather GEMM fusion for AsyncTP (#41882)
Signed-off-by: roG0d <baonudesifeizhai@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-10 01:13:22 +00:00
aoshen02 006af4b956 [Bugfix] Skip routed-experts hot path when disabled (#42148)
2026-05-09 18:01:04 -07:00
Wentao Ye ea0e501bb1 [KV Connector] Remove compat support for pre-v0.12.0 constructor signatures without KVCacheConfig (#39832)
The v0.12.0 release contained initial support for HMA in KV Connectors. As part
of these changes, a KVCacheConfig argument was added to KV connector
constructors. Backwards compatibility support for out-of-tree connectors was
included in this change, with a very prominent warning. See #25712 and #27887.

Since the warning has been in place for over 5 months, we can safely remove
this compatibility support.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-09 23:39:46 +00:00
Wentao Ye f80aa53c9d [Refactor] Nixl util using lazy init (#41392)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-09 17:46:52 -04:00
Juhi Mittal 7a2b596982 [Quantization] Add ModelOpt NVFP4 W4A16 (4-bit weights, fp16/bf16 activations) support (#41769)
Signed-off-by: Juhi Mittal <juhim@nvidia.com>
2026-05-09 21:15:50 +00:00
Jiangyun Zhu 2ee8c2a56e [SpecDecoding] extend mtp support for mimo 2.5 (#41905)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-05-09 18:22:59 +00:00