Mohammad Miadh Angkad
ad7125a431
[Bugfix] Fix DeepSeek V4 MTP HC state handling ( #42320 )
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
(cherry picked from commit f1cc7aad3c )
v0.21.0
2026-05-14 21:28:34 -07:00
Yongye Zhu
9da56fd18b
[Bugfix] Add swiglu limits to deepgemm fp8 methods ( #41986 )
...
Cherry-picked from https://github.com/vllm-project/vllm/pull/41986
Plumb SwiGLU clamp limit through DeepGemm FP8/W4A8 MoE quant configs
and experts. Extend silu_mul_per_token_group_quant_fp8_colmajor with
clamp support and forward the limit on all FP8/MXFP8/MXFP4 paths.
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Claude <noreply@anthropic.com >
Signed-off-by: khluu <khluu000@gmail.com >
2026-05-14 12:38:36 -07:00
Yongye Zhu
800604bf53
[MLA Attention Backend] Add TOKENSPEED_MLA backend for DSR1/Kimi K25 prefill + decode on Blackwell ( #41778 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
(cherry picked from commit 0d2732dd91 )
v0.21.0rc3
2026-05-14 00:59:51 -07:00
khluu
75a7914326
pin cutlass-dsl to 4.4.2
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-05-14 00:59:01 -07:00
ovidiusm
3b581add43
[PD] Fix broken NIXL EP installation ( #42542 )
...
Signed-off-by: Ovidiu Mara <ovidium@nvidia.com >
(cherry picked from commit cca32d55a2 )
2026-05-13 15:15:07 -07:00
Kevin H. Luu
342cec8812
[CI] Use uv with Python 3.12 for PyPI wheel upload ( #42470 )
...
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
(cherry picked from commit f6e868fbdf )
2026-05-13 02:12:34 -07:00
Jiangyun Zhu
135453b715
[Bugfix] Install nvidia-cutlass-dsl[cu13] extra on CUDA 13 platforms ( #42438 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
(cherry picked from commit 140dc2ec30 )
v0.21.0rc2
2026-05-13 02:03:17 -07:00
sychen52
a707288c1e
Patch SlidingWindowSpec.real_page_size_bytes for nvfp4 kv ( #42464 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com >
(cherry picked from commit a8c13d2837 )
2026-05-13 02:03:07 -07:00
Alec
638f8fa979
[PD] Bump NIXL connector dependency to 1.x ( #42364 )
...
Signed-off-by: Alec Flowers <aflowers@nvidia.com >
(cherry picked from commit 07534b8782 )
2026-05-13 02:02:55 -07:00
Chao Lei
cbaa80fede
[KV Transfer] Add MooncakeStoreConnector for KV cache offloading via Mooncake distributed store ( #40900 )
...
Signed-off-by: leichao.lc <leichao.lc@antgroup.com >
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai >
Co-authored-by: leichao.lc <leichao.lc@antgroup.com >
Co-authored-by: ivanium <yifanqiao@inferact.ai >
Co-authored-by: aoshen524 <aoshen@inferact.ai >
Co-authored-by: Dao007forever <daole@inferact.ai >
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com >
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com >
Co-authored-by: foraxe <1055696449@qq.com >
Co-authored-by: Skywalker-EP <173423846@qq.com >
Co-authored-by: fems14 <1804143737@qq.com >
Co-authored-by: jianzs <zheng.shoujian@outlook.com >
Co-authored-by: baxingpiaochong <771405853@qq.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
(cherry picked from commit ebeb09d822 )
2026-05-13 02:02:44 -07:00
Kevin H. Luu
84a1066ccc
[CI] Inline build artifact annotations in release pipeline ( #42357 )
...
Signed-off-by: khluu <khluu000@gmail.com >
(cherry picked from commit 8c4fc4202a )
2026-05-13 02:02:30 -07:00
Michael Goin
d801ae8c26
[Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython ( #41516 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
(cherry picked from commit d077622d60 )
v0.21.0rc1
2026-05-12 14:57:17 -07:00
Jiahan Chang (Cyrus)
65df49eba3
[Perf] Use 2D-grid to eliminate divmod in W8W8 group quant ( #42153 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com >
(cherry picked from commit dd6b3a5ef5 )
2026-05-12 14:57:06 -07:00
Kevin H. Luu
2a2ac21d3d
[CI] Move DockerHub and PyPI publish steps to end of release pipeline ( #42355 )
...
Signed-off-by: khluu <khluu000@gmail.com >
(cherry picked from commit e1c8776e90 )
2026-05-12 14:56:46 -07:00
Jee Jee Li
c6fc95806b
[Bugfix] Fix DSV4 swiglu_limit on marlin backend ( #42287 )
...
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai >
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com >
(cherry picked from commit 53181384e0 )
2026-05-12 14:56:29 -07:00
汪志鹏
581b5e9afc
[Frontend] Return rendered prompt text in chat completion response ( #42052 )
...
Signed-off-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com >
Co-authored-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com >
Co-authored-by: Cursor <cursor@cursor.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-05-11 13:53:39 +08:00
wangxiyuan
5536fc0c01
[Misc] Replace mamba_type string literals with MambaAttentionBackendEnum ( #41188 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2026-05-11 03:59:36 +00:00
vllmellm
7f95e66a11
[ROCm][Bugfix]: dynamically align BLOCK_DMODEL with Lv in MLA decode kernel ( #41119 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-05-11 11:14:19 +08:00
yzong-rh
b1687527b8
[Bugfix] Gemma 4 chat template crash with missing tool name and tool id ( #42188 )
...
Signed-off-by: Yifan <yzong@redhat.com >
2026-05-11 03:07:45 +00:00
gnovack
171019ab19
add fused mhc_post_pre kernel ( #41536 )
...
Signed-off-by: george <george@inferact.ai >
Co-authored-by: george <george@inferact.ai >
2026-05-10 19:56:52 -07:00
Haoqing Wang
879a8c3180
Fix Molmo2 image token metadata ( #42162 )
...
Signed-off-by: Haoqi Wang <78337154+hqhq1025@users.noreply.github.com >
2026-05-11 01:19:21 +00:00
bnellnm
1b57eb41f2
[MoE] Move various experts classes to fused_moe/experts/ ( #41979 )
...
Signed-off-by: Jackmin801 <ongjackm@gmail.com >
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com >
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Jackmin801 <ongjackm@gmail.com >
Co-authored-by: Claude <noreply@anthropic.com >
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com >
2026-05-11 07:54:33 +08:00
Mohammad Miadh Angkad
21943d4c25
[Performance] Make safetensors checkpoint prefetch settings configurable ( #41499 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
2026-05-10 15:55:15 +00:00
Isotr0py
f396bee56f
[DSV4] Add PP support for deepseek-v4 ( #41694 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com >
2026-05-10 15:47:26 +00:00
Vensen
215e2f7990
[Bugfix][Mamba] IMA in causal_conv1d kernel for long sequences ( #41617 )
...
Signed-off-by: vensen <vensenmu@gmail.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-10 12:38:28 +00:00
Ronen Schaffer
e175192d33
[KV Offload] Pass ReqContext to touch(), complete_load(), and complete_store() ( #41366 )
...
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com >
2026-05-10 15:09:25 +03:00
Jonathan Mamou
a54f0d1049
[CPU] Fix spec decode kernel signatures for synthetic mode compatibility ( #41932 )
...
Signed-off-by: jmamou <jonathan.mamou@intel.com >
Signed-off-by: Jonathan Mamou <jonathan.mamou@intel.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2026-05-10 12:07:15 +00:00
Isotr0py
48698b1b9b
[Bugfix] Fuse Qwen3.5 in_qkvz_proj forwarding with LoRA enabled ( #37912 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <Isotr0py@outlook.com >
2026-05-10 10:59:02 +00:00
Andreas Karatzas
0a309b5ee9
[ROCm] Cap Triton paged attention block size to fix ROCm shared memory OOM ( #38502 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-05-10 10:03:00 +00:00
Jee Jee Li
84f7a55340
[CI] Trigger LoRA test when changing MoE code. ( #42196 )
...
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai >
2026-05-10 01:26:09 -07:00
Ethan Feng
a2c9d548d7
[Docs] Fix broken local links ( #42160 )
...
Signed-off-by: Ethan Feng <ethan.fengch@gmail.com >
2026-05-10 01:15:38 -07:00
Yongye Zhu
301305c093
Add @zyongye to CODEOWNERS ( #42200 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2026-05-10 16:07:32 +08:00
Mohammad Miadh Angkad
efd0e7789d
Fix mypy failure on main ( #42197 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
2026-05-10 07:55:57 +00:00
abdulrahman-cohere
a5d0a5afba
[Frontend][Bugfix] Abort ASR engine requests on cancellation ( #41266 )
...
Signed-off-by: abdulrahman-cohere <abdulrahman.abdulrazzag@cohere.com >
Co-authored-by: Cursor Agent <cursor-agent@cursor.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-05-09 23:51:11 -07:00
Andreas Karatzas
f2840120f6
[ROCm][CI] Fix NIXL spec-decode acceptance startup and diagnostics ( #41313 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-05-10 14:50:16 +08:00
Dao007forever
3f5bd482f5
[Bugfix][KV Transfer][NIXL] Notify P node on pre-admission rejection to free stranded KV blocks ( #41269 )
2026-05-09 22:52:09 -07:00
Andreas Karatzas
fb1ac806c5
[ROCm][CI] Stabilize ROCm shutdown and distributed compile CI ( #41573 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-05-10 03:47:40 +00:00
Wei Zhao
986edc858a
[Bugfix] Fix DeepSeek v4 topk numerical issue for unaligned max-model-len ( #42169 )
2026-05-09 20:30:08 -07:00
Abhishek Gupta
27d3bac272
docs: clarify Gemma 4 assistant speculative decoding ( #42180 )
...
Signed-off-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com >
Co-authored-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com >
2026-05-09 20:08:44 -07:00
Itay Etelis
00b0618a03
Use CU_MEMCPY_SRC_ACCESS_ORDER_ANY for batch KV cache swaps ( #39306 )
...
Signed-off-by: Itay Etelis <itay.etelis@ibm.com >
Signed-off-by: Itay Etelis <etelis2019@gmail.com >
Signed-off-by: Itay Etelis <92247226+Etelis@users.noreply.github.com >
Co-authored-by: Itay Etelis <itay.etelis@ibm.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Itay Etelis <etelis2019@gmail.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-10 05:57:09 +03:00
Christian Van
0d382ecde8
Handle optional bool-or-string CLI args in get_kwargs ( #40951 )
...
Signed-off-by: Christian Van <cvan20191@gmail.com >
Co-authored-by: Christian Van <cvan20191@gmail.com >
2026-05-09 19:47:21 -07:00
Isotr0py
1029e5ef28
[CI/Build] Use modelscope's international site for regression test ( #42176 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-05-09 19:47:09 -07:00
Wang Xingran
0b272a6e01
[Bugfix] Fix SP pass for multimodal models and PP+SP residual handling ( #33322 )
...
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com >
Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com >
Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com >
2026-05-09 19:44:16 -07:00
Nave Assaf
dcb3135af7
Fix: Nemotron 3 rescue whitespace-only final_content, not just None ( #41846 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-10 02:07:58 +00:00
baonudesifeizhai
bc5fdc1e6a
Add NVFP4 all-gather GEMM fusion for AsyncTP ( #41882 )
...
Signed-off-by: roG0d <baonudesifeizhai@gmail.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-10 01:13:22 +00:00
aoshen02
006af4b956
[Bugfix] Skip routed-experts hot path when disabled ( #42148 )
2026-05-09 18:01:04 -07:00
Wentao Ye
ea0e501bb1
[KV Connector] Remove compat support for pre-v0.12.0 constructor signatures without KVCacheConfig ( #39832 )
...
The v0.12.0 release contained initial support for HMA in KV Connectors. As part
of these changes, a KVCacheConfig argument was added to KV connector
constructors. Backwards compatibility support for out-of-tree connectors was
included in this change, with a very prominent warning. See #25712 and #27887.
Since the warning has been in place for over 5 months, we can safely remove
that compatibility support.
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-05-09 23:39:46 +00:00
Wentao Ye
f80aa53c9d
[Refactor] Nixl util using lazy init ( #41392 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-05-09 17:46:52 -04:00
Juhi Mittal
7a2b596982
[Quantization] Add ModelOpt NVFP4 W4A16 (4-bit weights, fp16/bf16 activations) support ( #41769 )
...
Signed-off-by: Juhi Mittal <juhim@nvidia.com >
2026-05-09 21:15:50 +00:00
Jiangyun Zhu
2ee8c2a56e
[SpecDecoding] extend mtp support for mimo 2.5 ( #41905 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-05-09 18:22:59 +00:00