Commit Graph

16958 Commits

Author SHA1 Message Date
khluu 0b3ba88f16 Revert "[CPU] Experimentally enable Triton and MRV2 (#43225)"
This reverts commit 65b7a812a2.
v0.22.0
2026-05-29 02:28:43 -07:00
Vadim Gimpelson 799c3afa5d [BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
v0.22.0rc3
2026-05-28 00:11:54 -07:00
Thien Tran 64e25235c7 [Bugfix] Pass routed_scaling_factor to FlashInfer TRTLLM BF16 MoE (#43769)
2026-05-28 00:11:49 -07:00
TJian a147dd0115 [ROCm][DSV4] Enable Tilelang MHC replacing torch/triton mhc (#43679)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-05-28 00:11:43 -07:00
amitz-nv 0759293512 [Bugfix][Kernel] TRTLLM NVFP4 MoE chunking (#43599)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
2026-05-28 00:11:38 -07:00
Benjamin Bartels a930f5a58d Fix RunAI streamer tensor buffer reuse during weight loading (#43464)
Signed-off-by: bbartels <benjamin@bartels.dev>
2026-05-28 00:11:32 -07:00
Harry Mellor 40cf0206ba Fix early CUDA init (#43791)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit 41688e2dc7)
v0.22.0rc2
2026-05-27 14:20:37 -07:00
Yongye Zhu 8c4061336a [misc] Bump cutedsl version to 4.5.2 (#43745)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
(cherry picked from commit 03d9cc2fe2)
2026-05-27 14:20:37 -07:00
Ashwin Giridharan 5ebdf473c5 [Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
(cherry picked from commit 52a31ccecc)
2026-05-27 14:20:37 -07:00
Nick Hill a94cd6d98f [MRV2][BugFix] Fix KV connector handling in spec decode case (#43719)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
(cherry picked from commit 8c94938cfb)
v0.22.0rc1
2026-05-27 00:37:22 -07:00
Woosuk Kwon edfb45bbd0 [DSv4] Refactor compressor & Fix ROCm compatibility (#43710)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
(cherry picked from commit adaa5e455a)
2026-05-27 00:37:15 -07:00
Vadim Gimpelson 4eeee85f9b [Bugfix][V1] Fix TOCTOU race causing intermittent EADDRINUSE on multi-API-server DP startup (#42585)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 812e7e7364)
2026-05-27 00:37:08 -07:00
Woosuk Kwon c0a485e032 [DSv4] Drop _get_compressed_kv_buffer in DeepseekCompressor (#43690)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
(cherry picked from commit 193ce8812e)
2026-05-27 00:37:02 -07:00
Woosuk Kwon db1b8f7097 [ROCm] Remove MegaMoE integration in deepseek v4 (#43629)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
(cherry picked from commit c8414a8271)
2026-05-27 00:36:56 -07:00
Yongye Zhu fb83f09e8d [Feat][DSV4] Fuse q pad into deepseek v4 fused kernel (#43162)
(cherry picked from commit 6ab6ffb428)
2026-05-27 00:36:50 -07:00
Chaojun Zhang 260b528b1e [XPU] Fix fused MoE LoRA kernel crash on XPU by using platform-agnos num_compute_units (#43646)
Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
(cherry picked from commit 861b97765d)
2026-05-27 00:36:44 -07:00
Jie Fang 78ae17cba1 Add CuTe DSL sparse compressor support (#43584)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
(cherry picked from commit a37e47100c)
2026-05-27 00:36:37 -07:00
Thien Tran b2007c4329 [GDN] GDN Prefill kernel for SM100 (#43273)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
(cherry picked from commit d56612c621)
2026-05-27 00:36:31 -07:00
Mohammad Miadh Angkad b0e9ae808e Fix CuPy runtime deps and restore humming (#43530)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
(cherry picked from commit a970fb5a1a)
2026-05-26 13:12:41 -07:00
Huanyu Yang 6f955986e1 [Bugfix][Model] Fix GPT2ForSequenceClassification sub-module prefix (#43579)
Signed-off-by: QingZhou-YangHY <3868850350@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-05-25 22:43:19 -07:00
wang.yuqi d5cf7b4a2c [Frontend] Split the offline inference APIs and utils. (#43553)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-26 05:20:24 +00:00
Yan Ma f815c99954 [Bugfix] fix device mismatch in MiniCPM-o-4_5 resampler (#43194)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2026-05-26 13:12:50 +08:00
Dao007forever c2a4005c70 [KV Connector] Propagate MooncakeStore load failures (#42788)
Signed-off-by: Dao Le <Dao007forever@gmail.com>
2026-05-25 22:12:15 -07:00
Dao007forever 7966fc7233 [KV Connector][Bugfix] MooncakeStore: don't double-apply Eagle prune in load_mask (#43516)
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-25 22:11:57 -07:00
Woosuk Kwon aa2b56ffb0 [DeepSeek V4] Move MegaMoE input prep kernel to nvidia/ops (#43632)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-05-25 21:08:29 -07:00
Jee Jee Li ec5de7fa7d [LoRA] Add one shot triton kernel For MoE LoRA (#42290)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-05-25 19:47:04 -07:00
Chaojun Zhang 71d810bbf4 [XPU] Ensure RNG offset alignment with PyTorch requirements in XPU sampler (#43028)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-26 02:01:30 +00:00
Jee Jee Li d4004455d2 [Kernel] Remove NormGateLinear (#43554)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-25 09:49:19 +00:00
Nicolò Lucchesi 716d5294e6 [Misc] Print accuracy value for PD tests even on success (#43583)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-05-25 02:10:01 -07:00
Zhewen Li 873758c13a [KV Connector] Handle Mooncake finish after preemption (#43281)
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
2026-05-25 01:58:38 -07:00
Yihuki 5c1aec3dc0 Reduce memory usage for granite_speech. (#42933)
Signed-off-by: Yihuki <wangbovbvb@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-25 14:12:57 +08:00
Roy Wang 0c942c69d6 [Doc] Add section on escalating stalled contributions (#43568)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-05-25 14:11:01 +08:00
Yifan Qiao 81252d4e24 [Feat][KVConnector] Support DSV4 in SimpleCPUOffloadBackend (#42296)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
2026-05-25 14:04:30 +08:00
Nguyễn Thế Duy 3df1c7c43e [Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275)
Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-25 13:45:31 +08:00
wang.yuqi 1b26fa361e [Docs] Reorganize offline inference docs. (#43552)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-25 13:44:39 +08:00
weizhoublue 6cbe448eed fix: MoE model using shared routed experts crashes on AMD GPUs (#42373)
Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
2026-05-25 12:03:05 +08:00
Jee Jee Li b06813e872 [Kernel] Add mhc_pre_big_fuse_with_norm_tilelang (#43474)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-25 01:19:45 +00:00
Rotem Shavitt d0a100c87a File system secondary tier implemented in python (#41735)
Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
2026-05-24 18:14:44 +00:00
danisereb d56285c747 Tuning script and configs for Triton Mamba SSU kernel (#43083)
Signed-off-by: Banani Ghosh <bg2502@nyu.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Banani Ghosh <bg2502@nyu.edu>
2026-05-24 20:12:44 +03:00
TJian 1806d1adfc [ROCm] [DSv4] [Perf] Support DeepSeek v4 MTP (#43385)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-05-24 18:43:08 +08:00
Andreas Karatzas 5940590855 [ROCm][CI] Stabilize 400 error return code for invalid schema inputs (#43016)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-24 10:06:49 +00:00
Or Ozeri 357fddf614 [kv_offload]: Add DSv4 support (#43142)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-05-24 11:10:12 +03:00
Dao007forever 0902d8e62f [KV Connector] Keep MooncakeStore full hits block-aligned (#43494)
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-23 23:15:03 -07:00
Wentao Ye 33d7cbe02c [Model Runner v2] Force v1 runner for tests (#43233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-23 16:37:24 -07:00
Flora Feng b32fe416ea [Bugfix] Fix reasoning dropped on streaming boundary deltas (#42691)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-05-23 16:18:30 -07:00
Michael Goin 10d264a2b9 Revert "[Misc] add humming to dependencies" (#43492)
2026-05-23 14:21:13 -07:00
TJian 46f95b2ec2 [ROCm][Critical] Fix the GDN import bug (#43486)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-05-23 21:12:58 +00:00
Dao007forever 819c610f9b [Mooncake] Add metrics for MooncakeStoreConnector operations (#43392)
2026-05-23 13:34:40 -07:00
Siddharth Bedekar 4438b6e7dc [MoE] Migrate W4A8 CT to oracle kernel setup (#42680)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
2026-05-23 13:56:01 -04:00
Holegots 8737e4a857 [Docs] Fix stale version number in token_classify.md (#43489)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
2026-05-23 10:42:20 -07:00