khluu
|
0decac0d96
|
fix: resolve CUTLASS fmin compatibility for DeepSeek-V4 init
Signed-off-by: khluu <khluu000@gmail.com>
v0.22.1
v0.22.1rc2
|
2026-06-03 17:11:47 -07:00 |
|
Harry Mellor
|
fd56c57bde
|
Fix OlmoHybridForCausalLM not initialising (#43846)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
(cherry picked from commit 19af4e6dd4)
|
2026-06-03 16:56:07 -07:00 |
|
Kevin H. Luu
|
7285178622
|
[Bugfix] Fix HyperCLOVAX CI failure after upstream removed remote code (#43860)
Signed-off-by: Kevin Luu <kevin@inferact.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 61288b5458)
|
2026-06-03 16:55:00 -07:00 |
|
Alec
|
27509c8dde
|
[Bugfix][CI] Normalize NIXL connector CUDA wheel installs (#44266)
Signed-off-by: Alec Flowers <aflowers@nvidia.com>
(cherry picked from commit 816cc73a9b)
|
2026-06-02 23:21:24 -07:00 |
|
Kevin H. Luu
|
b284862ea9
|
[docker] Stop using extra-index-url for flashinfer-jit-cache (#44366)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
v0.22.1rc1
|
2026-06-02 19:02:03 -07:00 |
|
Madeesh Kannan
|
932dfd5276
|
[Feature] Add support for JetBrains' Mellum v2 code generation model (#43992)
Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-06-02 19:01:56 -07:00 |
|
Aakar Dwivedi
|
682ffebfef
|
[CPU][Zen] Route W8A8 and W4A16 linear inference through zentorch on AMD Zen CPUs (#41813)
Signed-off-by: R <Ganesh.R@amd.com>
Signed-off-by: Harshal Adhav <harshal.adhav@amd.com>
Signed-off-by: Aakar Dwivedi <aadwived@amd.com>
Co-authored-by: R <Ganesh.R@amd.com>
Co-authored-by: Harshal Adhav <harshal.adhav@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-06-02 19:01:49 -07:00 |
|
Vadim Gimpelson
|
1be7a57a18
|
[Bugfix] Exclude Ray DP from #42585's deferred port allocation (#43864)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-06-02 19:01:42 -07:00 |
|
khluu
|
0b3ba88f16
|
Revert "[CPU] Experimentally enable Triton and MRV2 (#43225)"
This reverts commit 65b7a812a2.
v0.22.0
|
2026-05-29 02:28:43 -07:00 |
|
Vadim Gimpelson
|
799c3afa5d
|
[BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
v0.22.0rc3
|
2026-05-28 00:11:54 -07:00 |
|
Thien Tran
|
64e25235c7
|
[Bugfix] Pass routed_scaling_factor to FlashInfer TRTLLM BF16 MoE (#43769)
|
2026-05-28 00:11:49 -07:00 |
|
TJian
|
a147dd0115
|
[ROCm][DSV4] Enable Tilelang MHC replacing torch/triton mhc (#43679)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-05-28 00:11:43 -07:00 |
|
amitz-nv
|
0759293512
|
[Bugfix][Kernel] TRTLLM NVFP4 MoE chunking (#43599)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
|
2026-05-28 00:11:38 -07:00 |
|
Benjamin Bartels
|
a930f5a58d
|
Fix RunAI streamer tensor buffer reuse during weight loading (#43464)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2026-05-28 00:11:32 -07:00 |
|
Harry Mellor
|
40cf0206ba
|
Fix early CUDA init (#43791)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit 41688e2dc7)
v0.22.0rc2
|
2026-05-27 14:20:37 -07:00 |
|
Yongye Zhu
|
8c4061336a
|
[misc] Bump cutedsl version to 4.5.2 (#43745)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
(cherry picked from commit 03d9cc2fe2)
|
2026-05-27 14:20:37 -07:00 |
|
Ashwin Giridharan
|
5ebdf473c5
|
[Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
(cherry picked from commit 52a31ccecc)
|
2026-05-27 14:20:37 -07:00 |
|
Nick Hill
|
a94cd6d98f
|
[MRV2][BugFix] Fix KV connector handling in spec decode case (#43719)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
(cherry picked from commit 8c94938cfb)
v0.22.0rc1
|
2026-05-27 00:37:22 -07:00 |
|
Woosuk Kwon
|
edfb45bbd0
|
[DSv4] Refactor compressor & Fix ROCm compatibility (#43710)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
(cherry picked from commit adaa5e455a)
|
2026-05-27 00:37:15 -07:00 |
|
Vadim Gimpelson
|
4eeee85f9b
|
[Bugfix][V1] Fix TOCTOU race causing intermittent EADDRINUSE on multi-API-server DP startup (#42585)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 812e7e7364)
|
2026-05-27 00:37:08 -07:00 |
|
Woosuk Kwon
|
c0a485e032
|
[DSv4] Drop _get_compressed_kv_buffer in DeepseekCompressor (#43690)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
(cherry picked from commit 193ce8812e)
|
2026-05-27 00:37:02 -07:00 |
|
Woosuk Kwon
|
db1b8f7097
|
[ROCm] Remove MegaMoE integration in deepseek v4 (#43629)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
(cherry picked from commit c8414a8271)
|
2026-05-27 00:36:56 -07:00 |
|
Yongye Zhu
|
fb83f09e8d
|
[Feat][DSV4] Fuse q pad into deepseek v4 fused kernel (#43162)
(cherry picked from commit 6ab6ffb428)
|
2026-05-27 00:36:50 -07:00 |
|
Chaojun Zhang
|
260b528b1e
|
[XPU] Fix fused MoE LoRA kernel crash on XPU by using platform-agnos num_compute_units (#43646)
Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
(cherry picked from commit 861b97765d)
|
2026-05-27 00:36:44 -07:00 |
|
Jie Fang
|
78ae17cba1
|
Add CuTe DSL sparse compressor support (#43584)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
(cherry picked from commit a37e47100c)
|
2026-05-27 00:36:37 -07:00 |
|
Thien Tran
|
b2007c4329
|
[GDN] GDN Prefill kernel for SM100 (#43273)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
(cherry picked from commit d56612c621)
|
2026-05-27 00:36:31 -07:00 |
|
Mohammad Miadh Angkad
|
b0e9ae808e
|
Fix CuPy runtime deps and restore humming (#43530)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
(cherry picked from commit a970fb5a1a)
|
2026-05-26 13:12:41 -07:00 |
|
Huanyu Yang
|
6f955986e1
|
[Bugfix][Model] Fix GPT2ForSequenceClassification sub-module prefix (#43579)
Signed-off-by: QingZhou-YangHY <3868850350@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-05-25 22:43:19 -07:00 |
|
wang.yuqi
|
d5cf7b4a2c
|
[Frontend] Split the offline inference APIs and utils. (#43553)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-26 05:20:24 +00:00 |
|
Yan Ma
|
f815c99954
|
[Bugfix] fix device mismatch in MiniCPM-o-4_5 resampler (#43194)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-05-26 13:12:50 +08:00 |
|
Dao007forever
|
c2a4005c70
|
[KV Connector] Propagate MooncakeStore load failures (#42788)
Signed-off-by: Dao Le <Dao007forever@gmail.com>
|
2026-05-25 22:12:15 -07:00 |
|
Dao007forever
|
7966fc7233
|
[KV Connector][Bugfix] MooncakeStore: don't double-apply Eagle prune in load_mask (#43516)
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-05-25 22:11:57 -07:00 |
|
Woosuk Kwon
|
aa2b56ffb0
|
[DeepSeek V4] Move MegaMoE input prep kernel to nvidia/ops (#43632)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-05-25 21:08:29 -07:00 |
|
Jee Jee Li
|
ec5de7fa7d
|
[LoRA] Add one shot triton kernel For MoE LoRA (#42290)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-05-25 19:47:04 -07:00 |
|
Chaojun Zhang
|
71d810bbf4
|
[XPU] Ensure RNG offset alignment with PyTorch requirements in XPU sampler (#43028)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-26 02:01:30 +00:00 |
|
Jee Jee Li
|
d4004455d2
|
[Kernel] Remove NormGateLinear (#43554)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-25 09:49:19 +00:00 |
|
Nicolò Lucchesi
|
716d5294e6
|
[Misc] Print accuracy value for PD tests even on success (#43583)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-05-25 02:10:01 -07:00 |
|
Zhewen Li
|
873758c13a
|
[KV Connector] Handle Mooncake finish after preemption (#43281)
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
|
2026-05-25 01:58:38 -07:00 |
|
Yihuki
|
5c1aec3dc0
|
Reduce memory usage for granite_speech. (#42933)
Signed-off-by: Yihuki <wangbovbvb@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-25 14:12:57 +08:00 |
|
Roy Wang
|
0c942c69d6
|
[Doc] Add section on escalating stalled contributions (#43568)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-05-25 14:11:01 +08:00 |
|
Yifan Qiao
|
81252d4e24
|
[Feat][KVConnector] Support DSV4 in SimpleCPUOffloadBackend (#42296)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-05-25 14:04:30 +08:00 |
|
Nguyễn Thế Duy
|
3df1c7c43e
|
[Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275)
Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-05-25 13:45:31 +08:00 |
|
wang.yuqi
|
1b26fa361e
|
[Docs] Reorganize offline inference docs. (#43552)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-25 13:44:39 +08:00 |
|
weizhoublue
|
6cbe448eed
|
fix: MoE model using shared routed experts crashes on AMD GPUs (#42373)
Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
|
2026-05-25 12:03:05 +08:00 |
|
Jee Jee Li
|
b06813e872
|
[Kernel] Add mhc_pre_big_fuse_with_norm_tilelang (#43474)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-25 01:19:45 +00:00 |
|
Rotem Shavitt
|
d0a100c87a
|
File system secondary tier implemented in python (#41735)
Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-05-24 18:14:44 +00:00 |
|
danisereb
|
d56285c747
|
Tuning script and configs for Triton Mamba SSU kernel (#43083)
Signed-off-by: Banani Ghosh <bg2502@nyu.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Banani Ghosh <bg2502@nyu.edu>
|
2026-05-24 20:12:44 +03:00 |
|
TJian
|
1806d1adfc
|
[ROCm] [DSv4] [Perf] Support DeepSeek v4 MTP (#43385)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-05-24 18:43:08 +08:00 |
|
Andreas Karatzas
|
5940590855
|
[ROCm][CI] Stabilize 400 error return code for invalid schema inputs (#43016)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-05-24 10:06:49 +00:00 |
|
Or Ozeri
|
357fddf614
|
[kv_offload]: Add DSv4 support (#43142)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-05-24 11:10:12 +03:00 |
|