khluu
|
0b3ba88f16
|
Revert "[CPU] Experimentally enable Triton and MRV2 (#43225)"
This reverts commit 65b7a812a2.
v0.22.0
|
2026-05-29 02:28:43 -07:00 |
|
Vadim Gimpelson
|
799c3afa5d
|
[BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
v0.22.0rc3
|
2026-05-28 00:11:54 -07:00 |
|
Thien Tran
|
64e25235c7
|
[Bugfix] Pass routed_scaling_factor to FlashInfer TRTLLM BF16 MoE (#43769)
|
2026-05-28 00:11:49 -07:00 |
|
TJian
|
a147dd0115
|
[ROCm][DSV4] Enable Tilelang MHC replacing torch/triton mhc (#43679)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-05-28 00:11:43 -07:00 |
|
amitz-nv
|
0759293512
|
[Bugfix][Kernel] TRTLLM NVFP4 MoE chunking (#43599)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
|
2026-05-28 00:11:38 -07:00 |
|
Benjamin Bartels
|
a930f5a58d
|
Fix RunAI streamer tensor buffer reuse during weight loading (#43464)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2026-05-28 00:11:32 -07:00 |
|
Harry Mellor
|
40cf0206ba
|
Fix early CUDA init (#43791)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit 41688e2dc7)
v0.22.0rc2
|
2026-05-27 14:20:37 -07:00 |
|
Yongye Zhu
|
8c4061336a
|
[misc] Bump cutedsl version to 4.5.2 (#43745)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
(cherry picked from commit 03d9cc2fe2)
|
2026-05-27 14:20:37 -07:00 |
|
Ashwin Giridharan
|
5ebdf473c5
|
[Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
(cherry picked from commit 52a31ccecc)
|
2026-05-27 14:20:37 -07:00 |
|
Nick Hill
|
a94cd6d98f
|
[MRV2][BugFix] Fix KV connector handling in spec decode case (#43719)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
(cherry picked from commit 8c94938cfb)
v0.22.0rc1
|
2026-05-27 00:37:22 -07:00 |
|
Woosuk Kwon
|
edfb45bbd0
|
[DSv4] Refactor compressor & Fix ROCm compatibility (#43710)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
(cherry picked from commit adaa5e455a)
|
2026-05-27 00:37:15 -07:00 |
|
Vadim Gimpelson
|
4eeee85f9b
|
[Bugfix][V1] Fix TOCTOU race causing intermittent EADDRINUSE on multi-API-server DP startup (#42585)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 812e7e7364)
|
2026-05-27 00:37:08 -07:00 |
|
Woosuk Kwon
|
c0a485e032
|
[DSv4] Drop _get_compressed_kv_buffer in DeepseekCompressor (#43690)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
(cherry picked from commit 193ce8812e)
|
2026-05-27 00:37:02 -07:00 |
|
Woosuk Kwon
|
db1b8f7097
|
[ROCm] Remove MegaMoE integration in deepseek v4 (#43629)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
(cherry picked from commit c8414a8271)
|
2026-05-27 00:36:56 -07:00 |
|
Yongye Zhu
|
fb83f09e8d
|
[Feat][DSV4] Fuse q pad into deepseek v4 fused kernel (#43162)
(cherry picked from commit 6ab6ffb428)
|
2026-05-27 00:36:50 -07:00 |
|
Chaojun Zhang
|
260b528b1e
|
[XPU] Fix fused MoE LoRA kernel crash on XPU by using platform-agnostic num_compute_units (#43646)
Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
(cherry picked from commit 861b97765d)
|
2026-05-27 00:36:44 -07:00 |
|
Jie Fang
|
78ae17cba1
|
Add CuTe DSL sparse compressor support (#43584)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
(cherry picked from commit a37e47100c)
|
2026-05-27 00:36:37 -07:00 |
|
Thien Tran
|
b2007c4329
|
[GDN] GDN Prefill kernel for SM100 (#43273)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
(cherry picked from commit d56612c621)
|
2026-05-27 00:36:31 -07:00 |
|
Mohammad Miadh Angkad
|
b0e9ae808e
|
Fix CuPy runtime deps and restore humming (#43530)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
(cherry picked from commit a970fb5a1a)
|
2026-05-26 13:12:41 -07:00 |
|
Huanyu Yang
|
6f955986e1
|
[Bugfix][Model] Fix GPT2ForSequenceClassification sub-module prefix (#43579)
Signed-off-by: QingZhou-YangHY <3868850350@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-05-25 22:43:19 -07:00 |
|
wang.yuqi
|
d5cf7b4a2c
|
[Frontend] Split the offline inference APIs and utils. (#43553)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-26 05:20:24 +00:00 |
|
Yan Ma
|
f815c99954
|
[Bugfix] fix device mismatch in MiniCPM-o-4_5 resampler (#43194)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-05-26 13:12:50 +08:00 |
|
Dao007forever
|
c2a4005c70
|
[KV Connector] Propagate MooncakeStore load failures (#42788)
Signed-off-by: Dao Le <Dao007forever@gmail.com>
|
2026-05-25 22:12:15 -07:00 |
|
Dao007forever
|
7966fc7233
|
[KV Connector][Bugfix] MooncakeStore: don't double-apply Eagle prune in load_mask (#43516)
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-05-25 22:11:57 -07:00 |
|
Woosuk Kwon
|
aa2b56ffb0
|
[DeepSeek V4] Move MegaMoE input prep kernel to nvidia/ops (#43632)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-05-25 21:08:29 -07:00 |
|
Jee Jee Li
|
ec5de7fa7d
|
[LoRA] Add one shot triton kernel For MoE LoRA (#42290)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-05-25 19:47:04 -07:00 |
|
Chaojun Zhang
|
71d810bbf4
|
[XPU] Ensure RNG offset alignment with PyTorch requirements in XPU sampler (#43028)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-26 02:01:30 +00:00 |
|
Jee Jee Li
|
d4004455d2
|
[Kernel] Remove NormGateLinear (#43554)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-25 09:49:19 +00:00 |
|
Nicolò Lucchesi
|
716d5294e6
|
[Misc] Print accuracy value for PD tests even on success (#43583)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-05-25 02:10:01 -07:00 |
|
Zhewen Li
|
873758c13a
|
[KV Connector] Handle Mooncake finish after preemption (#43281)
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
|
2026-05-25 01:58:38 -07:00 |
|
Yihuki
|
5c1aec3dc0
|
Reduce memory usage for granite_speech. (#42933)
Signed-off-by: Yihuki <wangbovbvb@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-25 14:12:57 +08:00 |
|
Roy Wang
|
0c942c69d6
|
[Doc] Add section on escalating stalled contributions (#43568)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-05-25 14:11:01 +08:00 |
|
Yifan Qiao
|
81252d4e24
|
[Feat][KVConnector] Support DSV4 in SimpleCPUOffloadBackend (#42296)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-05-25 14:04:30 +08:00 |
|
Nguyễn Thế Duy
|
3df1c7c43e
|
[Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275)
Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-05-25 13:45:31 +08:00 |
|
wang.yuqi
|
1b26fa361e
|
[Docs] Reorganize offline inference docs. (#43552)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-25 13:44:39 +08:00 |
|
weizhoublue
|
6cbe448eed
|
fix: MoE model using shared routed experts crashes on AMD GPUs (#42373)
Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
|
2026-05-25 12:03:05 +08:00 |
|
Jee Jee Li
|
b06813e872
|
[Kernel] Add mhc_pre_big_fuse_with_norm_tilelang (#43474)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-25 01:19:45 +00:00 |
|
Rotem Shavitt
|
d0a100c87a
|
File system secondary tier implemented in python (#41735)
Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-05-24 18:14:44 +00:00 |
|
danisereb
|
d56285c747
|
Tuning script and configs for Triton Mamba SSU kernel (#43083)
Signed-off-by: Banani Ghosh <bg2502@nyu.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Banani Ghosh <bg2502@nyu.edu>
|
2026-05-24 20:12:44 +03:00 |
|
TJian
|
1806d1adfc
|
[ROCm] [DSv4] [Perf] Support DeepSeek v4 MTP (#43385)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-05-24 18:43:08 +08:00 |
|
Andreas Karatzas
|
5940590855
|
[ROCm][CI] Stabilize 400 error return code for invalid schema inputs (#43016)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-05-24 10:06:49 +00:00 |
|
Or Ozeri
|
357fddf614
|
[kv_offload]: Add DSv4 support (#43142)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-05-24 11:10:12 +03:00 |
|
Dao007forever
|
0902d8e62f
|
[KV Connector] Keep MooncakeStore full hits block-aligned (#43494)
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-05-23 23:15:03 -07:00 |
|
Wentao Ye
|
33d7cbe02c
|
[Model Runner v2] Force v1 runner for tests (#43233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-05-23 16:37:24 -07:00 |
|
Flora Feng
|
b32fe416ea
|
[Bugfix] Fix reasoning dropped on streaming boundary deltas (#42691)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-05-23 16:18:30 -07:00 |
|
Michael Goin
|
10d264a2b9
|
Revert "[Misc] add humming to dependencies" (#43492)
|
2026-05-23 14:21:13 -07:00 |
|
TJian
|
46f95b2ec2
|
[ROCm][Critical] Fix the GDN import bug (#43486)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-05-23 21:12:58 +00:00 |
|
Dao007forever
|
819c610f9b
|
[Mooncake] Add metrics for MooncakeStoreConnector operations (#43392)
|
2026-05-23 13:34:40 -07:00 |
|
Siddharth Bedekar
|
4438b6e7dc
|
[MoE] Migrate W4A8 CT to oracle kernel setup (#42680)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
|
2026-05-23 13:56:01 -04:00 |
|
Holegots
|
8737e4a857
|
[Docs] Fix stale version number in token_classify.md (#43489)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
|
2026-05-23 10:42:20 -07:00 |
|