Han Lin
|
165b7864d0
|
[ROCM] [FEAT] Integrate Aiter hipBLASLt GEMM online tuning (#40426)
Signed-off-by: hanlin12 <hanlin12@amd.com>
Signed-off-by: Han Lin <hanlin12@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-06-04 23:45:36 -07:00 |
|
Li, Jiang
|
c505cd93ef
|
[CI/Build] Disable CPU-Compatibility Tests (#44605)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-06-05 13:14:26 +08:00 |
|
qizixi
|
96229fa99e
|
[KVConnector][1/N] PP-aware handshake aggregation and intermediate-PP output plumbing (#43720)
Signed-off-by: zixi-qi <zixi@inferact.ai>
|
2026-06-04 22:04:19 -07:00 |
|
Luciano Martins
|
da1daf40bf
|
[Bugfix] Exclude vision embedder from quantization in Gemma4 Unified (#44571)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
|
2026-06-04 20:47:38 -07:00 |
|
Woosuk Kwon
|
4efd6ffde0
|
[DSV4] Refactor DeepseekV4Attention (#44569)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-06-04 20:23:07 -07:00 |
|
Chris Leonard
|
56aff0dd15
|
[10/n] Migrate cuda_view and silu_and_mul_per_block_quant kernels to torch stable ABI. (#44334)
|
2026-06-04 20:14:43 -07:00 |
|
zofia
|
063ce98fb7
|
[XPU][MoE] support block_fp8_moe on xpu (#42139)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: zofia <110436990+zufangzhu@users.noreply.github.com>
|
2026-06-05 08:36:58 +08:00 |
|
Bugen Zhao
|
62d6f06e3d
|
[Rust Frontend] Skip loading multimodal processor if --language-model-only is specified (#44500)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
|
2026-06-04 17:02:54 -07:00 |
|
Schwinn Saereesitthipitak
|
b7c5baf63d
|
fix: keep DeepSeek V4 RoPE cache on inv_freq device (#43926)
Signed-off-by: Schwinn Saereesitthipitak <schwinns@nvidia.com>
Signed-off-by: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com>
|
2026-06-05 02:30:29 +04:00 |
|
Jiangyun Zhu
|
a55fccfc7c
|
[mamba] unify KDA conv states into one cache to match 2-state SSM layout (#44539)
|
2026-06-04 20:38:05 +02:00 |
|
Wentao Ye
|
41a4829f22
|
[Logs Refactor] Optimize shutdown logs, easier to follow and consistent (#43707)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-06-04 14:36:32 -04:00 |
|
Tushar Jain
|
38fd2405f3
|
use split_group for pytorch process group creation (#41980)
Signed-off-by: Tushar Jain <tushar00jain@users.noreply.github.com>
Co-authored-by: Tushar Jain <tushar00jain@users.noreply.github.com>
|
2026-06-04 14:36:07 -04:00 |
|
Agata Dobrzyniewicz
|
a947f7a420
|
[Kernel][Test] Extend lightning_attn and awq_triton kernel tests to XPU (#43307)
Signed-off-by: Dobrzyniewicz, Agata <agata.dobrzyniewicz@intel.com>
|
2026-06-04 14:25:59 -04:00 |
|
bnellnm
|
439203d32c
|
[Bugfix] Fix test_cutlass_moe.py (#44380)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-06-04 14:18:52 -04:00 |
|
Taneem Ibrahim
|
8d9536a775
|
[Misc] Add unit tests for pooler head classes (#44471)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-06-04 17:59:25 +00:00 |
|
Fadi Arafeh
|
3da29aa4a5
|
[DOC] Add INT8 W4A8 docs and Arm's supported quantization schemes (#34894)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-06-04 16:27:17 +00:00 |
|
Divakar Verma
|
06f94633e7
|
[ROCm][CI] Add test for Aiter unified attn kernel (#44436)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 16:15:05 +00:00 |
|
JianweiZheng
|
99ef652907
|
[Bugfix] Reject non-positive values for ParallelConfig int knobs (#44057)
Signed-off-by: jwzheng96 <jianweizheng@pku.edu.cn>
Signed-off-by: JianweiZheng <32029023+jwzheng96@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-06-04 11:46:50 -04:00 |
|
Tyler Michael Smith
|
4cc78c9d5d
|
[Core] Freeze garbage collector in workers after model initialization (#44363)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-06-04 08:39:04 -07:00 |
|
tc-mb
|
3dbb4e0ace
|
[Bugfix] MiniCPM-V-4.6 video inference crash: placeholder count mismatches visual embedding count (#44509)
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
|
2026-06-04 08:22:30 -07:00 |
|
Zvi Kons
|
b21443e23c
|
Add model support for granite speech plus (#43519)
Signed-off-by: Zvi Kons[WSL] <zvi@il.ibm.com>
Signed-off-by: Zvi Kons (BlueVela) <zvi@il.ibm.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
2026-06-04 14:47:48 +00:00 |
|
Michael Goin
|
06ee2d8433
|
[Quant] Support compressed-tensors WNA8O8Int linears and WNInt embeddings (#44340)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-06-04 07:40:33 -07:00 |
|
Yongye Zhu
|
b5235fca2e
|
[DSv4] Adding TRTLLM gen attention kernel (#43827)
|
2026-06-04 07:35:09 -07:00 |
|
Andreas Karatzas
|
3e77036768
|
[ROCm][CI] Specifying timeouts for the lm eval models (#44255)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 22:35:00 +08:00 |
|
Andreas Karatzas
|
6f68ca3e91
|
[ROCm][CI] Stabilize memory-release in the Hybrid model generation tests (#44046)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 22:34:24 +08:00 |
|
Turner Jabbour
|
0c96dd64fb
|
[ROCm] Bump fastsafetensors to v0.3.2 from PyPI, remove git source build (#43625)
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>
|
2026-06-04 07:30:57 -07:00 |
|
Nicolò Lucchesi
|
68f5e565c9
|
[PD][Nixl] Mamba prefix caching mode support (#42554)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-06-04 06:41:46 -07:00 |
|
QiliangCui2023
|
9354fb1ba5
|
[Bugfix][Compile] Guard per_token_group_fp8_quant lookup on non-CUDA platforms (#44476)
|
2026-06-04 09:31:50 -04:00 |
|
Harry Mellor
|
f35b557239
|
Add GH token to docs build pre run check (#44534)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-04 05:43:49 -07:00 |
|
Dipika Sikka
|
e68988a248
|
Refactor CT NVFP4 linear to use a single class (#42443)
|
2026-06-04 08:25:08 -04:00 |
|
Rui "Garry" Gao
|
4b87b3e845
|
[Bugfix] fix EVS for qwen3-vl (#44205)
Signed-off-by: Rui "Garry" Gao <garrygaogg@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-06-04 11:06:51 +00:00 |
|
wangxiyuan
|
90619351e3
|
[Attention] Mamba attention module refactor - LINEAR (#43556)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-04 18:45:29 +08:00 |
|
Jiahan Chang (Cyrus)
|
d0975a4b50
|
[perf] Add gemma RMS AR fusion (#42646)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2026-06-04 01:33:59 -07:00 |
|
Kevin_Xiong
|
1bdc60ed53
|
Fix Kimi-K2.5 FlashInfer ViT metadata (#44493)
Signed-off-by: Kevin-XiongC <kevin_xiong1997@outlook.com>
|
2026-06-04 08:14:35 +00:00 |
|
Wei Zhao
|
a6183563b6
|
[Prefix Caching] DeepSeekv4 - Support selective prefix-cache retention for sliding-window KV cache (#43447)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-06-04 00:48:31 -07:00 |
|
Andreas Karatzas
|
22c2e87555
|
[CI] Reverted gitignore changes (#44497)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 00:37:44 -07:00 |
|
wang.yuqi
|
d01d0b4646
|
[Frontend] Consolidate online serving utils. (#44479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-06-04 06:49:31 +00:00 |
|
Oxana Korzh
|
b4b4aaa70e
|
[Inductor] Fast-path Inductor fallback for vllm::*/vllm_aiter::* custom ops (#42129)
Signed-off-by: Oxana Korzh <okorzh@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-06-04 00:03:52 -05:00 |
|
Andreas Karatzas
|
5e2af28838
|
[CI] Resolve release V2 docker build after ROCm CI wheels change (#44463)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-03 21:35:40 -07:00 |
|
Ilya Markov
|
4f423bd5bc
|
[EPLB] Nixl communicator optimization. Zero-copy transfers (#41633)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-06-04 03:40:34 +00:00 |
|
Jie Fang
|
f0cd590d62
|
optimize the compressor 128 split cutedsl kernel (#44230)
Signed-off-by: Jie Fang <jief@nvidia.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2026-06-03 20:22:57 -07:00 |
|
Wentao Ye
|
e6018c644a
|
[Refactor] Remove dead code in tests and parallel_state (#41471)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-03 19:32:39 -07:00 |
|
Oğuzhan KIR
|
f25952e59b
|
[MM][Perf][CG] Support ViT full CUDA graph for InternVL (#41759)
Signed-off-by: oguz <oguzhankir17@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-06-04 10:24:25 +08:00 |
|
maobaolong
|
b58e082d95
|
[KV Connector] Update lmcache kv_offloading_backend to use LMCacheMPConnector (#42865)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
|
2026-06-03 19:23:55 -07:00 |
|
Ted Mostly
|
0c1e6f63f5
|
[Bugfix] Fix VLLMNotFoundError when using LoRA adapter name in poolin… (#44410)
Signed-off-by: Ted Mostly <wanghenshui@qq.com>
|
2026-06-04 02:22:03 +00:00 |
|
Giancarlo Delfin
|
ceb0111a90
|
[Model Runner V2][Spec Decode] Add Gemma4 MTP support (#43241)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-06-04 00:51:06 +00:00 |
|
Yan Ma
|
0414d75410
|
[XPU] skip unapplied UT in test_gpu_model_runner.py (#44289)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-04 08:48:17 +08:00 |
|
Dima
|
128adabfe0
|
[Bugfix] Fix Gemma4 MTP block_table batch_size mismatch under concurrent load (#43982)
Signed-off-by: Dmytro Kuntso <dkuntso@amazon.co.uk>
Co-authored-by: Dmytro Kuntso <dkuntso@amazon.co.uk>
|
2026-06-03 17:11:10 -07:00 |
|
dependabot[bot]
|
bdbf08fc02
|
Bump actions/stale from 10.1.1 to 10.2.0 (#35078)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|
2026-06-03 14:14:41 -07:00 |
|
Woosuk Kwon
|
6bad553f4e
|
[Minor] Remove FlashInfer version check in topk_topp_sampler (#44442)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-06-03 21:06:00 +00:00 |
|