akii96
|
4200f62147
|
[ROCm][GPT-OSS] Fuse RoPE + static Q FP8 quant on fused RoPE+KV path (#42832)
Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-05 16:22:19 -05:00 |
|
Walter Beller-Morales
|
c73b0d0db9
|
[Core][Engine] allow DP ray placement groups to be set on specific nodes (#44669)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-06-05 20:07:47 +00:00 |
|
yzong-rh
|
703fb17b13
|
[Bugfix] GPT-OSS instruction rendering (#44330)
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-06-05 13:52:32 -04:00 |
|
Effi Ofer
|
6a894574bf
|
Add objectstore as a secondary tier to multi-tier kv cache offloading (#41968)
Signed-off-by: Effi Ofer <effi.ofer@gmail.com>
|
2026-06-05 18:05:41 +03:00 |
|
Harry Mellor
|
ef0df7dbd6
|
[CI] Bump mypy version 1.19.1 -> 1.20.2 (#44647)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-05 14:56:27 +00:00 |
|
Harry Mellor
|
62215e72c6
|
Remove KV cache scale boilerplate from model weight loading methods (#43167)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-05 05:19:04 -07:00 |
|
Tianyu Zhang
|
7fe7800fa4
|
[BUG] Fix FP64 Gumbel precision coverage (#43150)
Signed-off-by: tianyu-z <zhangtianyupro@gmail.com>
Signed-off-by: Tianyu Zhang <53099276+tianyu-z@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
|
2026-06-05 19:04:14 +08:00 |
|
Nicolò Lucchesi
|
d98b8f371c
|
[NixlConnector] Initiate deprecation cycle for kv_both role (#43874)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-06-05 11:08:17 +02:00 |
|
XuZhou
|
d61d8566ec
|
[Bugfix] Update mistral tokenizer test for continue_final_message fix (#44622)
Signed-off-by: Xu Zhou <xuzhou9417@163.com>
Co-authored-by: Xu Zhou <xuzhou9417@163.com>
|
2026-06-05 16:13:26 +08:00 |
|
XuZhou
|
6542d48964
|
[Bugfix] Fix test_invocations flaky failure with newer openai SDK (#44618)
Signed-off-by: Xu Zhou <xuzhou9417@163.com>
Co-authored-by: Xu Zhou <xuzhou9417@163.com>
|
2026-06-05 07:36:20 +00:00 |
|
Han Lin
|
165b7864d0
|
[ROCM] [FEAT] Integrate Aiter hipBLASLt GEMM online tuning (#40426)
Signed-off-by: hanlin12 <hanlin12@amd.com>
Signed-off-by: Han Lin <hanlin12@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-06-04 23:45:36 -07:00 |
|
qizixi
|
96229fa99e
|
[KVConnector][1/N] PP-aware handshake aggregation and intermediate-PP output plumbing (#43720)
Signed-off-by: zixi-qi <zixi@inferact.ai>
|
2026-06-04 22:04:19 -07:00 |
|
Tushar Jain
|
38fd2405f3
|
use split_group for pytorch process group creation (#41980)
Signed-off-by: Tushar Jain <tushar00jain@users.noreply.github.com>
Co-authored-by: Tushar Jain <tushar00jain@users.noreply.github.com>
|
2026-06-04 14:36:07 -04:00 |
|
Agata Dobrzyniewicz
|
a947f7a420
|
[Kernel][Test] Extend lightning_attn and awq_triton kernel tests to XPU (#43307)
Signed-off-by: Dobrzyniewicz, Agata <agata.dobrzyniewicz@intel.com>
|
2026-06-04 14:25:59 -04:00 |
|
bnellnm
|
439203d32c
|
[Bugfix] Fix test_cutlass_moe.py (#44380)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-06-04 14:18:52 -04:00 |
|
Taneem Ibrahim
|
8d9536a775
|
[Misc] Add unit tests for pooler head classes (#44471)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-06-04 17:59:25 +00:00 |
|
Divakar Verma
|
06f94633e7
|
[ROCm][CI] Add test for Aiter unified attn kernel (#44436)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 16:15:05 +00:00 |
|
Zvi Kons
|
b21443e23c
|
Add model support for granite speech plus (#43519)
Signed-off-by: Zvi Kons[WSL] <zvi@il.ibm.com>
Signed-off-by: Zvi Kons (BlueVela) <zvi@il.ibm.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
2026-06-04 14:47:48 +00:00 |
|
Michael Goin
|
06ee2d8433
|
[Quant] Support compressed-tensors WNA8O8Int linears and WNInt embeddings (#44340)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-06-04 07:40:33 -07:00 |
|
Yongye Zhu
|
b5235fca2e
|
[DSv4] Adding TRTLLM gen attention kernel (#43827)
|
2026-06-04 07:35:09 -07:00 |
|
Andreas Karatzas
|
3e77036768
|
[ROCm][CI] Specifying time outs for the lm eval models (#44255)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 22:35:00 +08:00 |
|
Andreas Karatzas
|
6f68ca3e91
|
[ROCm][CI] Stabilize memory-release in the Hybrid model generation tests (#44046)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 22:34:24 +08:00 |
|
Nicolò Lucchesi
|
68f5e565c9
|
[PD][Nixl] Mamba prefix caching mode support (#42554)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-06-04 06:41:46 -07:00 |
|
Dipika Sikka
|
e68988a248
|
Refactor CT NVFP4 linear to use a single class (#42443)
|
2026-06-04 08:25:08 -04:00 |
|
wangxiyuan
|
90619351e3
|
[Attention] Mamba attention module refactor - LINEAR (#43556)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-04 18:45:29 +08:00 |
|
Jiahan Chang (Cyrus)
|
d0975a4b50
|
[perf] Add gemma RMS AR fusion (#42646)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2026-06-04 01:33:59 -07:00 |
|
Wei Zhao
|
a6183563b6
|
[Prefix Caching] DeepSeekv4 - Support selective prefix-cache retention for sliding-window KV cache (#43447)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: Yifan Qiao <yifanqiao@inferact.ai>
|
2026-06-04 00:48:31 -07:00 |
|
wang.yuqi
|
d01d0b4646
|
[Frontend] Consolidate online serving utils. (#44479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-06-04 06:49:31 +00:00 |
|
Oxana Korzh
|
b4b4aaa70e
|
[Inductor] Fast-path Inductor fallback for vllm::*/vllm_aiter::* custom ops (#42129)
Signed-off-by: Oxana Korzh <okorzh@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-06-04 00:03:52 -05:00 |
|
Ilya Markov
|
4f423bd5bc
|
[EPLB] Nixl communicator optimization. Zero-copy transfers (#41633)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-06-04 03:40:34 +00:00 |
|
Wentao Ye
|
e6018c644a
|
[Refactor] Remove dead code in tests and parallel_state (#41471)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-03 19:32:39 -07:00 |
|
Oğuzhan KIR
|
f25952e59b
|
[MM][Perf][CG] Support ViT full CUDA graph for InternVL (#41759)
Signed-off-by: oguz <oguzhankir17@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-06-04 10:24:25 +08:00 |
|
maobaolong
|
b58e082d95
|
[KV Connector] Update lmcache kv_offloading_backend to use LMCacheMPConnector (#42865)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
|
2026-06-03 19:23:55 -07:00 |
|
Ted Mostly
|
0c1e6f63f5
|
[Bugfix] Fix VLLMNotFoundError when using LoRA adapter name in poolin… (#44410)
Signed-off-by: Ted Mostly <wanghenshui@qq.com>
|
2026-06-04 02:22:03 +00:00 |
|
Yan Ma
|
0414d75410
|
[XPU] skip unapplied UT in test_gpu_model_runner.py (#44289)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-04 08:48:17 +08:00 |
|
hoobnn
|
2b237c7a41
|
[Bugfix] Honor tool_choice="none" in Chat Completions streaming (#42752)
Signed-off-by: hoobnn <111053672+hoobnn@users.noreply.github.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: sfeng33 <4florafeng@gmail.com>
|
2026-06-03 13:27:45 -07:00 |
|
Wentao Ye
|
dad95e34d8
|
[Feature] Support batch invariant rms norm with residual (#42453)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-06-03 15:22:01 -04:00 |
|
Luciano Martins
|
a248b45d05
|
[Model] Add Gemma4 Unified (encoder-free) support (#44429)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
|
2026-06-03 12:01:39 -07:00 |
|
Mengqing Cao
|
0c6631f02a
|
[KVCache] Support Pluggable KVCacheSpec (#37505)
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-03 09:05:16 -07:00 |
|
Nicolò Lucchesi
|
df7252c343
|
[CI] Align PD tests to HMA on by default (#44174)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-06-04 00:04:30 +08:00 |
|
Chauncey
|
27f1d34a23
|
[Frontend][Responses API] Move developer-to-system conversion into HF renderer (#43590)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: kdcyberdude <kdsingh.cyberdude@gmail.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
|
2026-06-03 14:52:24 +00:00 |
|
Varun Sundar Rabindranath
|
3d76f395e3
|
[SharedOffloadRegion] Align blocks to page-size (#43689)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
|
2026-06-03 14:25:57 +03:00 |
|
Li, Jiang
|
823d271c0d
|
[Attention][CPU] Standardize kv layout to blocks first (#44393)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-06-03 19:03:09 +08:00 |
|
Andy Lo
|
95b1615ec9
|
[Perf] Improve multimodal item handling from O(n) to O(log n) per step (#44212)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-06-03 11:00:26 +00:00 |
|
Itay Etelis
|
1fa9ea09f6
|
[Perf] Triton fast path for small CPU→GPU swap_blocks_batch in the offloading connector (#42212)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-03 13:38:17 +03:00 |
|
Flora Feng
|
209709a8c1
|
[Bugfix] Fix unstreamed tool call args dropped in Responses API streaming (#44348)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-06-03 03:19:08 -07:00 |
|
Charlie Fu
|
71df063c49
|
Enable perf_token_group_quant/_C_stable_libtorch for ROCm (#42758)
Signed-off-by: charlifu <charlifu@amd.com>
|
2026-06-02 23:23:28 -07:00 |
|
Albert Cheng
|
e0081ef8cf
|
[Benchmark] Enable reasoning-model (thinking) benchmarking via --chat-template-kwargs for client-rendered datasets (#44244)
Signed-off-by: Albert Cheng <albertching0112@gmail.com>
|
2026-06-02 22:49:51 -07:00 |
|
William Rom
|
f0204358d9
|
[Bugfix] fix crash in postprocess for null tool args (#43862)
Signed-off-by: William-Rom <william.rom@intility.no>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-02 22:17:26 -07:00 |
|
Rotem Shavitt
|
3f0a91bb96
|
Nit Changes in Tiered KV Offload (#44293)
Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
|
2026-06-02 21:53:21 -07:00 |
|