Andy Lo | 95b1615ec9 | 2026-06-03 11:00:26 +00:00
[Perf] Improve multimodal item handling from O(n) to O(log n) per step (#44212)
Signed-off-by: Andy Lo <andy@mistral.ai>

Itay Etelis | 1fa9ea09f6 | 2026-06-03 13:38:17 +03:00
[Perf] Triton fast path for small CPU→GPU swap_blocks_batch in the offloading connector (#42212)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

Yan Ma | 02564b4de0 | 2026-06-03 03:20:21 -07:00
[XPU]fallback to TRITON_ATTN for vit attn on xpu when use float32 dtype (#43759)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Flora Feng | 209709a8c1 | 2026-06-03 03:19:08 -07:00
[Bugfix] Fix unstreamed tool call args dropped in Responses API streaming (#44348)
Signed-off-by: sfeng33 <4florafeng@gmail.com>

Wei Zhao | ace95c9cf8 | 2026-06-03 02:56:43 -07:00
[Bugfix] Update TrtLLM MoE routing methods (#44347)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

Shanshan Shen | 0e2b13103b | 2026-06-03 01:20:59 -07:00
[Doc] Update ViT CUDA graph interfaces (#44388)
Signed-off-by: shen-shanshan <467638484@qq.com>

Bugen Zhao | 449be4f934 | 2026-06-03 01:04:43 -07:00
[Rust Frontend] Fix several hf chat template rendering issues (#44311)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>

Xunzhuo | 6550ff12f2 | 2026-06-03 07:55:29 +00:00
[Rust Frontend] Add dynamic LoRA endpoints (#43778)
Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>

NolanHo | 4aaed4ca22 | 2026-06-03 07:45:31 +00:00
[Rust Frontend] Add server router extension hook (#43774)
Signed-off-by: NolanHo <kujyo.eia.serias@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>

Varun Sundar Rabindranath | 7268457999 | 2026-06-03 10:03:00 +03:00
[KV Offloading] Enable HMA models for Tiering Offloading (#44287)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>

Majid | 9af53a3c13 | 2026-06-02 23:59:01 -07:00
[Perf] Add tuned selective_state_update configs for H200 and RTX PRO … (#44251)
Signed-off-by: Majid Taheri Andani <tahemaji@amazon.com>
Co-authored-by: Majid Taheri Andani <tahemaji@amazon.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>

Andreas Karatzas | 87954eb50e | 2026-06-02 23:43:07 -07:00
[ROCm][CI] Optimize ROCm Docker build: registry cache, DeepEP, and ci-bake script (#36949)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Charlie Fu | 71df063c49 | 2026-06-02 23:23:28 -07:00
Enable perf_token_group_quant/_C_stable_libtorch for ROCm (#42758)
Signed-off-by: charlifu <charlifu@amd.com>

Albert Cheng | e0081ef8cf | 2026-06-02 22:49:51 -07:00
[Benchmark] Enable reasoning-model (thinking) benchmarking via --chat-template-kwargs for client-rendered datasets (#44244)
Signed-off-by: Albert Cheng <albertching0112@gmail.com>

William Rom | f0204358d9 | 2026-06-02 22:17:26 -07:00
[Bugfix] fix crash in postprocess for null tool args (#43862)
Signed-off-by: William-Rom <william.rom@intility.no>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

Willow Lopez | 597bc15936 | 2026-06-03 01:07:10 -04:00
fix: resolve CUTLASS fmin compatibility for DeepSeek-V4 init (#44236)
Signed-off-by: Willow Lopez <100782273+Oxygen56@users.noreply.github.com>

Rotem Shavitt | 3f0a91bb96 | 2026-06-02 21:53:21 -07:00
Nit Changes in Tiered KV Offload (#44293)
Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>

Flora Feng | e67063826b | 2026-06-02 21:05:19 -07:00
[CI] Add missing vllm/parser/ CI trigger and fix test_parse.py (#44352)
Signed-off-by: sfeng33 <4florafeng@gmail.com>

Andreas Karatzas | 53b88d1dfc | 2026-06-02 22:27:52 -05:00
[CI] Reject out-of-vocabulary before they reach the GPU logprob path (#44042)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

JartX | 7b476c8f14 | 2026-06-02 22:27:14 -05:00
[ROCm][CI] Skip fp8 reload tests on gfx90a (MI250) (#44369)
Signed-off-by: JartX <sagformas@epdcenter.es>

JartX | 4454a18695 | 2026-06-02 22:00:25 -05:00
[ROCm][CI] Fix stale wvSplitK GEMM fallback test for N=5 (#44368)
Signed-off-by: JartX <sagformas@epdcenter.es>

wangxiyuan | 02a01496fc | 2026-06-03 10:54:50 +08:00
[Platform] Add is_cumem_allocator_available (#43838)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

Kevin H. Luu | 27a93cd426 | 2026-06-02 18:58:22 -07:00
[docker] Stop using extra-index-url for flashinfer-jit-cache (#44366)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>

Wei Zhao | 969aec4bc8 | 2026-06-02 18:26:30 -07:00
[Bugfix] Fix Deepseek v4 non-mega-moe model init error (#44356)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

Jongseok Park | ca17b6b17d | 2026-06-02 17:57:26 -07:00
[Perf] Apply single-pass min_larger finding and binary search in Triton Top-p path. (#42191)
Signed-off-by: js_park <cakeng@naver.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

Woosuk Kwon | b254e0456c | 2026-06-02 17:54:27 -07:00
[DSV4] Minor cleanup for DeepseekV4MegaMoEExperts (#44367)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

Daoyuan Li | bd98e97557 | 2026-06-03 00:22:10 +00:00
[Misc] Remove dead VLLM_RPC_TIMEOUT env var and fix profiling doc that references it (#44128)
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>

Junhao Shen | a4ac746405 | 2026-06-02 15:20:37 -07:00
[MoE/b12x] Accept W4A16 (kNvfp4Static, None) in FlashInferB12xExperts supports check (#43332)
Signed-off-by: Junhao Shen <junshen@nvidia.com>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>

Vadim Gimpelson | 8b3b71ee9d | 2026-06-02 15:19:05 -07:00
[CI/Build] Bump flashinfer to v0.6.12 (#44036)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Siddharth Bedekar | 0917a009d3 | 2026-06-02 21:51:21 +00:00
Fix sparse NCCL weight transfer test construction (#44345)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>

SeongJun Lee | 3099de3617 | 2026-06-02 17:12:08 -04:00
[Kernel][MoE] Add GELU_TANH to CPU, CUTLASS, and WNA16 MoE backends (#42027)
Signed-off-by: lesj0610 <lesj0610@users.noreply.github.com>
Co-authored-by: lesj0610 <lesj0610@users.noreply.github.com>

Nick Hill | e15f20258b | 2026-06-02 14:02:01 -07:00
[ModelRunnerV2] Avoid pipeline parallel bubbles (#42187)
Signed-off-by: Nick Hill <nickhill123@gmail.com>

Matthew Bonanni | 557781131a | 2026-06-02 12:53:03 -07:00
[Misc] Remove stray empty file (#44350)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

Yifan Qiao | e9e08c49b9 | 2026-06-02 12:21:07 -07:00
[Bugfix] Cache the EAGLE/MTP lookahead block in the SWA prefix-cache mask (#44082)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

Woosuk Kwon | e4a2e584e5 | 2026-06-02 11:50:27 -07:00
[MRV2] Remove assignment of graph_pool in cudagraph_utils (#44338)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

dependabot[bot] | b8b49e2395 | 2026-06-02 11:26:57 -07:00
Bump actions/github-script from 8.0.0 to 9.0.0 (#39667)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Nick Hill | da107a59e5 | 2026-06-02 11:18:46 -07:00
[MRV2] Also enable MRV2 for Llama and Mistral dense models (#43458)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>

Chauncey | ed9a7526b6 | 2026-06-02 18:13:54 +00:00
[Anthropic] Support system role messages inside messages array (#44283)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>

Wei Zhao | 2427094152 | 2026-06-02 10:56:44 -07:00
[Feature] Support EPLB for DeepSeek v4 Mega Moe (#43339)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Wei Zhao (Engrg-Hardware 1) <weizha@login-lyris01.lyris.clusters.nvidia.com>

Kartavya sonar | fe32e7830b | 2026-06-02 10:50:00 -07:00
[Bugfix] flashinfer: fail fast when --kv-cache-dtype nvfp4 used on unsupported arch (#43669)
Signed-off-by: Kartavya Sonar <sonarkartavya@gmail.com>

Alireza Dadgarnia | afcb580715 | 2026-06-02 09:32:50 -07:00
[BugFix] Fix Humming MoE deploy error (#43100)
Signed-off-by: Alireza Dadgarnia <dadgarnia@Alirezas-MacBook-Pro-2.local>
Signed-off-by: Alireza Dadgarnia <49554709+adotdad@users.noreply.github.com>
Co-authored-by: Alireza Dadgarnia <dadgarnia@Alirezas-MacBook-Pro-2.local>
Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com>

liuzhenwei | 3f3e2702c2 | 2026-06-02 16:14:41 +00:00
[XPU] Enable rms_norm/act quant fusions (#43963)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Flora Feng | 478b49ddec | 2026-06-02 12:08:27 -04:00
[Refactor] Remove dead code from parser infrastructure (#44279)
Signed-off-by: sfeng33 <4florafeng@gmail.com>

Nick Hill | cab5c9a2a9 | 2026-06-02 08:57:25 -07:00
[Core] Move max_concurrent_batches to VllmConfig (#44274)
Signed-off-by: Nick Hill <nickhill123@gmail.com>

Brian Dellabetta | 774e552397 | 2026-06-02 08:51:45 -07:00
[compressed-tensors] Asymmetric support for MoE WNA16 marlin (#44025)
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>

XiaoZ | 53fa09d085 | 2026-06-02 15:15:06 +00:00
[Misc] Support local image encoding in benchmarks (#43843)
Signed-off-by: xiaoz <Sukra1@outlook.com>

Chris Leonard | 4d93bc35c9 | 2026-06-02 08:09:52 -07:00
Migrate header files to torch stable abi (#44013)

Bugen Zhao | 586201ebdc | 2026-06-02 07:51:25 -07:00
[Rust Frontend] Cover different thinking modes in roundtrip tests (#44320)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>

pschlan-amd | 88f172188b | 2026-06-02 14:50:21 +00:00
[ROCm] Fix AITER RMSNormQuantFusion for Kimi-Linear (#44308)
Signed-off-by: Patrick Schlangen <pschlan@amd.com>

Bugen Zhao | 880fc032f4 | 2026-06-02 07:45:35 -07:00
[Rust Frontend] Support recursive tool parameter conversion (#44299)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>