1373 Commits

Author SHA1 Message Date
Li, Jiang c505cd93ef [CI/Build] Disable CPU-Compatibility Tests (#44605)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-06-05 13:14:26 +08:00
zofia 063ce98fb7 [XPU][MoE] support block_fp8_moe on xpu (#42139)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: zofia <110436990+zufangzhu@users.noreply.github.com>
2026-06-05 08:36:58 +08:00
wang.yuqi d01d0b4646 [Frontend] Consolidate online serving utils. (#44479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-04 06:49:31 +00:00
JartX 5b2a2beade [ROCm][CI] Move Model Executor test step from MI250 to MI300 (gfx942) (#44370)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-06-03 12:23:51 -05:00
Andreas Karatzas 87954eb50e [ROCm][CI] Optimize ROCm Docker build: registry cache, DeepEP, and ci-bake script (#36949)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-02 23:43:07 -07:00
Flora Feng e67063826b [CI] Add missing vllm/parser/ CI trigger and fix test_parse.py (#44352)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-06-02 21:05:19 -07:00
Daoyuan Li bd98e97557 [Misc] Remove dead VLLM_RPC_TIMEOUT env var and fix profiling doc that references it (#44128)
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
2026-06-03 00:22:10 +00:00
Nick Hill e15f20258b [ModelRunnerV2] Avoid pipeline parallel bubbles (#42187)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-06-02 14:02:01 -07:00
wang.yuqi b623f7ea95 [Frontend] Consolidate dev entrypoints. (#44170)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-02 06:30:21 -07:00
Fadi Arafeh 0b25cf4419 [CPU][Perf] Enable fused kernels for GDN's gated delta rules (#43534)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-06-02 08:00:48 +00:00
Alec 816cc73a9b [Bugfix][CI] Normalize NIXL connector CUDA wheel installs (#44266)
Signed-off-by: Alec Flowers <aflowers@nvidia.com>
2026-06-01 19:34:05 -07:00
wang.yuqi 0910f7e0e1 [Frontend] Resettle generative scoring entrypoint. (#44153)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-01 07:54:59 +00:00
Kevin H. Luu 8fad266507 [CI] Fix smoke test step key to bypass block gate (#43974)
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-29 16:28:32 -07:00
Flora Feng 6de08e8b46 [CI] Remove redundant test_chat_with_tool_reasoning.py (#44011)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-05-29 19:23:56 +00:00
Kevin H. Luu 6aabe221a5 [CI] Make Model Executor test hangs fail fast with a traceback (#43971)
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-29 11:58:25 -07:00
Ilya Markov 4aaba00f92 [EPLB] Make async EPLB default (#43219)
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-05-29 18:07:16 +00:00
Tianmu Li 94d3f4d205 [CPU Backend] CPU top-k and top-p sampling kernels using Triton (#43633)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-29 15:02:39 +08:00
Kevin H. Luu 648c3ebee6 [CI] Separate non-root smoke tests from image build step (#43712)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-28 23:34:16 -07:00
yzong-rh 325a1ec4fb [CI] Enable prefix caching in BFCL benchmark (#43925)
Signed-off-by: Yifan Zong <yzong@redhat.com>
2026-05-28 23:36:31 +00:00
Michael Goin 03f03f9630 Refactor output filename handling in ci-fetch-log.sh (#43901)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-05-28 14:20:12 -07:00
Micah Williamson 1b5437cec8 [ROCm] Bump ROCm to 7.2.3 (#43136)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-05-28 09:42:43 -07:00
Li, Jiang 20d69d100a [CPU] Migrate cpu_awq into awq_marlin (#43841)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-05-28 22:36:31 +08:00
Andreas Karatzas a9bc0ad8e4 [ROCm][CI] Move workload from MI300 to MI325 (#43824)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-28 03:31:29 -07:00
Andreas Karatzas 33e94fc3ad [ROCm][CI] Stabilize Cargo cache and pre-test image checks (#43815)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-28 11:24:44 +08:00
Harry Mellor 2616f67faa Remove Transformers forward/backward compatibility tests (#43785)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-27 12:46:36 -07:00
Luciano Martins dede691c95 [Bugfix] Split attention groups by num_heads_q for spec-decode drafts (#43543)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2026-05-27 00:11:01 +00:00
Kevin H. Luu e19b9b1045 [ci] Add arm64 ci image (#41303)
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-26 14:38:09 -07:00
Kevin H. Luu 49b4882779 [CI] Soft-fail AMD entrypoints mirror tests (#43709)
Signed-off-by: Kevin Luu <kevin@inferact.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-26 13:08:48 -07:00
Yongye Zhu 6ab6ffb428 [Feat][DSV4] Fuse q pad into deepseek v4 fused kernel (#43162)
2026-05-26 05:12:54 -10:00
Andreas Karatzas 445ded18c1 [ROCm][CI] Extend ROCm quick reduce coverage (#40990)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-26 21:57:13 +08:00
Nguyễn Thế Duy 3df1c7c43e [Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275)
Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-25 13:45:31 +08:00
Andreas Karatzas 2a7d5b7324 [ROCm][CI] Remove benchmarks test group and shard long test groups (#41669)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-23 23:31:46 +08:00
Jakub Zakrzewski 5bb8d2767a [Kernel] Batch invariant NVFP4 linear using cutlass (#39912)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2026-05-23 09:41:12 -04:00
sychen52 fb21d8b4f9 Add NVFP4 MOE support for Deepseek V4. (#42209)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2026-05-22 07:21:51 -07:00
Li, Jiang 65b7a812a2 [CPU] Experimentally enable Triton and MRV2 (#43225)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-05-22 01:48:17 -07:00
Bugen Zhao 39910f2b25 [Rust Frontend] Move code from vllm-frontend-rs (#43283)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Eric Curtin <eric.curtin@docker.com>
Signed-off-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com>
Signed-off-by: Will.hou <1205157517@qq.com>
Signed-off-by: Will.hou <willamhou@ceresman.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Eric Curtin <eric.curtin@docker.com>
Co-authored-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com>
Co-authored-by: Will.hou <1205157517@qq.com>
Co-authored-by: Will.hou <willamhou@ceresman.com>

Please see https://github.com/Inferact/vllm-frontend-rs for full original commit history.
2026-05-21 17:21:48 -07:00
xiangdong 5ecd8e9c70 [XPU][CI]Fix Docker image pull-to-run race in Intel GPU CI (#43266)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-05-21 10:41:38 +00:00
Nick Hill f2ace1d57d [Frontend][RFC] Rust front-end integration (#40848)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
2026-05-21 12:24:48 +08:00
Louie Tsai 5d041cc1fe update GPU json file based on h200 recipes (#43262)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
2026-05-21 03:57:48 +00:00
xiangdong 6f21558da1 [XPU][CI] Add 2 server model test files in Intel GPU CI (#42499)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
2026-05-20 16:54:58 +08:00
Kevin H. Luu 85959567c3 [ci] Revert model executor test back to L4 (#43188)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-05-19 23:01:41 -07:00
Kevin H. Luu a65093c1a3 [ci] Move language models tests (hybrid) back to L4 (#43129)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-05-19 11:51:34 -07:00
zhanqiuhu 129019f334 [CI] Add MTP + PD disagg test for Qwen3.5 (#42677)
Signed-off-by: ZhanqiuHu <zhu@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-05-19 11:44:33 +02:00
wang.yuqi 301d986473 [Frontend] Consolidate beam search by BeamSearchMixin. (#42946)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-05-19 07:37:40 +00:00
Kevin H. Luu 6e889b582b [ci] Route 28 gpu_1_queue tests to h200_35gb queue (#43030)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-18 21:58:36 -07:00
Kunshang Ji 36dcaf25d8 [XPU] add gptq(int4) support (#37844)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-05-19 11:17:09 +08:00
xiangdong 2e40faf08b [XPU][CI] Temporarily skip test_moe_lora_align_block_size_mixed_base_and_lora[1] in Intel GPU CI (#42954)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
2026-05-18 20:34:48 +08:00
Yuwen Zhou 88a860d754 [CPU] Add MXFP4 W4A16 MoE support (#41922)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Yuwen Zhou <yuwen.zhou@intel.com>
2026-05-18 03:04:45 -07:00
wenjun liu c38bed4248 delete xpu ci (#42582)
Signed-off-by: wenjun.liu <wenjun.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-18 16:36:45 +08:00
Jiangyun Zhu 8a56da3845 [Experimental] Breakable CUDA graph (#42304)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-05-16 22:04:12 +08:00