obscura/vllm

mirror of https://github.com/vllm-project/vllm.git synced 2026-06-06 00:16:14 +00:00

Author	SHA1	Message	Date
Li, Jiang	c505cd93ef	[CI/Build] Disable CPU-Compatibility Tests (#44605) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-06-05 13:14:26 +08:00
zofia	063ce98fb7	[XPU][MoE] support block_fp8_moe on xpu (#42139) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com> Signed-off-by: zofia <110436990+zufangzhu@users.noreply.github.com>	2026-06-05 08:36:58 +08:00
wang.yuqi	d01d0b4646	[Frontend] Consolidate online serving utils. (#44479) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-06-04 06:49:31 +00:00
JartX	5b2a2beade	[ROCm][CI] Move Model Executor test step from MI250 to MI300 (gfx942) (#44370) Signed-off-by: JartX <sagformas@epdcenter.es>	2026-06-03 12:23:51 -05:00
Andreas Karatzas	87954eb50e	[ROCm][CI] Optimize ROCm Docker build: registry cache, DeepEP, and ci-bake script (#36949) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-06-02 23:43:07 -07:00
Flora Feng	e67063826b	[CI] Add missing vllm/parser/ CI trigger and fix test_parse.py (#44352) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-06-02 21:05:19 -07:00
Daoyuan Li	bd98e97557	[Misc] Remove dead VLLM_RPC_TIMEOUT env var and fix profiling doc that references it (#44128) Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>	2026-06-03 00:22:10 +00:00
Nick Hill	e15f20258b	[ModelRunnerV2] Avoid pipeline parallel bubbles (#42187) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-06-02 14:02:01 -07:00
wang.yuqi	b623f7ea95	[Frontend] Consolidate dev entrypoints. (#44170) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-06-02 06:30:21 -07:00
Fadi Arafeh	0b25cf4419	[CPU][Perf] Enable fused kernels for GDN's gated delta rules (#43534) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2026-06-02 08:00:48 +00:00
Alec	816cc73a9b	[Bugfix][CI] Normalize NIXL connector CUDA wheel installs (#44266) Signed-off-by: Alec Flowers <aflowers@nvidia.com>	2026-06-01 19:34:05 -07:00
wang.yuqi	0910f7e0e1	[Frontend] Resettle generative scoring entrypoint. (#44153) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-06-01 07:54:59 +00:00
Kevin H. Luu	8fad266507	[CI] Fix smoke test step key to bypass block gate (#43974) Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-29 16:28:32 -07:00
Flora Feng	6de08e8b46	[CI] Remove redundant test_chat_with_tool_reasoning.py (#44011) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-05-29 19:23:56 +00:00
Kevin H. Luu	6aabe221a5	[CI] Make Model Executor test hangs fail fast with a traceback (#43971) Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-05-29 11:58:25 -07:00
Ilya Markov	4aaba00f92	[EPLB] Make async EPLB default (#43219) Signed-off-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-05-29 18:07:16 +00:00
Tianmu Li	94d3f4d205	[CPU Backend] CPU top-k and top-p sampling kernels using Triton (#43633) Signed-off-by: Li, Tianmu <tianmu.li@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-05-29 15:02:39 +08:00
Kevin H. Luu	648c3ebee6	[CI] Separate non-root smoke tests from image build step (#43712) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-28 23:34:16 -07:00
yzong-rh	325a1ec4fb	[CI] Enable prefix caching in BFCL benchmark (#43925) Signed-off-by: Yifan Zong <yzong@redhat.com>	2026-05-28 23:36:31 +00:00
Michael Goin	03f03f9630	Refactor output filename handling in ci-fetch-log.sh (#43901) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-05-28 14:20:12 -07:00
Micah Williamson	1b5437cec8	[ROCm] Bump ROCm to 7.2.3 (#43136) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-05-28 09:42:43 -07:00
Li, Jiang	20d69d100a	[CPU] Migrate cpu_awq into awq_marlin (#43841) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-05-28 22:36:31 +08:00
Andreas Karatzas	a9bc0ad8e4	[ROCm][CI] Move workload from MI300 to MI325 (#43824) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-28 03:31:29 -07:00
Andreas Karatzas	33e94fc3ad	[ROCm][CI] Stabilize Cargo cache and pre-test image checks (#43815) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-28 11:24:44 +08:00
Harry Mellor	2616f67faa	Remove Transformers forward/backward compatibility tests (#43785) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-05-27 12:46:36 -07:00
Luciano Martins	dede691c95	[Bugfix] Split attention groups by num_heads_q for spec-decode drafts (#43543) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>	2026-05-27 00:11:01 +00:00
Kevin H. Luu	e19b9b1045	[ci] Add arm64 ci image (#41303) Signed-off-by: khluu <khluu000@gmail.com> Signed-off-by: Kevin H. Luu <khluu000@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-26 14:38:09 -07:00
Kevin H. Luu	49b4882779	[CI] Soft-fail AMD entrypoints mirror tests (#43709) Signed-off-by: Kevin Luu <kevin@inferact.ai> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-26 13:08:48 -07:00
Yongye Zhu	6ab6ffb428	[Feat][DSV4] Fuse q pad into deepseek v4 fused kernel (#43162)	2026-05-26 05:12:54 -10:00
Andreas Karatzas	445ded18c1	[ROCm][CI] Extend ROCm quick reduce coverage (#40990) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-26 21:57:13 +08:00
Nguyễn Thế Duy	3df1c7c43e	[Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275) Signed-off-by: TheDuyIT <nduy250299@gmail.com> Signed-off-by: dtnguyen <dtnguyen@nvidia.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-05-25 13:45:31 +08:00
Andreas Karatzas	2a7d5b7324	[ROCm][CI] Remove benchmarks test group and shard long test groups (#41669) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-23 23:31:46 +08:00
Jakub Zakrzewski	5bb8d2767a	[Kernel] Batch invariant NVFP4 linear using cutlass (#39912) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>	2026-05-23 09:41:12 -04:00
sychen52	fb21d8b4f9	Add NVFP4 MOE support for Deepseek V4. (#42209) Signed-off-by: Shiyang Chen <shiychen@nvidia.com>	2026-05-22 07:21:51 -07:00
Li, Jiang	65b7a812a2	[CPU] Experimentally enable Triton and MRV2 (#43225) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-05-22 01:48:17 -07:00
Bugen Zhao	39910f2b25	[Rust Frontend] Move code from `vllm-frontend-rs` (#43283) Signed-off-by: Bugen Zhao <i@bugenzhao.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Eric Curtin <eric.curtin@docker.com> Signed-off-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com> Signed-off-by: Will.hou <1205157517@qq.com> Signed-off-by: Will.hou <willamhou@ceresman.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Eric Curtin <eric.curtin@docker.com> Co-authored-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com> Co-authored-by: Will.hou <1205157517@qq.com> Co-authored-by: Will.hou <willamhou@ceresman.com> Please see https://github.com/Inferact/vllm-frontend-rs for full original commit history.	2026-05-21 17:21:48 -07:00
xiangdong	5ecd8e9c70	[XPU][CI]Fix Docker image pull-to-run race in Intel GPU CI (#43266) Signed-off-by: zengxian <xiangdong.zeng@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-05-21 10:41:38 +00:00
Nick Hill	f2ace1d57d	[Frontend][RFC] Rust front-end integration (#40848) Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Bugen Zhao <i@bugenzhao.com> Co-authored-by: Bugen Zhao <i@bugenzhao.com>	2026-05-21 12:24:48 +08:00
Louie Tsai	5d041cc1fe	update GPU json file based on h200 recipes (#43262) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-05-21 03:57:48 +00:00
xiangdong	6f21558da1	[XPU][CI] Add 2 server model test files in Intel GPU CI (#42499) Signed-off-by: zengxian <xiangdong.zeng@intel.com>	2026-05-20 16:54:58 +08:00
Kevin H. Luu	85959567c3	[ci] Revert model executor test back to L4 (#43188) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2026-05-19 23:01:41 -07:00
Kevin H. Luu	a65093c1a3	[ci] Move language models tests (hybrid) back to L4 (#43129) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2026-05-19 11:51:34 -07:00
zhanqiuhu	129019f334	[CI] Add MTP + PD disagg test for Qwen3.5 (#42677) Signed-off-by: ZhanqiuHu <zhu@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-05-19 11:44:33 +02:00
wang.yuqi	301d986473	[Frontend] Consolidate beam search by BeamSearchMixin. (#42946) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-05-19 07:37:40 +00:00
Kevin H. Luu	6e889b582b	[ci] Route 28 gpu_1_queue tests to h200_35gb queue (#43030) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-18 21:58:36 -07:00
Kunshang Ji	36dcaf25d8	[XPU] add gptq(int4) support (#37844) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-05-19 11:17:09 +08:00
xiangdong	2e40faf08b	[XPU][CI] Temporarily skip test_moe_lora_align_block_size_mixed_base_and_lora[1] in Intel GPU CI (#42954) Signed-off-by: zengxian <xiangdong.zeng@intel.com>	2026-05-18 20:34:48 +08:00
Yuwen Zhou	88a860d754	[CPU] Add MXFP4 W4A16 MoE support (#41922) Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Signed-off-by: Yuwen Zhou <yuwen.zhou@intel.com>	2026-05-18 03:04:45 -07:00
wenjun liu	c38bed4248	delete xpu ci (#42582) Signed-off-by: wenjun.liu <wenjun.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-05-18 16:36:45 +08:00
Jiangyun Zhu	8a56da3845	[Experimental] Breakable CUDA graph (#42304) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-05-16 22:04:12 +08:00


1373 Commits