obscura/vllm

mirror of https://github.com/vllm-project/vllm.git synced 2026-06-06 00:16:14 +00:00

Author	SHA1	Message	Date
Andreas Karatzas	4db300e95f	[ROCm][CI] Removed problematic command override mechanism (#42807 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-16 17:35:05 +08:00
Zhewen Li	657b42b592	[Docker][KVConnector] Build mooncake-transfer-engine from source (#42114 ) Signed-off-by: Zhewen Li <zhewenli@inferact.ai> Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: Zhewen Li <zhewenli@inferact.ai> Co-authored-by: khluu <khluu000@gmail.com>	2026-05-16 00:26:25 -07:00
Michael Goin	de2d76f352	[Build] Switch CUDA 12.9 wheel builds to PyTorch manylinux_2_28 base (#41668 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-05-15 13:46:16 -07:00
Andreas Karatzas	d735968f6d	[ROCm][CI] Stage B gating (#42025 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-15 01:49:27 -07:00
Cyrus Leung	2676ab1e0b	[Deprecation] Remove old locations of `get_tokenizer` and `resolve_hf_chat_template` (#35024 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-05-15 00:13:32 -07:00
Louie Tsai	e30f39c4f1	Update Intel Xeon model list and vLLM Benchmark Suite BKMs (#42607 ) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-05-15 05:14:03 +00:00
Chengyi Nie	fa2a33b893	[Quant] Consolidate GPTQ: rename gptq_marlin.py to auto_gptq.py (#38288 ) Signed-off-by: Chengyi Nie <cnie@roblox.com> Co-authored-by: Chengyi Nie <cnie@roblox.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-15 08:25:52 +08:00
zhanqiuhu	24337fb860	PD disagg with NIXL Connector: GDN support (Qwen3.5) (#41869 ) Signed-off-by: Zhanqiu Hu <zhu@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-05-14 16:33:01 +02:00
Libin Tang	9946c38b7f	[XPU] Fix double-transpose in XPUFP8ScaledMMLinearKernel for W8A8 quant method (#41689 ) Signed-off-by: Libin Tang <libin.tang@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-05-14 17:17:39 +08:00
liuzhenwei	b26558d4a3	[CI][XPU] skip ut of offload connector (#42598 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2026-05-14 13:13:53 +08:00
Michael Goin	2f821faeae	[Spec Decode] Support hybrid attention models in extract_hidden_states (#39949 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 10:45:53 -07:00
Wentao Ye	e35c0d4c63	[Feature] Support compile mode for batch invariance on SM80 (#42456 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-05-13 11:02:39 -04:00
Kevin H. Luu	f6e868fbdf	[CI] Use uv with Python 3.12 for PyPI wheel upload (#42470 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 02:12:06 -07:00
Yifan Qiao	9ce74042d3	[Bugfix][SimpleCPUOffloadBackend] Dedup in-flight CPU offload stores across scheduler steps (#41289 ) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-05-13 01:53:32 -07:00
Alec	07534b8782	[PD] Bump NIXL connector dependency to 1.x (#42364 ) Signed-off-by: Alec Flowers <aflowers@nvidia.com>	2026-05-12 18:05:01 -07:00
Kevin H. Luu	8c4fc4202a	[CI] Inline build artifact annotations in release pipeline (#42357 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-05-12 15:57:43 -07:00
Giancarlo Delfin	fe5b4e0fe7	[Model Runner V2] Apply synthetic mode to probabilistic rejection sampler (#41035 )	2026-05-12 13:37:03 -07:00
Kevin H. Luu	379f0ec369	[CI] Migrate 6 verified jobs from gpu_1_queue to h200_18gb MIG (#42446 ) Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 11:52:01 -07:00
shanjiaz	6ccb10d794	Added peagle speculators support (#41826 ) Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>	2026-05-12 07:55:57 -07:00
Michael Goin	d077622d60	[Build] Build bundled DeepGEMM `_C` per-Python so the wheel imports on every CPython (#41516 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 10:27:29 -04:00
Kevin H. Luu	e1c8776e90	[CI] Move DockerHub and PyPI publish steps to end of release pipeline (#42355 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-05-12 09:17:42 +00:00
Kevin H. Luu	1ff9d33535	[CI] Migrate remaining B200 jobs to b200-k8s with test fixes (#42387 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-05-12 02:00:37 -07:00
Kevin H. Luu	f69644caf8	[CI] Migrate more B200 jobs to b200-k8s queue (#42356 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-05-12 00:38:31 -07:00
wang.yuqi	a0dc7a0f36	[CI] Consolidate Speech to Text tests (#42274 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-05-11 19:50:17 +00:00
Flora Feng	639cbfd274	[CI] Add tests/parser to CI coverage (#41877 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-05-11 19:08:54 +00:00
haosdent	17ed5e61f5	[CI] Make Python-only Installation optional (#42293 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-05-11 09:47:16 +00:00
Jee Jee Li	05d610e5cd	[CI/Build] Reduce LoRA model tests. (#42266 ) Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>	2026-05-11 14:49:08 +08:00
Andreas Karatzas	0a309b5ee9	[ROCm] Cap Triton paged attention block size to fix ROCm shared memory OOM (#38502 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-10 10:03:00 +00:00
Jee Jee Li	84f7a55340	[CI] Trigger LoRA test when changing MoE code. (#42196 ) Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>	2026-05-10 01:26:09 -07:00
Andreas Karatzas	f2840120f6	[ROCm][CI] Fix NIXL spec-decode acceptance startup and diagnostics (#41313 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-10 14:50:16 +08:00
Andreas Karatzas	fb1ac806c5	[ROCm][CI] Stabilize ROCm shutdown and distributed compile CI (#41573 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-10 03:47:40 +00:00
Lucas Wilkinson	b1728c1e66	[Attention][Cleanup] Remove tree attention (#42121 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-05-08 18:36:19 -07:00
Kevin H. Luu	0c2e9d4892	[CI] Narrow misc.yaml source dependencies (#42059 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-05-08 15:10:12 -07:00
Kevin H. Luu	d2f22dfc9f	[CI] Narrow engine.yaml source dependencies (#42055 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-05-08 14:55:33 -07:00
Kevin H. Luu	f4dd5c116c	[CI] Narrow Platform Tests (CUDA) source dependencies (#42054 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-05-08 14:54:06 -07:00
Kevin H. Luu	f47ccc8b1c	[CI] Narrow pytorch.yaml compile job source dependencies (#42057 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-05-08 14:43:17 -07:00
liuzhenwei	f2bbd575e2	[CI][XPU] Skip fork-dependent logits processor test (#42013 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2026-05-08 06:10:19 -07:00
Chaojun Zhang	19df11f5d1	[CI][XPU]Ignore some lora tests from LoRA Intel CI pipeline (#42010 ) Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>	2026-05-08 17:34:27 +08:00
haosdent	36b2c79d4b	[CI][Bugfix] Drop duplicated examples/ prefix in tensorize_vllm_model command (#42039 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-05-08 02:23:22 -07:00
wang.yuqi	1d694e78c9	[Examples][last/6] Resettle examples. (#41084 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-05-07 19:42:12 -07:00
Chaojun Zhang	805e9f7b77	[XPU] Fix lora bugs & enable UTs under tests/lora (#38206 ) Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>	2026-05-07 05:58:00 -07:00
Li, Jiang	b3945cc316	[CPU] Bump up to the latest CPU kernels (#41924 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-05-07 05:45:59 -07:00
Fadi Arafeh	b20731d0ae	[CI][Arm] skip e2e model tests if HF_TOKEN is not set (#41919 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-05-07 11:31:50 +00:00
Yuwen Zhou	713b28bd0b	[CPU] Add FP8 W8A16 MoE support (#41314 ) Signed-off-by: yuwenzho <yuwen.zhou@intel.com>	2026-05-06 23:17:07 -07:00
Fadi Arafeh	51f22dcfd0	[Feat][CPU] Enable Gated DeltaNet Attention (Qwen 3.5 / 3.6) (#41025 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>	2026-05-07 12:57:09 +08:00
Micah Williamson	7a576e2c72	[ROCm][CI] Remove `TORCH_NCCL_BLOCKING_WAIT=1` After Bugfix In ROCm 7.2 (#41840 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-05-06 16:37:11 -07:00
Flora Feng	f3f8efa73a	[CI] Enable gemma4 parser test on CI (#41857 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-05-06 20:25:34 +00:00
Nicolò Lucchesi	e43a791284	[Bugfix][CI] Fix Disaggregated test area path (#41794 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-05-06 17:41:24 +08:00
Yuwen Zhou	809b98e5b7	[CPU] Add FP8 W8A16 linear support (#41186 ) Signed-off-by: yuwenzho <yuwen.zhou@intel.com>	2026-05-06 07:05:27 +00:00
Andreas Karatzas	91740ca5ea	[ROCm][CI] Refine gating tests (#37243 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-05 22:05:20 -07:00

1373 Commits