obscura/vllm

mirror of https://github.com/vllm-project/vllm.git synced 2026-06-06 00:16:14 +00:00

Author	SHA1	Message	Date
Chao-Ju Chen	e64237ae82	[Rust Frontend] Support include_reasoning=false (#44391 ) Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>	2026-06-05 16:47:50 +08:00
XuZhou	d61d8566ec	[Bugfix] Update mistral tokenizer test for continue_final_message fix (#44622 ) Signed-off-by: Xu Zhou <xuzhou9417@163.com> Co-authored-by: Xu Zhou <xuzhou9417@163.com>	2026-06-05 16:13:26 +08:00
Uranus	d2f70da116	fix: pad dummy run query_start_loc (#44603 ) Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>	2026-06-05 00:43:04 -07:00
XuZhou	6542d48964	[Bugfix] Fix test_invocations flaky failure with newer openai SDK (#44618 ) Signed-off-by: Xu Zhou <xuzhou9417@163.com> Co-authored-by: Xu Zhou <xuzhou9417@163.com>	2026-06-05 07:36:20 +00:00
Ting SUN	ca73293fa6	[Bugfix][Rust Frontend] Fix UTF-8 char-boundary panic in incremental detokenizer (#44620 ) Signed-off-by: Ting Sun <suntcrick@gmail.com>	2026-06-05 07:36:17 +00:00
Vic Wen	ef3af56a97	Fix `LLM.wait_for_completion` output type docstring (#44617 ) Signed-off-by: viiccwen <viiccwen@gmail.com>	2026-06-05 00:16:38 -07:00
Tuukka Sarvi	b4a6f26c90	[ROCm][perf] Use workspace manager for sparse indexer allocations (#41002 ) Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com> Signed-off-by: Tuukka Sarvi <tuukka.sarvi@amd.com> Co-authored-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-06-04 23:46:29 -07:00
Han Lin	165b7864d0	[ROCM] [FEAT] Integrate Aiter hipBLASLt GEMM online tuning (#40426 ) Signed-off-by: hanlin12 <hanlin12@amd.com> Signed-off-by: Han Lin <hanlin12@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-06-04 23:45:36 -07:00
Li, Jiang	c505cd93ef	[CI/Build] Disable CPU-Compatibility Tests (#44605 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-06-05 13:14:26 +08:00
qizixi	96229fa99e	[KVConnector][1/N] PP-aware handshake aggregation and intermediate-PP output plumbing (#43720 ) Signed-off-by: zixi-qi <zixi@inferact.ai>	2026-06-04 22:04:19 -07:00
Luciano Martins	da1daf40bf	[Bugfix] Exclude vision embedder from quantization in Gemma4 Unified (#44571 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>	2026-06-04 20:47:38 -07:00
Woosuk Kwon	4efd6ffde0	[DSV4] Refactor DeepseekV4Attention (#44569 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-06-04 20:23:07 -07:00
Chris Leonard	56aff0dd15	[10/n] Migrate cuda_view and silu_and_mul_per_block_quant kernels to torch stable ABI. (#44334 )	2026-06-04 20:14:43 -07:00
zofia	063ce98fb7	[XPU][MoE] support block_fp8_moe on xpu (#42139 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com> Signed-off-by: zofia <110436990+zufangzhu@users.noreply.github.com>	2026-06-05 08:36:58 +08:00
Bugen Zhao	62d6f06e3d	[Rust Frontend] Skip loading multimodal processor if `--language-model-only` is specified (#44500 ) Signed-off-by: Bugen Zhao <i@bugenzhao.com>	2026-06-04 17:02:54 -07:00
Schwinn Saereesitthipitak	b7c5baf63d	fix: keep DeepSeek V4 RoPE cache on inv_freq device (#43926 ) Signed-off-by: Schwinn Saereesitthipitak <schwinns@nvidia.com> Signed-off-by: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com>	2026-06-05 02:30:29 +04:00
Jiangyun Zhu	a55fccfc7c	[mamba] unify KDA conv states into one cache to match 2-state SSM layout (#44539 )	2026-06-04 20:38:05 +02:00
Wentao Ye	41a4829f22	[Logs Refactor] Optimize shutdown logs, easier to follow and consistent (#43707 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-06-04 14:36:32 -04:00
Tushar Jain	38fd2405f3	use split_group for pytorch process group creation (#41980 ) Signed-off-by: Tushar Jain <tushar00jain@users.noreply.github.com> Co-authored-by: Tushar Jain <tushar00jain@users.noreply.github.com>	2026-06-04 14:36:07 -04:00
Agata Dobrzyniewicz	a947f7a420	[Kernel][Test] Extend lightning_attn and awq_triton kernel tests to XPU (#43307 ) Signed-off-by: Dobrzyniewicz, Agata <agata.dobrzyniewicz@intel.com>	2026-06-04 14:25:59 -04:00
bnellnm	439203d32c	[Bugfix] Fix test_cutlass_moe.py (#44380 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-06-04 14:18:52 -04:00
Taneem Ibrahim	8d9536a775	[Misc] Add unit tests for pooler head classes (#44471 ) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>	2026-06-04 17:59:25 +00:00
Fadi Arafeh	3da29aa4a5	[DOC] Add INT8 W4A8 docs and Arm's supported quantization schemes (#34894 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-06-04 16:27:17 +00:00
Divakar Verma	06f94633e7	[ROCm][CI] Add test for Aiter unified attn kernel (#44436 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: Andreas Karatzas <akaratza@amd.com>	2026-06-04 16:15:05 +00:00
JianweiZheng	99ef652907	[Bugfix] Reject non-positive values for ParallelConfig int knobs (#44057 ) Signed-off-by: jwzheng96 <jianweizheng@pku.edu.cn> Signed-off-by: JianweiZheng <32029023+jwzheng96@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-06-04 11:46:50 -04:00
Tyler Michael Smith	4cc78c9d5d	[Core] Freeze garbage collector in workers after model initialization (#44363 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-06-04 08:39:04 -07:00
tc-mb	3dbb4e0ace	[Bugfix] MiniCPM-V-4.6 video inference crash: placeholder count mismatches visual embedding count (#44509 ) Signed-off-by: tc-mb <tianchi_cai@icloud.com>	2026-06-04 08:22:30 -07:00
Zvi Kons	b21443e23c	Add model support for granite speech plus (#43519 ) Signed-off-by: Zvi Kons[WSL] <zvi@il.ibm.com> Signed-off-by: Zvi Kons (BlueVela) <zvi@il.ibm.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-04 14:47:48 +00:00
Michael Goin	06ee2d8433	[Quant] Support compressed-tensors WNA8O8Int linears and WNInt embeddings (#44340 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-06-04 07:40:33 -07:00
Yongye Zhu	b5235fca2e	[DSv4] Adding TRTLLM gen attention kernel (#43827 )	2026-06-04 07:35:09 -07:00
Andreas Karatzas	3e77036768	[ROCm][CI] Specifying time outs for the lm eval models (#44255 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-06-04 22:35:00 +08:00
Andreas Karatzas	6f68ca3e91	[ROCm][CI] Stabilize memory-release in the Hybrid model generation tests (#44046 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-06-04 22:34:24 +08:00
Turner Jabbour	0c96dd64fb	[ROCm] Bump fastsafetensors to v0.3.2 from PyPI, remove git source build (#43625 ) Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>	2026-06-04 07:30:57 -07:00
Nicolò Lucchesi	68f5e565c9	[PD][Nixl] Mamba prefix caching mode support (#42554 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-06-04 06:41:46 -07:00
QiliangCui2023	9354fb1ba5	[Bugfix][Compile] Guard per_token_group_fp8_quant lookup on non-CUDA platforms (#44476 )	2026-06-04 09:31:50 -04:00
Harry Mellor	f35b557239	Add GH token to docs build pre run check (#44534 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-06-04 05:43:49 -07:00
Dipika Sikka	e68988a248	Refactor CT NVFP4 linear to use a single class (#42443 )	2026-06-04 08:25:08 -04:00
Rui "Garry" Gao	4b87b3e845	[Bugfix] fix EVS for qwen3-vl (#44205 ) Signed-off-by: Rui "Garry" Gao <garrygaogg@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-06-04 11:06:51 +00:00
wangxiyuan	90619351e3	[Attention] Mamba attention module refactor - LINEAR (#43556 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-06-04 18:45:29 +08:00
Jiahan Chang (Cyrus)	d0975a4b50	[perf] Add gemma RMS AR fusion (#42646 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>	2026-06-04 01:33:59 -07:00
Kevin_Xiong	1bdc60ed53	Fix Kimi-K2.5 FlashInfer ViT metadata (#44493 ) Signed-off-by: Kevin-XiongC <kevin_xiong1997@outlook.com>	2026-06-04 08:14:35 +00:00
Wei Zhao	a6183563b6	[Prefix Caching] DeepSeekv4 - Support selective prefix-cache retention for sliding-window KV cache (#43447 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai> Co-authored-by: Yifan Qiao <yifanqiao@inferact.ai>	2026-06-04 00:48:31 -07:00
Andreas Karatzas	22c2e87555	[CI] Reverted gitignore changes (#44497 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-06-04 00:37:44 -07:00
wang.yuqi	d01d0b4646	[Frontend] Consolidate online serving utils. (#44479 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-06-04 06:49:31 +00:00
Oxana Korzh	b4b4aaa70e	[Inductor] Fast-path Inductor fallback for vllm::/vllm_aiter:: custom ops (#42129 ) Signed-off-by: Oxana Korzh <okorzh@amd.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-04 00:03:52 -05:00
Andreas Karatzas	5e2af28838	[CI] Resolve release V2 docker build after ROCm CI wheels change (#44463 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-06-03 21:35:40 -07:00
Ilya Markov	4f423bd5bc	[EPLB] Nixl communicator optimization. Zero-copy transfers (#41633 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Markov Ilya <markovilya19@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-06-04 03:40:34 +00:00
Jie Fang	f0cd590d62	optimize the compressor 128 split cutedsl kernel (#44230 ) Signed-off-by: Jie Fang <jief@nvidia.com> Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>	2026-06-03 20:22:57 -07:00
Wentao Ye	e6018c644a	[Refactor] Remove dead code in tests and parallel_state (#41471 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-06-03 19:32:39 -07:00
Oğuzhan KIR	f25952e59b	[MM][Perf][CG] Support ViT full CUDA graph for InternVL (#41759 ) Signed-off-by: oguz <oguzhankir17@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-06-04 10:24:25 +08:00

17302 Commits