obscura/vllm

mirror of https://github.com/vllm-project/vllm.git synced 2026-06-06 00:16:14 +00:00

Author	SHA1	Message	Date
Vadim Gimpelson	4765f0f189	[Bugfix] Fix `sequence_parallel_chunk_impl` custom op aliasing its input (#44130) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-05 23:56:36 +00:00
Terrence Zhao	a50e675b0d	[Cohere] fix RoutingMethodType (#44021) Signed-off-by: Terrencezzj <terrence@cohere.ai>	2026-06-05 16:25:53 -07:00
Daoyuan Li	f6a708ab2b	[Doc] Add Llama-3.2-3B-Instruct to batch-invariance tested models (#44435) Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>	2026-06-05 16:04:32 -07:00
akii96	4200f62147	[ROCm][GPT-OSS] Fuse RoPE + static Q FP8 quant on fused RoPE+KV path (#42832) Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-06-05 16:22:19 -05:00
Walter Beller-Morales	c73b0d0db9	[Core][Engine] allow DP ray placement groups to be set on specific nodes (#44669) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-06-05 20:07:47 +00:00
Harry Mellor	e28e369f78	Make Mergify comment less spammy (#44666) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-06-05 10:56:52 -07:00
yzong-rh	703fb17b13	[Bugfix] GPT-OSS instruction rendering (#44330) Signed-off-by: Yifan Zong <yzong@redhat.com>	2026-06-05 13:52:32 -04:00
Sting Lin	b593396c7a	Upgrade tpu-inference to v0.21.0 (#44621) Signed-off-by: StingLin <sting.lin@cienet.com>	2026-06-05 16:12:49 +00:00
Flame	91e17d4315	Fix sarvam forward compatibility with transformers v5 (#38804) Signed-off-by: vikrantpalle <vikrantpalle@gmail.com>	2026-06-05 11:51:44 -04:00
TJian	aa6fb8a329	[Bugfix] [ROCm] [Critical] fallback to regular abi for ROCm (#44648) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-06-05 15:51:17 +00:00
Effi Ofer	6a894574bf	Add objectstore as a secondary tier to multi-tier kv cache offloading (#41968) Signed-off-by: Effi Ofer <effi.ofer@gmail.com>	2026-06-05 18:05:41 +03:00
Yan Ma	7f003a1285	Support MiniCPMV batched preprocessing (#44609) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-06-05 15:05:31 +00:00
Harry Mellor	ef0df7dbd6	[CI] Bump mypy version `1.19.1` -> `1.20.2` (#44647) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-06-05 14:56:27 +00:00
Harry Mellor	a80af24356	Speed up docs build (#44635) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-06-05 14:51:44 +00:00
Harry Mellor	c66b19800b	[CI] Bump mistral-common (#44649) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-06-05 14:18:50 +00:00
rishitdholakia13	6a11d72df7	[Reasoning][Structured Outputs] Add Command A plus tags for structural tags (#44588) Signed-off-by: rishitdholakia13 <rishit+github@cohere.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-06-05 06:51:20 -07:00
Woosuk Kwon	02d2da0748	[DSV4] Move more ops out of eager breakpoint (#44561)	2026-06-05 06:42:41 -07:00
adhithyamulticoreware	bbb6c274c8	[Bugfix] Fix gemma4 crash on CPU: guard mem_get_info call (#44615) Signed-off-by: ADHITHYA BALAKRISHNAN <adhithya.balakrishnan@multicorewareinc.com>	2026-06-05 12:47:56 +00:00
Harry Mellor	62215e72c6	Remove KV cache scale boilerplate from model weight loading methods (#43167) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-06-05 05:19:04 -07:00
Tianyu Zhang	7fe7800fa4	[BUG] Fix FP64 Gumbel precision coverage (#43150) Signed-off-by: tianyu-z <zhangtianyupro@gmail.com> Signed-off-by: Tianyu Zhang <53099276+tianyu-z@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: OpenAI Codex <codex@openai.com>	2026-06-05 19:04:14 +08:00
HueCodes	8a83e6f2d7	[Rust Frontend] Batch auto-abort requests by engine (#44591) Signed-off-by: Hugh Ryan <197298026+HueCodes@users.noreply.github.com> Co-authored-by: Bugen Zhao <i@bugenzhao.com>	2026-06-05 02:59:09 -07:00
Chunyang Wen	efc347f1b2	docs: fix tokenizer optimization typo (#44066) Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>	2026-06-05 02:12:49 -07:00
Nicolò Lucchesi	d98b8f371c	[NixlConnector] Initiate deprecation cycle for `kv_both` role (#43874) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-06-05 11:08:17 +02:00
Chao-Ju Chen	e64237ae82	[Rust Frontend] Support include_reasoning=false (#44391) Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>	2026-06-05 16:47:50 +08:00
XuZhou	d61d8566ec	[Bugfix] Update mistral tokenizer test for continue_final_message fix (#44622) Signed-off-by: Xu Zhou <xuzhou9417@163.com> Co-authored-by: Xu Zhou <xuzhou9417@163.com>	2026-06-05 16:13:26 +08:00
Uranus	d2f70da116	fix: pad dummy run query_start_loc (#44603) Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>	2026-06-05 00:43:04 -07:00
XuZhou	6542d48964	[Bugfix] Fix test_invocations flaky failure with newer openai SDK (#44618) Signed-off-by: Xu Zhou <xuzhou9417@163.com> Co-authored-by: Xu Zhou <xuzhou9417@163.com>	2026-06-05 07:36:20 +00:00
Ting SUN	ca73293fa6	[Bugfix][Rust Frontend] Fix UTF-8 char-boundary panic in incremental detokenizer (#44620) Signed-off-by: Ting Sun <suntcrick@gmail.com>	2026-06-05 07:36:17 +00:00
Vic Wen	ef3af56a97	Fix `LLM.wait_for_completion` output type docstring (#44617) Signed-off-by: viiccwen <viiccwen@gmail.com>	2026-06-05 00:16:38 -07:00
Tuukka Sarvi	b4a6f26c90	[ROCm][perf] Use workspace manager for sparse indexer allocations (#41002) Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com> Signed-off-by: Tuukka Sarvi <tuukka.sarvi@amd.com> Co-authored-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-06-04 23:46:29 -07:00
Han Lin	165b7864d0	[ROCM] [FEAT] Integrate Aiter hipBLASLt GEMM online tuning (#40426) Signed-off-by: hanlin12 <hanlin12@amd.com> Signed-off-by: Han Lin <hanlin12@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-06-04 23:45:36 -07:00
Li, Jiang	c505cd93ef	[CI/Build] Disable CPU-Compatibility Tests (#44605) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-06-05 13:14:26 +08:00
qizixi	96229fa99e	[KVConnector][1/N] PP-aware handshake aggregation and intermediate-PP output plumbing (#43720) Signed-off-by: zixi-qi <zixi@inferact.ai>	2026-06-04 22:04:19 -07:00
Luciano Martins	da1daf40bf	[Bugfix] Exclude vision embedder from quantization in Gemma4 Unified (#44571) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>	2026-06-04 20:47:38 -07:00
Woosuk Kwon	4efd6ffde0	[DSV4] Refactor DeepseekV4Attention (#44569) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-06-04 20:23:07 -07:00
Chris Leonard	56aff0dd15	[10/n] Migrate cuda_view and silu_and_mul_per_block_quant kernels to torch stable ABI. (#44334)	2026-06-04 20:14:43 -07:00
zofia	063ce98fb7	[XPU][MoE] support block_fp8_moe on xpu (#42139) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com> Signed-off-by: zofia <110436990+zufangzhu@users.noreply.github.com>	2026-06-05 08:36:58 +08:00
Bugen Zhao	62d6f06e3d	[Rust Frontend] Skip loading multimodal processor if `--language-model-only` is specified (#44500) Signed-off-by: Bugen Zhao <i@bugenzhao.com>	2026-06-04 17:02:54 -07:00
Schwinn Saereesitthipitak	b7c5baf63d	fix: keep DeepSeek V4 RoPE cache on inv_freq device (#43926) Signed-off-by: Schwinn Saereesitthipitak <schwinns@nvidia.com> Signed-off-by: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com>	2026-06-05 02:30:29 +04:00
Jiangyun Zhu	a55fccfc7c	[mamba] unify KDA conv states into one cache to match 2-state SSM layout (#44539)	2026-06-04 20:38:05 +02:00
Wentao Ye	41a4829f22	[Logs Refactor] Optimize shutdown logs, easier to follow and consistent (#43707) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-06-04 14:36:32 -04:00
Tushar Jain	38fd2405f3	use split_group for pytorch process group creation (#41980) Signed-off-by: Tushar Jain <tushar00jain@users.noreply.github.com> Co-authored-by: Tushar Jain <tushar00jain@users.noreply.github.com>	2026-06-04 14:36:07 -04:00
Agata Dobrzyniewicz	a947f7a420	[Kernel][Test] Extend lightning_attn and awq_triton kernel tests to XPU (#43307) Signed-off-by: Dobrzyniewicz, Agata <agata.dobrzyniewicz@intel.com>	2026-06-04 14:25:59 -04:00
bnellnm	439203d32c	[Bugfix] Fix test_cutlass_moe.py (#44380) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-06-04 14:18:52 -04:00
Taneem Ibrahim	8d9536a775	[Misc] Add unit tests for pooler head classes (#44471) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>	2026-06-04 17:59:25 +00:00
Fadi Arafeh	3da29aa4a5	[DOC] Add INT8 W4A8 docs and Arm's supported quantization schemes (#34894) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-06-04 16:27:17 +00:00
Divakar Verma	06f94633e7	[ROCm][CI] Add test for Aiter unified attn kernel (#44436) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: Andreas Karatzas <akaratza@amd.com>	2026-06-04 16:15:05 +00:00
JianweiZheng	99ef652907	[Bugfix] Reject non-positive values for ParallelConfig int knobs (#44057) Signed-off-by: jwzheng96 <jianweizheng@pku.edu.cn> Signed-off-by: JianweiZheng <32029023+jwzheng96@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-06-04 11:46:50 -04:00
Tyler Michael Smith	4cc78c9d5d	[Core] Freeze garbage collector in workers after model initialization (#44363) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-06-04 08:39:04 -07:00
tc-mb	3dbb4e0ace	[Bugfix] MiniCPM-V-4.6 video inference crash: placeholder count mismatches visual embedding count (#44509) Signed-off-by: tc-mb <tianchi_cai@icloud.com>	2026-06-04 08:22:30 -07:00

17325 Commits