obscura/vllm

mirror of https://github.com/vllm-project/vllm.git synced 2026-06-06 00:16:14 +00:00

Author	SHA1	Message	Date
Harry Mellor	ef0df7dbd6	[CI] Bump mypy version `1.19.1` -> `1.20.2` (#44647 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-06-05 14:56:27 +00:00
Yongye Zhu	b5235fca2e	[DSv4] Adding TRTLLM gen attention kernel (#43827 )	2026-06-04 07:35:09 -07:00
Mohammad Miadh Angkad	158289e0fc	[Docs] Fix MLA prefill backend default docs (#43697 ) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>	2026-05-27 10:13:22 +00:00
Bugen Zhao	39910f2b25	[Rust Frontend] Move code from `vllm-frontend-rs` (#43283 ) Signed-off-by: Bugen Zhao <i@bugenzhao.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Eric Curtin <eric.curtin@docker.com> Signed-off-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com> Signed-off-by: Will.hou <1205157517@qq.com> Signed-off-by: Will.hou <willamhou@ceresman.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Eric Curtin <eric.curtin@docker.com> Co-authored-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com> Co-authored-by: Will.hou <1205157517@qq.com> Co-authored-by: Will.hou <willamhou@ceresman.com> Please see https://github.com/Inferact/vllm-frontend-rs for full original commit history.	2026-05-21 17:21:48 -07:00
Lanze Liu	b2c58ee942	[FlashAttn] Fix supports_kv_cache_dtype() accepting unhandled fp8 kv-cache dtype variants (#42685 ) Signed-off-by: Lanze Liu <lanzetech@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-05-15 15:34:59 -04:00
Aaron Hao	e0a45f1455	[Feat][RL] IPC weight sync optimizations: multigpu support and chunked packed tensors (#37476 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com>	2026-05-15 22:53:06 +08:00
CynicDora	256dbcaabf	[Feature] Support custom callable proposer backend for speculative decoding (#39487 ) Signed-off-by: 524031910363 <hyzhyzsh@sjtu.edu.cn> Signed-off-by: CynicDora <hyzhyzsh@sjtu.edu.cn>	2026-05-13 16:53:01 +00:00
Matthew Bonanni	be5983b874	[Docs] Add non-causal support to attention backend docs (#41643 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-05-04 20:35:15 +00:00
Matthew Bonanni	f3fef12350	[Attention] Abstract the MLA prefill backends and eliminate cuDNN (#32623 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-05-01 13:36:20 -04:00
sychen52	947138b6c2	Add nvfp4 kv cache support (#40177 ) Signed-off-by: Shiyang Chen <shiychen@nvidia.com>	2026-05-01 04:55:16 +00:00
Jiangyun Zhu	e8ee2a78db	[Attention] use diff kv backend for mimo v2 flash (#40045 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-04-24 11:25:55 +00:00
Martin Hickey	3951d3eacd	[MyPy] Enable mypy for `vllm/model_executor/layers/` (#40159 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2026-04-21 20:15:02 -07:00
Vadim Gimpelson	6d85b36a9f	Revert #38730 and #38791 (#40032 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>	2026-04-21 11:44:11 -04:00
Harry Mellor	fc645f1acc	Add structure to `requirements/` directory (#39024 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-10 13:46:41 -07:00
Yan Ma	ec68d53b2b	Add platform manual_seed_all API (#38468 ) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-04-10 13:43:50 +08:00
Wentao Ye	aec18492d0	[CI] Fix mypy for `vllm/v1/ops` (#39219 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-04-09 11:06:34 +08:00
Stefano Castagnetta	6183cae1bd	[Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730 ) Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>	2026-04-01 12:08:40 -07:00
wliao2	4dfad17ed1	replace cuda_device_count_stateless() to current_platform.device_count() (#37841 ) Signed-off-by: Liao, Wei <wei.liao@intel.com> Signed-off-by: wliao2 <wei.liao@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 22:32:54 +08:00
Lucas Kabela	e31915063d	[Bugfix] Fix for builtins (forward fix of pytorch/177558) (#37234 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-31 01:08:11 +00:00
Andreas Karatzas	43cc5138e5	[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-28 22:08:03 -07:00
Harry Mellor	b3601da6e7	[Mypy] Fix mypy for `vllm/model_executor` (except `vllm/model_executor/layers`) (#37904 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-24 17:14:01 +00:00
Wentao Ye	45bd5c8e75	[Mypy] Fix mypy for `vllm/config` (#37808 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-23 14:33:59 +00:00
Wei Zhao	b36adfa349	[Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-17 20:09:20 +00:00
Isotr0py	a836524d20	[Chore] Replace all base64 usages with faster pybase64 package (#37290 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-17 14:44:19 +00:00
Kunshang Ji	747b068136	[Hardware] Replace memory related torch.cuda APIs (#37031 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>	2026-03-16 10:24:48 +00:00
Dimitrios Bariamis	cc16b24b17	Update Flashinfer to 0.6.6 (#36768 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>	2026-03-12 13:19:19 -04:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Martin Hickey	7f1f36bf91	[CI] Fix mypy for vllm/reasoning (#35742 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-12 12:21:33 +00:00
Yan Ma	894843eb25	replace `with torch.cuda.device` with `with torch.accelerator.device_index` (#36144 ) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-03-11 23:12:57 -07:00
Harry Mellor	a0f44bb616	Allow `markdownlint` to run locally (#36398 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:05:24 -07:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891 )	2026-03-07 13:51:54 -08:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Taneem Ibrahim	1aaec59d79	[MISC] fixed tool_parser mypy errors (#35640 ) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 12:23:12 +00:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
Martin Hickey	7560d674c9	[CI] Fix mypy for vllm/device allocator (#35518 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-02 15:53:18 +00:00
Lucas Wilkinson	8b5014d3dd	[Attention] FA4 integration (#32974 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-01 23:44:57 +00:00
Taneem Ibrahim	59d7af9c6c	[MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE (#35630 ) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>	2026-03-01 09:26:44 -05:00
Aaron Hao	2ce6f3cf67	[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171 ) Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-02-27 13:45:21 -07:00
Taneem Ibrahim	d38cd3dde5	[Misc] Fix mypy errors in vllm/profiler and remove from exclude list (#34959 ) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>	2026-02-20 19:56:33 -08:00
junuxyz	c61a98f529	[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514 ) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-17 12:22:56 +00:00
Aneesh Puttur	0b5f9b7204	[CI] Enable mypy import following for vllm/v1/kv_offload (#34639 ) Signed-off-by: Aneesh Puttur <aneeshputtur@gmail.com>	2026-02-17 09:58:15 +08:00
Lucas Kabela	a3205beffb	[CI] Enable mypy coverage for individual excluded files (#34292 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-16 07:34:29 -08:00
Matthew Bonanni	f2c47886fd	[Attention] Add FlashInfer Sparse MLA backend (#33451 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2026-02-12 17:21:54 +00:00
junuxyz	fa7e0bfacf	[CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… (#32458 ) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-11 17:03:48 +00:00
Tyler Michael Smith	c4b9e6778f	[Misc] Add pre-commit hook to catch boolean ops in with-statements (#34271 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-10 15:13:20 -08:00
Michael Goin	5e75a14a66	[Doc] Add DCP support to attention backend doc (#33936 )	2026-02-09 18:33:43 -05:00
Harry Mellor	791a94bed0	Consolidate and fix forbidden import `pre-commit` checks (#33982 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 01:47:41 -08:00
Harry Mellor	61e632aea1	Turn `@config` into a `dataclass_transform` (#31541 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 17:40:59 +00:00
Lucas Kabela	726d89720c	[CI] Enable mypy import following for `vllm/spec_decode` (#33282 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-30 06:43:32 +00:00
Harry Mellor	fb946a7f89	Make `mypy` opt-out instead of opt-in (#33205 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-29 09:12:26 +00:00

82 Commits