205 Commits

Author SHA1 Message Date
Harry Mellor ef0df7dbd6 [CI] Bump mypy version 1.19.1 -> 1.20.2 (#44647)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-06-05 14:56:27 +00:00
Tianyu Zhang 7fe7800fa4 [BUG] Fix FP64 Gumbel precision coverage (#43150)
Signed-off-by: tianyu-z <zhangtianyupro@gmail.com>
Signed-off-by: Tianyu Zhang <53099276+tianyu-z@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
2026-06-05 19:04:14 +08:00
Yongye Zhu b5235fca2e [DSv4] Adding TRTLLM gen attention kernel (#43827) 2026-06-04 07:35:09 -07:00
Andreas Karatzas 22c2e87555 [CI] Reverted gitignore changes (#44497)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-04 00:37:44 -07:00
Andreas Karatzas 5e2af28838 [CI] Resolve release V2 docker build after ROCm CI wheels change (#44463)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-03 21:35:40 -07:00
Andreas Karatzas 87954eb50e [ROCm][CI] Optimize ROCm Docker build: registry cache, DeepEP, and ci-bake script (#36949)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-06-02 23:43:07 -07:00
Mohammad Miadh Angkad 158289e0fc [Docs] Fix MLA prefill backend default docs (#43697)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
2026-05-27 10:13:22 +00:00
Bugen Zhao 39910f2b25 [Rust Frontend] Move code from vllm-frontend-rs (#43283)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Eric Curtin <eric.curtin@docker.com>
Signed-off-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com>
Signed-off-by: Will.hou <1205157517@qq.com>
Signed-off-by: Will.hou <willamhou@ceresman.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Eric Curtin <eric.curtin@docker.com>
Co-authored-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com>
Co-authored-by: Will.hou <1205157517@qq.com>
Co-authored-by: Will.hou <willamhou@ceresman.com>

Please see https://github.com/Inferact/vllm-frontend-rs for full original commit history.
2026-05-21 17:21:48 -07:00
ylangtsou 0b59fc45dd Disable build isolation to bypass CUDA related deps for vllm-tpu (#43038)
Signed-off-by: Ylang Tsou <ylangt@google.com>
Co-authored-by: Ylang Tsou <ylangt@google.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-05-21 18:00:52 -04:00
haosdent caf69823d6 [CI] Pin protoc binary in rust-build stages (#43292)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-05-21 03:38:07 -07:00
Lanze Liu b2c58ee942 [FlashAttn] Fix supports_kv_cache_dtype() accepting unhandled fp8 kv-cache dtype variants (#42685)
Signed-off-by: Lanze Liu <lanzetech@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2026-05-15 15:34:59 -04:00
Aaron Hao e0a45f1455 [Feat][RL] IPC weight sync optimizations: multigpu support and chunked packed tensors (#37476)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
2026-05-15 22:53:06 +08:00
CynicDora 256dbcaabf [Feature] Support custom callable proposer backend for speculative decoding (#39487)
Signed-off-by: 524031910363 <hyzhyzsh@sjtu.edu.cn>
Signed-off-by: CynicDora <hyzhyzsh@sjtu.edu.cn>
2026-05-13 16:53:01 +00:00
Michael Goin 184577ae46 [Build] DeepGEMM: trim comments, add integration notes + TODOs (#42429)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-05-12 15:57:58 -07:00
Michael Goin d077622d60 [Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython (#41516)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 10:27:29 -04:00
Matthew Bonanni be5983b874 [Docs] Add non-causal support to attention backend docs (#41643)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-05-04 20:35:15 +00:00
Matthew Bonanni f3fef12350 [Attention] Abstract the MLA prefill backends and eliminate cuDNN (#32623)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-01 13:36:20 -04:00
sychen52 947138b6c2 Add nvfp4 kv cache support (#40177)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2026-05-01 04:55:16 +00:00
Yifan Qiao 4d51588e23 [Feat] DeepSeek V4 Rebased (#40860)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: qizixi <zixi@inferact.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <yongye@inferact.ai>
Co-authored-by: Simon Mo <simon@inferact.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: qizixi <zixi@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
2026-04-26 18:31:08 -07:00
Jiangyun Zhu e8ee2a78db [Attention] use diff kv backend for mimo v2 flash (#40045)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-04-24 11:25:55 +00:00
Dmitry Tokarev 3041344287 [Misc] Added curl retries in install_python_libraries.sh (#36700)
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-24 01:19:30 +00:00
Johnny 7f95a66cbf [NVIDIA] Add sm_110 (Jetson Thor) to CUDA 13.0 build targets (#39233) 2026-04-23 15:42:14 -04:00
Martin Hickey 3951d3eacd [MyPy] Enable mypy for vllm/model_executor/layers/ (#40159)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2026-04-21 20:15:02 -07:00
Vadim Gimpelson 6d85b36a9f Revert #38730 and #38791 (#40032)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
2026-04-21 11:44:11 -04:00
Harry Mellor fc645f1acc Add structure to requirements/ directory (#39024)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-04-10 13:46:41 -07:00
Yan Ma ec68d53b2b Add platform manual_seed_all API (#38468)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2026-04-10 13:43:50 +08:00
Wentao Ye aec18492d0 [CI] Fix mypy for vllm/v1/ops (#39219)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-04-09 11:06:34 +08:00
Michael Goin eb4205fee5 [UX] Integrate DeepGEMM into vLLM wheel via CMake (#37980)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-08 18:56:32 -07:00
Stefano Castagnetta 6183cae1bd [Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
2026-04-01 12:08:40 -07:00
bnellnm 7cf56a59a2 [MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-04-01 09:44:08 -04:00
wliao2 4dfad17ed1 replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-31 22:32:54 +08:00
Lucas Kabela e31915063d [Bugfix] Fix for builtins (forward fix of pytorch/177558) (#37234)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-03-31 01:08:11 +00:00
Andreas Karatzas 43cc5138e5 [ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-28 22:08:03 -07:00
TJian 58a249bc61 [ROCm] [Release] Update ROCm variant from rocm700 to rocm721 (#38413)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-03-28 06:07:03 +00:00
Harry Mellor b3601da6e7 [Mypy] Fix mypy for vllm/model_executor (except vllm/model_executor/layers) (#37904)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-24 17:14:01 +00:00
Wentao Ye 45bd5c8e75 [Mypy] Fix mypy for vllm/config (#37808)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-23 14:33:59 +00:00
Wei Zhao b36adfa349 [Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-03-17 20:09:20 +00:00
Isotr0py a836524d20 [Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-17 14:44:19 +00:00
Kunshang Ji 747b068136 [Hardware] Replace memory related torch.cuda APIs (#37031)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
2026-03-16 10:24:48 +00:00
Dimitrios Bariamis cc16b24b17 Update Flashinfer to 0.6.6 (#36768)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
2026-03-12 13:19:19 -04:00
Kunshang Ji 53ec16a705 [Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-12 07:57:47 -07:00
Martin Hickey 7f1f36bf91 [CI] Fix mypy for vllm/reasoning (#35742)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-12 12:21:33 +00:00
Yan Ma 894843eb25 replace with torch.cuda.device with with torch.accelerator.device_index (#36144)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2026-03-11 23:12:57 -07:00
Harry Mellor a0f44bb616 Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:05:24 -07:00
Wei Zhao 379689d533 [Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891) 2026-03-07 13:51:54 -08:00
Kunshang Ji 66a2209645 [Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-05 10:36:39 +00:00
Taneem Ibrahim 1aaec59d79 [MISC] fixed tool_parser mypy errors (#35640)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-04 12:23:12 +00:00
Kunshang Ji 16d2ad1d38 [Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-04 09:49:47 +00:00
TJian 5dfc5abe94 [ROCm] [Release] Change the package from aiter to amd-aiter (#35198)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-03-02 23:13:39 -08:00
Martin Hickey 7560d674c9 [CI] Fix mypy for vllm/device allocator (#35518)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-02 15:53:18 +00:00