264 Commits

Author SHA1 Message Date
Turner Jabbour 0c96dd64fb [ROCm] Bump fastsafetensors to v0.3.2 from PyPI, remove git source build (#43625)
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>
2026-06-04 07:30:57 -07:00
Aakar Dwivedi 3fd9d2d357 [CPU][Zen] Route W8A8 and W4A16 linear inference through zentorch on AMD Zen CPUs (#41813)
Signed-off-by: R <Ganesh.R@amd.com>
Signed-off-by: Harshal Adhav <harshal.adhav@amd.com>
Signed-off-by: Aakar Dwivedi <aadwived@amd.com>
Co-authored-by: R <Ganesh.R@amd.com>
Co-authored-by: Harshal Adhav <harshal.adhav@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-05-30 14:17:21 -05:00
Li, Jiang 3f6f508e14 [Bugfix][CPU] Remove invalid extra deps (#43977)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-05-29 22:02:09 +08:00
Mohammad Miadh Angkad a970fb5a1a Fix CuPy runtime deps and restore humming (#43530)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
2026-05-26 05:59:40 -07:00
Michael Goin 10d264a2b9 Revert "[Misc] add humming to dependencies" (#43492) 2026-05-23 14:21:13 -07:00
Li, Jiang 65b7a812a2 [CPU] Experimentally enable Triton and MRV2 (#43225)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-05-22 01:48:17 -07:00
Nick Hill f2ace1d57d [Frontend][RFC] Rust front-end integration (#40848)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
2026-05-21 12:24:48 +08:00
Chris Leonard 07aeaf9d4d [6/n] Migrate activation kernels, gptq, gguf, non cutlass w8a8 to libtorch stable ABI (continued) (#42663)
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
2026-05-20 00:18:12 -07:00
Jinzhen Lin 8200fbe1ac [Misc] add humming to dependencies (#42540)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
2026-05-19 08:36:47 -07:00
Jiangyun Zhu 140dc2ec30 [Bugfix] Install nvidia-cutlass-dsl[cu13] extra on CUDA 13 platforms (#42438)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-05-13 01:57:21 -07:00
pschlan-amd 39dff5ff39 Add VLLM_USE_SPINLOOP_EXT to use more efficient busy polling (#36517)
Signed-off-by: Patrick Schlangen <pschlan@amd.com>
2026-05-11 16:11:49 -07:00
lyd1992 100c7b65e7 [Platform] Fix RISC-V platform detection (lscpu parsing + non-NUMA meminfo) (#40427)
Signed-off-by: liuyudong <liuyudong@iscas.ac.cn>
2026-04-24 04:33:05 +00:00
Honglin Cao 9c271f9403 [gRPC] Add standard gRPC health checking (grpc.health.v1) for Kubernetes native probes (#38016)
Signed-off-by: Honglin Cao <Caohonglin317@hotmail.com>
2026-04-22 21:31:00 +00:00
Isotr0py 67eb6083e3 Revert "[Misc] Move pyav and soundfile to common requirements" (#40276)
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-04-21 09:08:06 -07:00
Chinmay-Kulkarni-AMD 87518c3027 [ZenCPU] AMD Zen CPU Backend with supported dtypes via zentorch weekly (#39967)
Signed-off-by: Chinmay Kulkarni <Chinmay.Kulkarni@amd.com>
2026-04-18 06:22:37 +00:00
Isotr0py 617d1c2ff1 [Misc] Move pyav and soundfile to common requirements (#39997)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-04-16 08:52:37 -07:00
Isotr0py 82531edbfb [Refactor] Remove resampy dependency (#39524)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-04-16 08:48:17 -07:00
Yanan Cao edc3648966 [Kernel][Helion] Fix inductor fusion of Helion HOP (#39944)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 04:41:26 -07:00
Matthew Bonanni c77e596e2e [FlashAttention] Don't overwrite flash_attn_interface.py when installing precompiled (#39932)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-04-15 16:43:15 -04:00
Michael Goin eb4205fee5 [UX] Integrate DeepGEMM into vLLM wheel via CMake (#37980)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-08 18:56:32 -07:00
ibifrost 96b5004b71 [KVConnector] Support 3FS KVConnector (#37636)
Signed-off-by: wuchenxin <wuchenxin.wcx@alibaba-inc.com>
Signed-off-by: ibifrost <47308427+ibifrost@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2026-04-07 15:46:00 +00:00
Robert Shaw 968ed02ace [Quantization][Deprecation] Remove Petit NVFP4 (#32694)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-04-05 00:07:45 +00:00
Yanan Cao ecd5443dbc Bump helion dependency from 0.3.2 to 0.3.3 (#38062)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 10:59:33 -07:00
Fadi Arafeh 34d317dcec [CPU][UX][Perf] Enable tcmalloc by default (#37607)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-03-25 20:39:57 +08:00
Isotr0py c7f98b4d0a [Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-21 11:36:15 +08:00
Huanxing 6951fcd44f [XPU] Automatically detect target platform as XPU in build. (#37634)
Signed-off-by: huanxing <huanxing.shen@intel.com>
2026-03-20 13:30:15 +08:00
mikaylagawarecki 8b10e4fb31 [1/n] Migrate permute_cols to libtorch stable ABI (#31509)
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
2026-03-19 11:27:26 -04:00
elvischenv d61d2b08e9 [Build] Fix API rate limit exceeded when using VLLM_USE_PRECOMPILED=1 (#36229)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 12:09:27 +00:00
Lalithnarayan C 7acaea634c In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970)
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:35:35 +00:00
Hari a3e2e250f0 [Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614)
Signed-off-by: hasethuraman <hsethuraman@microsoft.com>
2026-03-15 19:38:21 +08:00
Isotr0py 6590a3ecda [Frontend] Remove torchcodec from audio dependency (#37061)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-15 05:15:59 +00:00
arlo 8c29042bb9 [Feature] Add InstantTensor weight loader (#36139) 2026-03-14 18:05:23 +01:00
seanmamasde 84868e4793 [Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM formats (#35109)
Signed-off-by: seanmamasde <seanmamasde@gmail.com>
2026-03-14 08:44:03 -07:00
Yanan Cao 236de72e49 [CI] Pin helion version (#37012)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 23:25:29 -04:00
Li, Jiang 092ace9e3a [UX] Improve UX of CPU backend (#36968)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-14 09:27:29 +08:00
Simo Lin 572c776bfb build: update smg-grpc-servicer to use vllm extra (#36938)
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
2026-03-13 01:31:36 +00:00
Chang Su 507ddbe992 feat(grpc): extract gRPC servicer into smg-grpc-servicer package, add --grpc flag to vllm serve (#36169)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2026-03-10 03:29:59 -07:00
Andrii Skliar 5d199ac8f2 Support Audio Extraction from MP4 Video for Nemotron Nano VL (#35539)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Andrii <askliar@nvidia.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster>
Co-authored-by: Andrii <askliar@nvidia.com>
Co-authored-by: root <root@pool0-03748.cm.cluster>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: root <root@pool0-02416.cm.cluster>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: root <root@pool0-04880.cm.cluster>
2026-03-03 23:20:33 -08:00
Lucas Wilkinson 8b5014d3dd [Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-03-01 23:44:57 +00:00
Ma Jian 90805ff464 [CI/Build] CPU release supports both of AVX2 and AVX512 (#35466)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: jiang1.li <jiang1.li@intel.com>
2026-02-28 04:35:21 +00:00
Sophie du Couédic 02acd16861 [Benchmarks] Plot benchmark timeline and requests statistics (#35220)
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-26 02:17:43 -08:00
Nick Hill 79504027ef [Misc] Bump fastsafetensors version for latest fixes (#34273)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-11 00:30:09 -08:00
emricksini-h 325ab6b0a8 [Feature] OTEL tracing during loading (#31162) 2026-02-05 16:59:28 -08:00
Michael Goin d0cbac5827 [Dev UX] Add auto-detection for VLLM_PRECOMPILED_WHEEL_VARIANT during install (#32948)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Shengqi Chen <i@harrychen.xyz>
2026-01-23 19:15:17 -08:00
Lucas Wilkinson 889722f3bf [FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 22:02:39 -07:00
Yanan Cao 9d1e611f0e [CI] Add Helion as an optional dependency (#32482)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-01-19 19:09:56 +00:00
Isotr0py cee7436a26 [Misc] Make scipy as optional audio/benchmark dependency (#32096)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-11 00:18:57 -08:00
TJian 7a05d2dc65 [CI] [ROCm] Fix tests/entrypoints/test_grpc_server.py on ROCm (#31970)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-01-09 12:54:20 +08:00
Chang Su 791b2fc30a [grpc] Support gRPC server entrypoint (#30190)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Signed-off-by: njhill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: njhill <nickhill123@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2026-01-07 23:24:46 -08:00
RickyChen / 陳昭儒 b3a2bdf1ac [Feature] Add offline FastAPI documentation support for air-gapped environments (#30184)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-29 16:22:39 +00:00