161 Commits

Author SHA1 Message Date
Isotr0py f8e9c56d15 [Multimodal] Automatically select registered video loader for VLM (#44126)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-06-02 09:09:47 +00:00
Tobias Wasner 4658bf882b [Bugfix] Clear P0 mm sender cache on sleep/pause to fix mm_hash desync (#43001)
Signed-off-by: Tobias Wasner <wasnertobias@gmail.com>
2026-05-22 03:54:29 -07:00
Ranran f3d5360591 [Bugfix][Multimodal] PyAV video backend returns keyframes labeled as targets (#42586)
Signed-off-by: Ranran <hzz5361@psu.edu>
2026-05-14 08:56:59 -07:00
JooHo Lee 16863072ca [Bugfix] Fix scipy audio resampling ratio (#42233)
Signed-off-by: JooHo Lee <BWAAEEEK@users.noreply.github.com>
Co-authored-by: JooHo Lee <BWAAEEEK@users.noreply.github.com>
2026-05-13 18:52:41 +08:00
Moritz Sanft 2c06cf3486 [Bugfix] use served_model_name for multimodal error message (#41003)
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
2026-04-27 08:22:35 -07:00
Jaseel Muhammad 6f2c71be8f [Multimodal] Add PyAV video backend for concurrent video decoding (#39986)
Signed-off-by: Jaseel Muhammad <jaseel.muhammad@mbzuai.ac.ae>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-04-21 20:14:57 -07:00
Nick Cao 153ba7f0f3 [Refactor] Drop direct dependency on librosa (#39079)
Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-18 06:55:38 +00:00
milesial b0755523dc [Core] Reduce mm scheduler, get_num_embed overhead (#40143)
Signed-off-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-18 11:25:49 +08:00
Juan Pérez de Algaba 58ee614221 (security) Enforce frame limit in VideoMediaIO (#38636)
Signed-off-by: jperezde <jperezde@redhat.com>
2026-04-01 10:23:45 +00:00
Andreas Karatzas 677424c7ac [Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE (#37123)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-30 04:58:53 -07:00
aliialsaeedii 7e76af14fa [Bugfix][Frontend] Return 400 for corrupt/truncated image inputs instead of 500 (#38253)
Signed-off-by: aliialsaeedii <ali.al-saeedi@nscale.com>
2026-03-30 10:26:46 +00:00
Isotr0py c7f98b4d0a [Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-21 11:36:15 +08:00
Flora Feng e2d1c8b5e8 [Refactor] Relocate entrypoint tests to match serving code structure (#37593)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 05:31:23 +00:00
Yufeng He 918b7890a1 [Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301)
Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-18 13:40:03 +00:00
Isotr0py a836524d20 [Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-17 14:44:19 +00:00
Isotr0py 143e4dccdf [Misc] Add online audio_in_video test (#36775)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-15 00:14:11 -07:00
Isotr0py e4ae148a78 [Refactor] Modular video loader backend refactoring (#35202)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-06 06:06:59 -08:00
Nicolò Lucchesi f91808ae0d [MM] Allow audio chunking for offline LLM (#34628)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-23 21:04:28 -08:00
Cyrus Leung 392645454b [Refactor] Decouple TimingContext from InputProcessingContext (#35083)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-23 14:15:50 +00:00
Russell Bryant 6f3b2047ab [Core] Fix SSRF bypass via backslash-@ URL parsing inconsistency (#34743)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: isotr0py <2037008807@qq.com>
2026-02-18 03:53:35 +00:00
Reagan Lee c4df59ad43 Add embedding input functionality for disabled modalities [remake] (#32493)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”>
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-08 04:57:16 -08:00
Cyrus Leung 48312e579a [Misc] Make PlaceholderRange.get_num_embeds a method (#34035)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-07 05:30:17 +00:00
Cyrus Leung 116880a5a0 [Bugfix] Make MM batching more robust (#33817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-05 20:40:58 +00:00
Cyrus Leung 88c3e114d8 [Refactor] Move MM data parsing outside processor (#33408)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 16:46:14 +00:00
Cyrus Leung c87eac18f7 [Refactor] Move MM item count validation outside of processor (#33396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-30 09:27:31 +00:00
Cyrus Leung 831453fcef [Chore] Move MediaConnector to vllm.multimodal.media (#33324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-29 16:54:31 +00:00
Cyrus Leung c6e7404cc5 [Multimodal] Simplify MM input definitions (#33331)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-29 13:32:04 +00:00
Cyrus Leung 28459785ff [3/N] Group together media-related code (#32406)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-15 11:52:12 +00:00
Cyrus Leung cbbae38f93 [2/N] Move cache factories to MM registry (#32382)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-15 01:02:30 -08:00
Cyrus Leung 9ea07b41da [1/N] Reorganize multimodal processing code (#32327)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-14 15:25:31 +00:00
Cyrus Leung 90db5b31e4 [Refactor] Move top-level dummy data generation to registry (#32310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-14 02:17:46 -08:00
Cyrus Leung 252c011012 [Refactor] Remove MultiModalProfiler (#32254)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 15:10:20 +00:00
Jeremy Teboul 657e9c0e18 [Fix] Introduce audio channels spec (#31595)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
2026-01-09 19:34:51 +00:00
prashanth058 d3235cb503 [Fix] Enable mm_processor_cache with vision LoRA (#31927)
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
2026-01-08 15:31:51 +08:00
vSeamar 6f351548b2 [Frontend] Implement robust video frame recovery for corrupted videos (#29197)
Signed-off-by: cmartinez <cmartinez@roblox.com>
Signed-off-by: vSeamar <cmartinez@roblox.com>
2026-01-07 01:13:24 +00:00
Nikhil G ada6f91d56 Fix RecursionError in MediaWithBytes unpickling (#31191)
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
2026-01-06 20:11:26 +00:00
jeremyteboul 97a01308e9 Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring (#29255)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2026-01-03 04:31:09 +00:00
Wenqi Glantz 4924ac582c Add hidden dimension validation for multimodal embedding inputs (#30968)
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-19 07:59:36 +00:00
Roger Wang f5f51e5931 [Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
2025-12-16 14:18:17 -08:00
Wenqi Glantz 84e23d103d additional protection for CVE-2025-62164 (#30649)
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-15 03:07:10 +00:00
jeremyteboul 4515eb1a0b [Fix] Update lazing loading of video loader backend (#30444)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-12-11 10:14:57 +00:00
Andreas Karatzas ed7af3178a [ROCm][CI] Attempt to fix the failures under a subgroup of the e2e the test group (#29358)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-12-10 05:33:13 +00:00
Cyrus Leung e83b7e379c Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) 2025-12-07 00:00:22 -08:00
Cyrus Leung 27f4c2fd46 [Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Cyrus Leung 671427efbf [Model] Move multimodal_cpu_fields definition to field config (#30181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 13:40:02 +00:00
Cyrus Leung b286a311c2 [Chore] Deprecate merge_by_field_config arg (#30035)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 17:21:24 +00:00
knlnguyen1802 fc6acc88ca [Bugfix] Missing cached item in the MultiModalReceiverCache (#28525)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Chenguang Zheng <645327136@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-01 10:18:07 -08:00
Cyrus Leung fe3398fab2 [Chore] Enable passing tokenizer=None into MM processor (#29724)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 06:25:10 -08:00
Cyrus Leung 34a984274e [Misc] Refactor tokenizer interface (#29693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 04:02:21 -08:00
Mert Unsal c625d7b1c6 [Bugfix] Fix O(n²) multimodal string prompt processing (#29667)
Signed-off-by: mertunsall <mertunsal1905@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-28 16:10:39 -08:00