86 Commits

Author SHA1 Message Date
Nick Hill da107a59e5 [MRV2] Also enable MRV2 for Llama and Mistral dense models (#43458)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
2026-06-02 11:18:46 -07:00
Injae Ryou 165460941f [BugFix] HFValidationError with cloud storage URIs when HF_HUB_OFFLINE=1 (#39155)
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
2026-05-27 10:53:32 -05:00
Nicolò Lucchesi fba010dd74 [Bugfix][MRV2] Fix KVCache tensor explicit kernel_block_size dim (#42766)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-05-18 20:25:41 -07:00
Nicolò Lucchesi 69c91d010a [MRv2] Default to MRv1 when a connector is present (#42955)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-05-18 20:34:16 +08:00
Wentao Ye ae4f59f0ec [Model Runner v2] Oracle for model runner v2 - qwen3 dense model by default [1/N] (#39337)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-05-14 10:02:33 -07:00
Siddharth Bedekar f51f6844f9 [Bugfix][Spec Decode] Wire draft_probs into probabilistic draft_model rejection (#40269) 2026-05-13 21:04:03 -04:00
Wentao Ye 577b9623e6 [Bug] Fix status update address for non-MOE model within external dp mode (#40839)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-04 16:37:16 -07:00
Luka Govedič d58c42e19c [vLLM IR] 2/N fused_add_rms_norm and maybe_inplace overload (#36823)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-05-01 23:41:15 -04:00
wliao2 3abf858443 [Test] Refactor hard coded device string in test files under compile/quantization/models/model_executor folders (#38901)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
2026-04-15 11:02:35 +08:00
Tihomir Elek 8d825b87d6 [Bug] Fix TypeError when hf_config.architectures is None during model loading (#38849)
Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
2026-04-13 12:13:21 +01:00
Luka Govedič 40bb175027 [vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
2026-03-31 22:15:05 -04:00
Bvicii bda3eda82d [Bugfix] Disallow renderer_num_workers > 1 with mm processor cache (#38418)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
2026-03-28 06:32:52 -07:00
Simon Mo 1fa1e53a73 Revert "[compile] Initialize passes at VllmBackend init" (#37733) 2026-03-20 21:35:49 -07:00
Angela Yi 12fd17eb51 [compile] Initialize passes at VllmBackend init (#35216)
Signed-off-by: angelayi <yiangela7@gmail.com>
2026-03-20 11:40:33 -07:00
Jiayi Yan 6a895197fa [Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-05 17:05:46 +00:00
Yanwen Lin 675ec59aa9 [Bugfix][CPU] Fix basic unit tests failing in CPU platforms (#34677)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-25 08:36:15 +00:00
Luka Govedič 4d9513537d [CI][torch.compile] Reduce e2e fusion test time (#33293)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-04 19:09:03 -05:00
Harry Mellor 61e632aea1 Turn @config into a dataclass_transform (#31541)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-03 17:40:59 +00:00
Wentao Ye 3e440786af [Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-28 20:30:32 +00:00
Cyrus Leung 232214b2ae [Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 10:45:42 +00:00
Xingyu Liu 80221e1884 [BugFix]Fix eagle draft_model_config and add tests (#31753)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
2026-01-12 23:09:36 -08:00
Cyrus Leung 583a90e005 [Refactor] Separate sequence and token pooling types (#32026)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-10 04:53:24 +00:00
Cyrus Leung d1b6fe007f [Chore] Further cleanup pooler (#31951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 02:16:21 -08:00
Xingyu Liu 0eee877f67 [Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
2026-01-02 15:13:15 -08:00
Nick Hill bd877162eb [BugFix] Support online dense model DP without overhead (#30739)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
2026-01-02 23:36:38 +08:00
Cyrus Leung 7e24e5d4d6 [Deprecation] Remove deprecated task, seed and MM settings (#30397)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:39 -08:00
Cyrus Leung 5a87d8b9b1 [Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:35 -08:00
wang.yuqi 9e77ffca3f [Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-08 08:10:09 +00:00
Cyrus Leung e83b7e379c Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) 2025-12-07 00:00:22 -08:00
Cyrus Leung 27f4c2fd46 [Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
wang.yuqi 74c4d80c6c [Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
Arpit Khandelwal d7284a2604 [Core] Rename PassConfig flags as per RFC #27995 (#29646)
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-12-03 03:38:55 +00:00
Harry Mellor 951445a52d Remove default values from InitVars so that they're not stored (#29859)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 12:16:37 +00:00
Boyuan Feng 3b221cb661 [BugFix] respect VLLM_LOGGING_LEVEL in logger (#29761)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-02 07:49:16 +00:00
wang.yuqi f4b76056ee Improve enable chunked_prefill & prefix_caching logic. (#26623)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-27 22:05:48 -08:00
Morrison Turnansky 0838b52e2e [Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure (#26847)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: adabeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-27 01:55:58 -08:00
Harry Mellor a8b70304d6 Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Michael Goin c6873c4e6d [UX] Support nested dicts in hf_overrides (#25727)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-07 11:19:16 +08:00
Harry Mellor d6953beb91 Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
wang.yuqi 7f570f1caa [V0 deprecation] Remove unreachable model_config.supported_tasks (#25642)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-25 11:26:31 +00:00
ahao-anyscale c8bde93367 [BUG] Allows for RunAI Streamer and Torch.compile cache to be used together (#24922)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2025-09-23 18:13:32 -06:00
Harry Mellor 058525b997 Move PoolerConfig from config/__init__.py to config/pooler.py (#25181)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 11:02:55 +00:00
Woosuk Kwon 759ef49b15 Remove V0 Encoder-Decoder Support (#24907)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-15 21:17:14 -07:00
Harry Mellor c4afdb69cc Move MultiModalConfig from config/__init__.py to config/multimodal.py (#24659)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-15 17:43:16 +00:00
Harry Mellor f36355abfd Move LoadConfig from config/__init__.py to config/load.py (#24566)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-10 06:14:18 -07:00
wang.yuqi 84cf78acee [Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-11 09:41:37 -07:00
Harry Mellor c49848396d Refactor sliding window configuration to Transformers best practice (#21927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 20:50:48 -07:00
Cyrus Leung 86ae693f20 [Deprecation][2/N] Replace --task with --runner and --convert (#21470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-27 19:42:40 -07:00
22quinn 8632e831ba [Core] Add update_config RPC method (#20095)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-14 00:49:18 +00:00
Nicolò Lucchesi 020f58abcd [Core] Support multiple tasks per model (#20771)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-12 19:40:11 -07:00