349 Commits

Author SHA1 Message Date
Zvi Kons b21443e23c Add model support for granite speech plus (#43519)
Signed-off-by: Zvi Kons[WSL] <zvi@il.ibm.com>
Signed-off-by: Zvi Kons (BlueVela) <zvi@il.ibm.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 14:47:48 +00:00
Luciano Martins a248b45d05 [Model] Add Gemma4 Unified (encoder-free) support (#44429)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2026-06-03 12:01:39 -07:00
Madeesh Kannan 023808c23d [Feature] Add support for JetBrains' Mellum v2 code generation model (#43992)
Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-06-01 10:11:35 -04:00
ltd0924 b690b2bb67 [Model]Support Step-3.7-Flash (#43859)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-28 17:01:48 -07:00
Harry Mellor 085ac221a3 Deprecate JAISLMHeadModel (#43784)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-28 18:29:12 +00:00
MaciejBalaNV 9aa131f944 Add Cosmos3 Reasoner model (#43356)
Signed-off-by: Maciej Bala <mbala@nvidia.com>
Signed-off-by: MaciejBalaNV <mbala@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-05-28 09:43:55 -07:00
Wentao Ye c02c758ea4 [Deprecation] Deprecate functions as scheduled for v0.21.0 (#43358)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-26 19:56:21 -07:00
Holegots 8737e4a857 [Docs] Fix stale version number in token_classify.md (#43489)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
2026-05-23 10:42:20 -07:00
Holegots 7c2ff1f819 [Docs] Fix stale version number in token_embed.md (#43488)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
2026-05-23 10:06:56 -07:00
wang.yuqi 2380bfc210 [Docs] Note image preprocessing difference between qwen_vl_utils and vllm. (#43393)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-22 01:43:14 -07:00
Terrence Zhao 5774aaed0c [Cohere] Enable Cohere MoE (#43143)
Signed-off-by: Terrencezzj <terrence@cohere.ai>
2026-05-19 19:32:06 -07:00
Wang Yiwen 1c6158083a [Model] Openvla support (#42654)
Signed-off-by: Wang Yiwen <121547057+yiwen101@users.noreply.github.com>
2026-05-19 08:17:42 -07:00
wang.yuqi 257af77bc2 [Docs] Reorganize online serving docs. (#41907)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-19 14:43:18 +08:00
Gracie Guo (UX) 9fd8487d2f [Docs] Add SVG images for pooling models. (#42626)
Signed-off-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-05-18 22:50:38 -07:00
wang.yuqi 75fd68c7a5 [Entrypoints] Split the pooling offline API into PoolingOfflineMixin. (#42267)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-05-15 08:05:57 +00:00
Louie Tsai e30f39c4f1 Update Intel Xeon model list and vLLM Benchmark Suite BKMs (#42607)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
2026-05-15 05:14:03 +00:00
Isotr0py faa4b76afa [Model] Support InternS2 Preview (#42705)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: zxy <46674730+CUHKSZzxy@users.noreply.github.com>
2026-05-14 21:30:26 -07:00
Haoqing Wang 5cba6839e6 Document MolmoWeb hf_overrides (#42163)
Signed-off-by: Haoqi Wang <78337154+hqhq1025@users.noreply.github.com>
2026-05-10 23:58:22 -07:00
Isotr0py f396bee56f [DSV4] Add PP support for deepseek-v4 (#41694)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>
2026-05-10 15:47:26 +00:00
Abhishek Gupta 27d3bac272 docs: clarify Gemma 4 assistant speculative decoding (#42180)
Signed-off-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
Co-authored-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
2026-05-09 20:08:44 -07:00
Terrence Zhao a2812becd6 [Models] Cohere Eagle + fix to Cohere MoE (#42078)
Signed-off-by: Terrencezzj <terrence@cohere.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 21:46:26 -07:00
Yan Ma 4f6fa6341d [XPU] update supported models on XPU (#41911)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-05-09 10:44:03 +08:00
wang.yuqi 1d694e78c9 [Examples][last/6] Resettle examples. (#41084)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-07 19:42:12 -07:00
JackyLiu deb737e323 [Doc] Add ModernBertForSequenceClassification to scoring.md cross-en… (#41832)
Signed-off-by: JLiu4Coding <lzwgre@126.com>
2026-05-06 14:17:56 -07:00
bairongz 0a201b60cf [Model] support Qianfan-OCR model (#40136)
Signed-off-by: bairongz <baiyuu.cs@gmail.com>
Signed-off-by: zhuangbairong <zhuangbairong@baidu.com>
Co-authored-by: zhuangbairong <zhuangbairong@baidu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-05 10:51:25 +00:00
Dong W 7198940b39 [Model] Add Moondream3 model support(only query and caption skills) (#32325)
Signed-off-by: Dong Wang <dongw2019@gmail.com>
2026-05-01 10:06:48 +08:00
Terrence Zhao 91a2d39014 [Models] Cohere MoE (#40817)
Signed-off-by: Terrencezzj <terrence@cohere.ai>
2026-04-29 15:54:54 +00:00
wang.yuqi a8208e6a81 [Examples] Resettle features examples. (#40995)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-28 00:33:41 -07:00
Jiangyun Zhu 7a1eb8ac2e [Model] update for mimo v25 (#41029)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Copilot <copilot@github.com>
2026-04-27 21:52:54 -07:00
Isotr0py c245d35ff4 [Model] Add MiMo-V2.5 support (#40967)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: zjy0516 <zhujiangyun@inferact.ai>
Co-authored-by: yasong <yasong.wang@inferact.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <copilot@github.com>
2026-04-27 13:26:51 +00:00
Yifan Qiao 4d51588e23 [Feat] DeepSeek V4 Rebased (#40860)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: qizixi <zixi@inferact.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <yongye@inferact.ai>
Co-authored-by: Simon Mo <simon@inferact.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: qizixi <zixi@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
2026-04-26 18:31:08 -07:00
wang.yuqi 9744b699ba [Deprecate] Deprecate LLM.reward offline api, use LLM.encode instead. (#40688)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-04-24 05:37:50 +00:00
stevenkuang d0009ddb0b [Model] Support Hy3 preview (#40681)
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-04-23 22:08:26 +08:00
philip-essential 123674879e [Model] Add block-local attention and YaRN for local layers to Gemma3 (#39823)
Signed-off-by: Philip Monk <169196560+philip-essential@users.noreply.github.com>
2026-04-21 23:34:50 -07:00
artem-spector d249a9e90e Add Granite 4.1 Vision as built-in multimodal model (#40282)
Signed-off-by: Artem Spector <artems@il.ibm.com>
Signed-off-by: artemspector <artems@il.ibm.com>
Co-authored-by: artemspector <artems@il.ibm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 05:43:39 -07:00
wang.yuqi d2e2e856ad [Frontend] Remove frontend pooling multi task support. (#37861)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-21 12:27:44 +00:00
Lxx d886c26d4d [Doc] Fix typos in token_embed pooling documentation (#40266)
Signed-off-by: YifanLi3 <lyfqlx3@gmail.com>
2026-04-19 19:27:32 -07:00
z1ying d0697cc7b6 [Doc] Add Realtime Transcription section to supported_models.md (#39845)
Signed-off-by: Ziying Tao <tzzying@outlook.com>
2026-04-18 03:26:14 +00:00
z1ying bf45e6d0a5 [Doc] Add Gemma 4 to supported models list (#39607)
Signed-off-by: z1ying <tzzying@outlook.com>
Signed-off-by: Ziying Tao <tzzying@outlook.com>
2026-04-17 13:42:52 +08:00
wang.yuqi 4e8c3f1c19 [Frontend][last/5] Improve pooling entrypoints | clean up. (#39675)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-16 07:53:23 -07:00
Abhijit Roy 2cdf86044d Add Jina Embeddings v5 model support (fixes #38633) (#39575)
Signed-off-by: Abhijit <abroy@redhat.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-16 06:37:10 +00:00
Jesus Federico fa6ae31177 feat: rename logit_bias/logit_scale to logit_mean/logit_sigma for affine score calibration (#39530)
Signed-off-by: Jesus Federico <jefp@amazon.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-13 04:43:44 +00:00
Jesus Federico b87575d24b feat: add logit_scale to PoolerConfig for affine score calibration (#39435)
Signed-off-by: Jesus Federico <jefp@amazon.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:21:14 +00:00
wang.yuqi cb5f7501cb [New Model]: jinaai/jina-reranker-v3 (#38800)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-10 15:20:40 +00:00
Peter Nguyen 8d0f908b98 [Model] Implement LoRA support for Qwen3ASRForConditionalGeneration (#37247)
Signed-off-by: Peter Nguyen <petern0408@gmail.com>
2026-04-10 18:34:31 +04:00
PatchyTIS 967146e7bd [model] support FireRedLID (#39290)
Signed-off-by: PatchouliTaisa <patchychen@tencent.com>
Co-authored-by: PatchouliTaisa <patchychen@tencent.com>
2026-04-10 08:43:58 +00:00
Kyungmin Lee e7a1387e73 Add EXAONE-4.5 (#39388)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-09 20:53:26 -07:00
Varun Sundar Rabindranath 7b80cd8ac3 [Docs] Add Phi-4-reasoning-vision to supported models + examples (#39232)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2026-04-08 02:02:26 +00:00
bhargav-patel-29 c5e3454e5a [Model] Add support for BharatGen's Param2MoE model (#38000)
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-06 16:19:56 +08:00
1096125073 71a9125c67 [New Model]: add support for telechat3 (#38510)
Signed-off-by: xiayongqiang <xiayq1@chinatelecom.cn>
Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn>
2026-04-03 08:26:22 +08:00