Zvi Kons
|
b21443e23c
|
Add model support for granite speech plus (#43519)
Signed-off-by: Zvi Kons[WSL] <zvi@il.ibm.com>
Signed-off-by: Zvi Kons (BlueVela) <zvi@il.ibm.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
2026-06-04 14:47:48 +00:00 |
|
Luciano Martins
|
a248b45d05
|
[Model] Add Gemma4 Unified (encoder-free) support (#44429)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
|
2026-06-03 12:01:39 -07:00 |
|
Madeesh Kannan
|
023808c23d
|
[Feature] Add support for JetBrains' Mellum v2 code generation model (#43992)
Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-06-01 10:11:35 -04:00 |
|
ltd0924
|
b690b2bb67
|
[Model]Support Step-3.7-Flash (#43859)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-28 17:01:48 -07:00 |
|
Harry Mellor
|
085ac221a3
|
Deprecate JAISLMHeadModel (#43784)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-05-28 18:29:12 +00:00 |
|
MaciejBalaNV
|
9aa131f944
|
Add Cosmos3 Reasoner model (#43356)
Signed-off-by: Maciej Bala <mbala@nvidia.com>
Signed-off-by: MaciejBalaNV <mbala@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-05-28 09:43:55 -07:00 |
|
Wentao Ye
|
c02c758ea4
|
[Deprecation] Deprecate functions as scheduled for v0.21.0 (#43358)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-26 19:56:21 -07:00 |
|
Holegots
|
8737e4a857
|
[Docs] Fix stale version number in token_classify.md (#43489)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
|
2026-05-23 10:42:20 -07:00 |
|
Holegots
|
7c2ff1f819
|
[Docs] Fix stale version number in token_embed.md (#43488)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
|
2026-05-23 10:06:56 -07:00 |
|
wang.yuqi
|
2380bfc210
|
[Docs] Note image preprocessing difference between qwen_vl_utils and vllm. (#43393)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-22 01:43:14 -07:00 |
|
Terrence Zhao
|
5774aaed0c
|
[Cohere] Enable Cohere MoE (#43143)
Signed-off-by: Terrencezzj <terrence@cohere.ai>
|
2026-05-19 19:32:06 -07:00 |
|
Wang Yiwen
|
1c6158083a
|
[Model] Openvla support (#42654)
Signed-off-by: Wang Yiwen <121547057+yiwen101@users.noreply.github.com>
|
2026-05-19 08:17:42 -07:00 |
|
wang.yuqi
|
257af77bc2
|
[Docs] Reorganize online serving docs. (#41907)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-05-19 14:43:18 +08:00 |
|
Gracie Guo (UX)
|
9fd8487d2f
|
[Docs] Add SVG images for pooling models. (#42626)
Signed-off-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-05-18 22:50:38 -07:00 |
|
wang.yuqi
|
75fd68c7a5
|
[Entrypoints] Split the pooling offline API into PoolingOfflineMixin. (#42267)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-05-15 08:05:57 +00:00 |
|
Louie Tsai
|
e30f39c4f1
|
Update Intel Xeon model list and vLLM Benchmark Suite BKMs (#42607)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
|
2026-05-15 05:14:03 +00:00 |
|
Isotr0py
|
faa4b76afa
|
[Model] Support InternS2 Preview (#42705)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: zxy <46674730+CUHKSZzxy@users.noreply.github.com>
|
2026-05-14 21:30:26 -07:00 |
|
Haoqing Wang
|
5cba6839e6
|
Document MolmoWeb hf_overrides (#42163)
Signed-off-by: Haoqi Wang <78337154+hqhq1025@users.noreply.github.com>
|
2026-05-10 23:58:22 -07:00 |
|
Isotr0py
|
f396bee56f
|
[DSV4] Add PP support for deepseek-v4 (#41694)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>
|
2026-05-10 15:47:26 +00:00 |
|
Abhishek Gupta
|
27d3bac272
|
docs: clarify Gemma 4 assistant speculative decoding (#42180)
Signed-off-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
Co-authored-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
|
2026-05-09 20:08:44 -07:00 |
|
Terrence Zhao
|
a2812becd6
|
[Models] Cohere Eagle + fix to Cohere MoE (#42078)
Signed-off-by: Terrencezzj <terrence@cohere.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-08 21:46:26 -07:00 |
|
Yan Ma
|
4f6fa6341d
|
[XPU] update supported models on XPU (#41911)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-05-09 10:44:03 +08:00 |
|
wang.yuqi
|
1d694e78c9
|
[Examples][last/6] Resettle examples. (#41084)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-07 19:42:12 -07:00 |
|
JackyLiu
|
deb737e323
|
[Doc] Add ModernBertForSequenceClassification to scoring.md cross-en… (#41832)
Signed-off-by: JLiu4Coding <lzwgre@126.com>
|
2026-05-06 14:17:56 -07:00 |
|
bairongz
|
0a201b60cf
|
[Model] support Qianfan-OCR model (#40136)
Signed-off-by: bairongz <baiyuu.cs@gmail.com>
Signed-off-by: zhuangbairong <zhuangbairong@baidu.com>
Co-authored-by: zhuangbairong <zhuangbairong@baidu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-05 10:51:25 +00:00 |
|
Dong W
|
7198940b39
|
[Model] Add Moondream3 model support(only query and caption skills) (#32325)
Signed-off-by: Dong Wang <dongw2019@gmail.com>
|
2026-05-01 10:06:48 +08:00 |
|
Terrence Zhao
|
91a2d39014
|
[Models] Cohere MoE (#40817)
Signed-off-by: Terrencezzj <terrence@cohere.ai>
|
2026-04-29 15:54:54 +00:00 |
|
wang.yuqi
|
a8208e6a81
|
[Examples] Resettle features examples. (#40995)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-28 00:33:41 -07:00 |
|
Jiangyun Zhu
|
7a1eb8ac2e
|
[Model] update for mimo v25 (#41029)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Copilot <copilot@github.com>
|
2026-04-27 21:52:54 -07:00 |
|
Isotr0py
|
c245d35ff4
|
[Model] Add MiMo-V2.5 support (#40967)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: zjy0516 <zhujiangyun@inferact.ai>
Co-authored-by: yasong <yasong.wang@inferact.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <copilot@github.com>
|
2026-04-27 13:26:51 +00:00 |
|
Yifan Qiao
|
4d51588e23
|
[Feat] DeepSeek V4 Rebased (#40860)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: qizixi <zixi@inferact.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <yongye@inferact.ai>
Co-authored-by: Simon Mo <simon@inferact.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: qizixi <zixi@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
|
2026-04-26 18:31:08 -07:00 |
|
wang.yuqi
|
9744b699ba
|
[Deprecate] Deprecate LLM.reward offline api, use LLM.encode instead. (#40688)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-04-24 05:37:50 +00:00 |
|
stevenkuang
|
d0009ddb0b
|
[Model] Support Hy3 preview (#40681)
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-04-23 22:08:26 +08:00 |
|
philip-essential
|
123674879e
|
[Model] Add block-local attention and YaRN for local layers to Gemma3 (#39823)
Signed-off-by: Philip Monk <169196560+philip-essential@users.noreply.github.com>
|
2026-04-21 23:34:50 -07:00 |
|
artem-spector
|
d249a9e90e
|
Add Granite 4.1 Vision as built-in multimodal model (#40282)
Signed-off-by: Artem Spector <artems@il.ibm.com>
Signed-off-by: artemspector <artems@il.ibm.com>
Co-authored-by: artemspector <artems@il.ibm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-21 05:43:39 -07:00 |
|
wang.yuqi
|
d2e2e856ad
|
[Frontend] Remove frontend pooling multi task support. (#37861)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-04-21 12:27:44 +00:00 |
|
Lxx
|
d886c26d4d
|
[Doc] Fix typos in token_embed pooling documentation (#40266)
Signed-off-by: YifanLi3 <lyfqlx3@gmail.com>
|
2026-04-19 19:27:32 -07:00 |
|
z1ying
|
d0697cc7b6
|
[Doc] Add Realtime Transcription section to supported_models.md (#39845)
Signed-off-by: Ziying Tao <tzzying@outlook.com>
|
2026-04-18 03:26:14 +00:00 |
|
z1ying
|
bf45e6d0a5
|
[Doc] Add Gemma 4 to supported models list (#39607)
Signed-off-by: z1ying <tzzying@outlook.com>
Signed-off-by: Ziying Tao <tzzying@outlook.com>
|
2026-04-17 13:42:52 +08:00 |
|
wang.yuqi
|
4e8c3f1c19
|
[Frontend][last/5] Improve pooling entrypoints | clean up. (#39675)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-16 07:53:23 -07:00 |
|
Abhijit Roy
|
2cdf86044d
|
Add Jina Embeddings v5 model support (fixes #38633) (#39575)
Signed-off-by: Abhijit <abroy@redhat.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-16 06:37:10 +00:00 |
|
Jesus Federico
|
fa6ae31177
|
feat: rename logit_bias/logit_scale to logit_mean/logit_sigma for affine score calibration (#39530)
Signed-off-by: Jesus Federico <jefp@amazon.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-13 04:43:44 +00:00 |
|
Jesus Federico
|
b87575d24b
|
feat: add logit_scale to PoolerConfig for affine score calibration (#39435)
Signed-off-by: Jesus Federico <jefp@amazon.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-10 17:21:14 +00:00 |
|
wang.yuqi
|
cb5f7501cb
|
[New Model]: jinaai/jina-reranker-v3 (#38800)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-10 15:20:40 +00:00 |
|
Peter Nguyen
|
8d0f908b98
|
[Model] Implement LoRA support for Qwen3ASRForConditionalGeneration (#37247)
Signed-off-by: Peter Nguyen <petern0408@gmail.com>
|
2026-04-10 18:34:31 +04:00 |
|
PatchyTIS
|
967146e7bd
|
[model] support FireRedLID (#39290)
Signed-off-by: PatchouliTaisa <patchychen@tencent.com>
Co-authored-by: PatchouliTaisa <patchychen@tencent.com>
|
2026-04-10 08:43:58 +00:00 |
|
Kyungmin Lee
|
e7a1387e73
|
Add EXAONE-4.5 (#39388)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-04-09 20:53:26 -07:00 |
|
Varun Sundar Rabindranath
|
7b80cd8ac3
|
[Docs] Add Phi-4-reasoning-vision to supported models + examples (#39232)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2026-04-08 02:02:26 +00:00 |
|
bhargav-patel-29
|
c5e3454e5a
|
[Model] Add support for BharatGen's Param2MoE model (#38000)
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-06 16:19:56 +08:00 |
|
1096125073
|
71a9125c67
|
[New Model]: add support for telechat3 (#38510)
Signed-off-by: xiayongqiang <xiayq1@chinatelecom.cn>
Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn>
|
2026-04-03 08:26:22 +08:00 |
|