Andreas Karatzas
76ea1d5d2f
[ROCm][CI] Stabilize Granite tool-use and test URL construction ( #43017 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-05-23 12:21:11 +08:00
Benjamin Chislett
4e2eba28be
[Perf] Optimize hidden state extraction logic ( #37374 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-22 18:23:08 -04:00
Aaron Hao
f34623bf3c
[bug] AsyncScheduler drops first post-resume token after pause_generation + clear_cache ( #42117 )
...
Signed-off-by: hao-aaron <ahao@anyscale.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-19 01:06:21 -07:00
Blanc Swan
4a39b4f553
[Model] Add Apertus Tool Parser ( #41154 )
...
Signed-off-by: Blanc <swan.blanc@infomaniak.com >
2026-05-18 11:20:04 -04:00
Soyaazz
990f49bdcb
[MM][CG] Enable encoder Cudagraph for Step3VL ( #42224 )
...
Signed-off-by: JisoLya <523420504@qq.com >
Signed-off-by: Soyaazz <523420504@qq.com >
2026-05-17 20:19:13 -07:00
Aaron Hao
e0a45f1455
[Feat][RL] IPC weight sync optimizations: multigpu support and chunked packed tensors ( #37476 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: hao-aaron <ahao@anyscale.com >
2026-05-15 22:53:06 +08:00
Krish Gupta
70c00163ff
[Feature] Add instruction support for score/rerank chat templates ( #42412 )
...
Signed-off-by: KrxGu <krishom70@gmail.com >
2026-05-14 09:41:22 +08:00
John Calderon
b3c69595a6
[MM][CG] Support ViT CG for Qwen2-VL ( #41736 )
...
Signed-off-by: John Calderon <jcalderon@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-05-14 01:52:35 +08:00
Shanshan Shen
92def124bc
[MM][Perf][CG] Support ViT full CUDA graph for Qwen3.5 ( #42151 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-05-13 16:00:32 +08:00
yzong-rh
b1687527b8
[Bugfix] Gemma 4 chat template crash with missing tool name and tool id ( #42188 )
...
Signed-off-by: Yifan <yzong@redhat.com >
2026-05-11 03:07:45 +00:00
Wentao Ye
ea0e501bb1
[KV Connector] Remove compat support for pre-v0.12.0 constructor signatures without KVCacheConfig ( #39832 )
...
The v0.12.0 release contained initial support for HMA in KV connectors. As part
of these changes, a KVCacheConfig argument was added to KV connector
constructors. Backwards compatibility support for out-of-tree connectors was
included in that change, with a very prominent warning. See #25712 and #27887.
Since the warning has been in place for over 5 months, we can safely remove
this compatibility support.
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-05-09 23:39:46 +00:00
Sumanth R Hegde
e3b65a5ba0
[feat] Add explicit /start_weight_update and /finish_weight_update APIs for weight transfer ( #39212 )
2026-05-08 18:03:33 -07:00
Ethan Feng
4140faa4a5
[Docs] Fix OpenAI batch model argument examples ( #42066 )
...
Signed-off-by: Ethan Feng <ethan.fengch@gmail.com >
2026-05-08 14:02:46 +00:00
haosdent
52458b60a8
[CI][Examples][RLHF] Disable async scheduling in rlhf_async_new_apis ( #42042 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-05-08 04:58:48 -07:00
wang.yuqi
1d694e78c9
[Examples][last/6] Resettle examples. ( #41084 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-07 19:42:12 -07:00
tej
8a4888be21
[ROCm] Profiler api support for ROCm MORI toy proxy server in PD Disaggregation ( #40264 )
...
Signed-off-by: Tej Kiran <kiran.tej@amd.com >
2026-05-07 16:58:38 +08:00
wang.yuqi
51c1ee9b7c
[Examples] Resettle Disaggregated examples. ( #40759 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-06 01:20:38 -07:00
John Calderon
964a4bc2a5
[MM][CG] Support ViT CG for Qwen2.5-VL ( #40830 )
...
Signed-off-by: John Calderon <jcalderon@nvidia.com >
2026-05-02 11:10:14 +08:00
FredericOdermatt
c408fdd663
[Fix] Sync gemma4 chat template from hf ( #39570 )
...
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch >
2026-05-02 03:06:54 +00:00
Luis 🚀
14043dfecd
feat: Enable prompt_embeds Content Part Support in vLLM Chat Completions API ( #40720 )
...
Signed-off-by: Luis Robaina <luis@protopia.ai >
Signed-off-by: Luis Robaina 🚀 <luisfabian1545@gmail.com >
Signed-off-by: LuisRobaina <luis@protopia.ai >
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com >
2026-05-01 10:05:55 +08:00
snadampal
3179e53135
[P/D] Prefill compute optimizations with bi-directional KV cache transfers between P and D nodes ( #32553 )
...
Signed-off-by: Sunita Nadampalli <nadampal@amazon.com >
2026-04-30 10:14:20 +00:00
wang.yuqi
a8208e6a81
[Examples] Resettle features examples. ( #40995 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-04-28 00:33:41 -07:00
Isotr0py
c2e88a281c
[Bugfix] Fix broken example openai client ( #41088 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-04-28 04:43:04 +00:00
wang.yuqi
8d8062d0a7
[Examples] Resettle generate examples. ( #36464 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-04-27 07:48:37 +00:00
wang.yuqi
9744b699ba
[Deprecate] Deprecate LLM.reward offline api, use LLM.encode instead. ( #40688 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-04-24 05:37:50 +00:00
Srreyansh Sethi
b7a2605020
[Bugfix] Make Attention Backend Auto-Selection Batch-Invariance-Aware ( #40193 )
...
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-23 14:57:03 +00:00
Shanshan Shen
fe57be7809
[MM][CG] Support --enable-vit-cuda-graph option for VLM examples ( #40580 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-04-22 22:46:14 -07:00
Simon Danielsson
ac58e2a170
[Fix][MoRI] Align MoRI-IO message format with P2pNcclConnector and vllm-router ( #39565 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
Co-authored-by: Matvei Pashkovskii <mpashkov@amd.com >
2026-04-23 08:06:31 +09:00
AllenDou
9047288b68
support hotwords for FunASR model ( #39674 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-04-22 02:25:06 -07:00
rasmith
cefa5281a7
[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requests to prefill / decode nodes ( #39835 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-04-22 09:48:25 +09:00
artem-spector
d249a9e90e
Add Granite 4.1 Vision as built-in multimodal model ( #40282 )
...
Signed-off-by: Artem Spector <artems@il.ibm.com >
Signed-off-by: artemspector <artems@il.ibm.com >
Co-authored-by: artemspector <artems@il.ibm.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-04-21 05:43:39 -07:00
wang.yuqi
d2e2e856ad
[Frontend] Remove frontend pooling multi task support. ( #37861 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-21 12:27:44 +00:00
Shanshan Shen
b47840019e
[MM][Misc] Support image+video mixed inputs (per prompt) for VLM examples ( #40335 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-04-21 03:43:25 +00:00
Nick Cao
153ba7f0f3
[Refactor] Drop direct dependency on librosa ( #39079 )
...
Signed-off-by: Nick Cao <ncao@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
2026-04-18 06:55:38 +00:00
Nithin Chalapathi
80b18230e0
[Frontend] Add multimodal support to /inference/v1/generate endpoint ( #38405 )
...
Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com >
Signed-off-by: Nithin Chalapathi <nithinc@berkeley.edu >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-04-17 20:31:56 -07:00
wang.yuqi
8d2cff8140
[Examples] Resettle Observability examples. ( #40123 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-04-17 03:13:31 -07:00
Martin Hickey
cc07dad789
[HMA] [KVEvent] Enable GPU-side KV events for HMA ( #37688 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: Or Ozeri <or@ozery.com >
2026-04-12 10:01:02 +03:00
wang.yuqi
cb5f7501cb
[New Model]: jinaai/jina-reranker-v3 ( #38800 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-04-10 15:20:40 +00:00
PatchyTIS
967146e7bd
[model] support FireRedLID ( #39290 )
...
Signed-off-by: PatchouliTaisa <patchychen@tencent.com >
Co-authored-by: PatchouliTaisa <patchychen@tencent.com >
2026-04-10 08:43:58 +00:00
Kyungmin Lee
e7a1387e73
Add EXAONE-4.5 ( #39388 )
...
Signed-off-by: lkm2835 <lkm2835@gmail.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-09 20:53:26 -07:00
Ben Browning
8477fe427d
[Tool] adjust_request to reasoning parser, and Gemma4 fixes ( #39027 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-04-08 19:04:04 +00:00
Varun Sundar Rabindranath
7b80cd8ac3
[Docs] Add Phi-4-reasoning-vision to supported models + examples ( #39232 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2026-04-08 02:02:26 +00:00
bsliu
c0817e4d39
[Model] Add support for Cheers multimodal model ( #38788 )
...
Signed-off-by: bsliu <1187291748@qq.com >
Signed-off-by: 吴炳贤 <wubingxian24@mails.ucas.ac.cn >
2026-04-02 21:01:40 +08:00
Fynn Schmitt-Ulms
fa246d5231
Fix shape comment in extract_hidden_states example ( #38723 )
...
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com >
2026-04-01 07:29:33 -07:00
liuzhenwei
0c63739135
[EPD] update EPD script arguments ( #36742 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2026-03-31 12:02:09 +00:00
Maosheng Liao
aae3e688f8
Fix document of torchrun_example.py ( #31113 )
2026-03-31 10:54:23 +00:00
haosdent
d39b8daf5f
[Feature] Add Qwen3-ForcedAligner support via token classification pooling ( #35367 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-03-29 00:27:52 +00:00
Matej Rojec
2908094567
Add /v1/chat/completions/batch endpoint for batched chat completions ( #38011 )
...
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com >
2026-03-26 12:13:33 +08:00
Ekagra Ranjan
7b54f60db0
[Cohere] Enable Cohere-Transcribe ( #38120 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2026-03-25 16:13:51 -07:00
Cyrus Leung
ba2f0acc2d
[Misc] Reorganize inputs ( #35182 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-25 10:22:54 -07:00