Commit Graph

874 Commits

Author SHA1 Message Date
Andreas Karatzas 76ea1d5d2f [ROCm][CI] Stabilize Granite tool-use and test URL construction (#43017)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-23 12:21:11 +08:00
Benjamin Chislett 4e2eba28be [Perf] Optimize hidden state extraction logic (#37374)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-22 18:23:08 -04:00
Aaron Hao f34623bf3c [bug] AsyncScheduler drops first post-resume token after pause_generation + clear_cache (#42117)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-19 01:06:21 -07:00
Blanc Swan 4a39b4f553 [Model] Add Apertus Tool Parser (#41154)
Signed-off-by: Blanc <swan.blanc@infomaniak.com>
2026-05-18 11:20:04 -04:00
Soyaazz 990f49bdcb [MM][CG] Enable encoder Cudagraph for Step3VL (#42224)
Signed-off-by: JisoLya <523420504@qq.com>
Signed-off-by: Soyaazz <523420504@qq.com>
2026-05-17 20:19:13 -07:00
Aaron Hao e0a45f1455 [Feat][RL] IPC weight sync optimizations: multigpu support and chunked packed tensors (#37476)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
2026-05-15 22:53:06 +08:00
Krish Gupta 70c00163ff [Feature] Add instruction support for score/rerank chat templates (#42412)
Signed-off-by: KrxGu <krishom70@gmail.com>
2026-05-14 09:41:22 +08:00
John Calderon b3c69595a6 [MM][CG] Support ViT CG for Qwen2-VL (#41736)
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-05-14 01:52:35 +08:00
Shanshan Shen 92def124bc [MM][Perf][CG] Support ViT full CUDA graph for Qwen3.5 (#42151)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-05-13 16:00:32 +08:00
yzong-rh b1687527b8 [Bugfix] Gemma 4 chat template crash with missing tool name and tool id (#42188)
Signed-off-by: Yifan <yzong@redhat.com>
2026-05-11 03:07:45 +00:00
Wentao Ye ea0e501bb1 [KV Connector] Remove compat support for pre-v0.12.0 constructor signatures without KVCacheConfig (#39832)
The v0.12.0 release contained initial support for HMA in KV Connectors. As part
of these changes, a KVCacheConfig argument was added to KV connector
constructors. Backwards compatibility support for out-of-tree connectors was
included in this change, with a very prominent warning. See #25712 and #27887.

Since the warning has been in place for over 5 months, we can safely remove
this compatibility support.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-09 23:39:46 +00:00
Sumanth R Hegde e3b65a5ba0 [feat] Add explicit /start_weight_update and /finish_weight_update APIs for weight transfer (#39212)
2026-05-08 18:03:33 -07:00
Ethan Feng 4140faa4a5 [Docs] Fix OpenAI batch model argument examples (#42066)
Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>
2026-05-08 14:02:46 +00:00
haosdent 52458b60a8 [CI][Examples][RLHF] Disable async scheduling in rlhf_async_new_apis (#42042)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-05-08 04:58:48 -07:00
wang.yuqi 1d694e78c9 [Examples][last/6] Resettle examples. (#41084)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-07 19:42:12 -07:00
tej 8a4888be21 [ROCm] Profiler api support for ROCm MORI toy proxy server in PD Disaggregation (#40264)
Signed-off-by: Tej Kiran <kiran.tej@amd.com>
2026-05-07 16:58:38 +08:00
wang.yuqi 51c1ee9b7c [Examples] Resettle Disaggregated examples. (#40759)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-06 01:20:38 -07:00
John Calderon 964a4bc2a5 [MM][CG] Support ViT CG for Qwen2.5-VL (#40830)
Signed-off-by: John Calderon <jcalderon@nvidia.com>
2026-05-02 11:10:14 +08:00
FredericOdermatt c408fdd663 [Fix] Sync gemma4 chat template from hf (#39570)
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>
2026-05-02 03:06:54 +00:00
Luis 🚀 14043dfecd feat: Enable prompt_embeds Content Part Support in vLLM Chat Completions API (#40720)
Signed-off-by: Luis Robaina <luis@protopia.ai>
Signed-off-by: Luis Robaina 🚀 <luisfabian1545@gmail.com>
Signed-off-by: LuisRobaina <luis@protopia.ai>
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com>
2026-05-01 10:05:55 +08:00
snadampal 3179e53135 [P/D] Prefill compute optimizations with bi-directional KV cache transfers between P and D nodes (#32553)
Signed-off-by: Sunita Nadampalli <nadampal@amazon.com>
2026-04-30 10:14:20 +00:00
wang.yuqi a8208e6a81 [Examples] Resettle features examples. (#40995)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-28 00:33:41 -07:00
Isotr0py c2e88a281c [Bugfix] Fix broken example openai client (#41088)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-04-28 04:43:04 +00:00
wang.yuqi 8d8062d0a7 [Examples] Resettle generate examples. (#36464)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-27 07:48:37 +00:00
wang.yuqi 9744b699ba [Deprecate] Deprecate LLM.reward offline api, use LLM.encode instead. (#40688)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-04-24 05:37:50 +00:00
Srreyansh Sethi b7a2605020 [Bugfix] Make Attention Backend Auto-Selection Batch-Invariance-Aware (#40193)
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-23 14:57:03 +00:00
Shanshan Shen fe57be7809 [MM][CG] Support --enable-vit-cuda-graph option for VLM examples (#40580)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-04-22 22:46:14 -07:00
Simon Danielsson ac58e2a170 [Fix][MoRI] Align MoRI-IO message format with P2pNcclConnector and vllm-router (#39565)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: Matvei Pashkovskii <mpashkov@amd.com>
2026-04-23 08:06:31 +09:00
AllenDou 9047288b68 support hotwords for FunASR model (#39674)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
2026-04-22 02:25:06 -07:00
rasmith cefa5281a7 [ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requests to prefill / decode nodes (#39835)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-04-22 09:48:25 +09:00
artem-spector d249a9e90e Add Granite 4.1 Vision as built-in multimodal model (#40282)
Signed-off-by: Artem Spector <artems@il.ibm.com>
Signed-off-by: artemspector <artems@il.ibm.com>
Co-authored-by: artemspector <artems@il.ibm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 05:43:39 -07:00
wang.yuqi d2e2e856ad [Frontend] Remove frontend pooling multi task support. (#37861)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-21 12:27:44 +00:00
Shanshan Shen b47840019e [MM][Misc] Support image+video mixed inputs (per prompt) for VLM examples (#40335)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-04-21 03:43:25 +00:00
Nick Cao 153ba7f0f3 [Refactor] Drop direct dependency on librosa (#39079)
Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-18 06:55:38 +00:00
Nithin Chalapathi 80b18230e0 [Frontend] Add multimodal support to /inference/v1/generate endpoint (#38405)
Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
Signed-off-by: Nithin Chalapathi <nithinc@berkeley.edu>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-04-17 20:31:56 -07:00
wang.yuqi 8d2cff8140 [Examples] Resettle Observability examples. (#40123)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-17 03:13:31 -07:00
Martin Hickey cc07dad789 [HMA] [KVEvent] Enable GPU-side KV events for HMA (#37688)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2026-04-12 10:01:02 +03:00
wang.yuqi cb5f7501cb [New Model]: jinaai/jina-reranker-v3 (#38800)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-10 15:20:40 +00:00
PatchyTIS 967146e7bd [model] support FireRedLID (#39290)
Signed-off-by: PatchouliTaisa <patchychen@tencent.com>
Co-authored-by: PatchouliTaisa <patchychen@tencent.com>
2026-04-10 08:43:58 +00:00
Kyungmin Lee e7a1387e73 Add EXAONE-4.5 (#39388)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-09 20:53:26 -07:00
Ben Browning 8477fe427d [Tool] adjust_request to reasoning parser, and Gemma4 fixes (#39027)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-04-08 19:04:04 +00:00
Varun Sundar Rabindranath 7b80cd8ac3 [Docs] Add Phi-4-reasoning-vision to supported models + examples (#39232)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2026-04-08 02:02:26 +00:00
bsliu c0817e4d39 [Model] Add support for Cheers multimodal model (#38788)
Signed-off-by: bsliu <1187291748@qq.com>
Signed-off-by: 吴炳贤 <wubingxian24@mails.ucas.ac.cn>
2026-04-02 21:01:40 +08:00
Fynn Schmitt-Ulms fa246d5231 Fix shape comment in extract_hidden_states example (#38723)
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
2026-04-01 07:29:33 -07:00
liuzhenwei 0c63739135 [EPD] update EPD script arguments (#36742)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2026-03-31 12:02:09 +00:00
Maosheng Liao aae3e688f8 Fix document of torchrun_example.py (#31113)
2026-03-31 10:54:23 +00:00
haosdent d39b8daf5f [Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-29 00:27:52 +00:00
Matej Rojec 2908094567 Add /v1/chat/completions/batch endpoint for batched chat completions (#38011)
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
2026-03-26 12:13:33 +08:00
Ekagra Ranjan 7b54f60db0 [Cohere] Enable Cohere-Transcribe (#38120)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2026-03-25 16:13:51 -07:00
Cyrus Leung ba2f0acc2d [Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-25 10:22:54 -07:00