272 Commits

Author SHA1 Message Date
Daoyuan Li f6a708ab2b [Doc] Add Llama-3.2-3B-Instruct to batch-invariance tested models (#44435)
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
2026-06-05 16:04:32 -07:00
Harry Mellor a80af24356 Speed up docs build (#44635)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-06-05 14:51:44 +00:00
Nicolò Lucchesi d98b8f371c [NixlConnector] Initiate deprecation cycle for kv_both role (#43874)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-06-05 11:08:17 +02:00
Fadi Arafeh 3da29aa4a5 [DOC] Add INT8 W4A8 docs and Arm's supported quantization schemes (#34894)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-06-04 16:27:17 +00:00
Luciano Martins a248b45d05 [Model] Add Gemma4 Unified (encoder-free) support (#44429)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2026-06-03 12:01:39 -07:00
Isotr0py 1fd8bd02a4 [Docs] Replace broken video url in examples (#44159)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-06-01 06:01:10 +00:00
JINO ROHIT e1814f822d minor docs: fix incorrect example path (#43830)
Signed-off-by: JINO-ROHIT <find.jinorohit@gmail.com>
2026-05-27 22:58:43 -07:00
Ashwin Giridharan 52a31ccecc [Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-05-27 05:39:49 -07:00
Simon Danielsson d565357a90 [Docs][ROCm] MoRI-IO Connector Usage Guide (#43603)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-26 21:52:30 +08:00
Dao007forever 0902d8e62f [KV Connector] Keep MooncakeStore full hits block-aligned (#43494)
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-23 23:15:03 -07:00
Benjamin Chislett 4e2eba28be [Perf] Optimize hidden state extraction logic (#37374)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-22 18:23:08 -04:00
Kebe 19cf334207 [Feature] Support manually enabling the cumem allocator (#33648)
Signed-off-by: Kebe <mail@kebe7jun.com>
2026-05-20 08:58:30 -04:00
Nicolò Lucchesi 40651c0207 [Docs][PD][NIXL] Bidirectional kv-cache transfer (#43097)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-05-20 09:02:36 +02:00
Dao007forever aed2eb355a [Docs] Fix MooncakeStoreConnector role in disaggregated example (#42994)
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-19 11:14:43 -07:00
wang.yuqi 257af77bc2 [Docs] Reorganize online serving docs. (#41907)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-19 14:43:18 +08:00
Blanc Swan 4a39b4f553 [Model] Add Apertus Tool Parser (#41154)
Signed-off-by: Blanc <swan.blanc@infomaniak.com>
2026-05-18 11:20:04 -04:00
Jee Jee Li 7d5b033782 [LoRA] Support 2D and 3D MoE LoRA adapter at the same time (#42242)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-05-18 15:22:26 +08:00
Zhewen Li 36e74c9ea4 [KV Connector] Support disk offloading in MooncakeStoreConnector (#42689)
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 13:34:15 -07:00
Cyrus Leung 2676ab1e0b [Deprecation] Remove old locations of get_tokenizer and resolve_hf_chat_template (#35024)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-05-15 00:13:32 -07:00
Michael Goin 8efd508204 [Quantization] Rework quantization_config to use QuantKey and allow for activation override (#41566)
2026-05-13 16:58:32 -04:00
CynicDora 256dbcaabf [Feature] Support custom callable proposer backend for speculative decoding (#39487)
Signed-off-by: 524031910363 <hyzhyzsh@sjtu.edu.cn>
Signed-off-by: CynicDora <hyzhyzsh@sjtu.edu.cn>
2026-05-13 16:53:01 +00:00
Chao Lei ebeb09d822 [KV Transfer] Add MooncakeStoreConnector for KV cache offloading via Mooncake distributed store (#40900)
Signed-off-by: leichao.lc <leichao.lc@antgroup.com>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: leichao.lc <leichao.lc@antgroup.com>
Co-authored-by: ivanium <yifanqiao@inferact.ai>
Co-authored-by: aoshen524 <aoshen@inferact.ai>
Co-authored-by: Dao007forever <daole@inferact.ai>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
Co-authored-by: foraxe <1055696449@qq.com>
Co-authored-by: Skywalker-EP <173423846@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: jianzs <zheng.shoujian@outlook.com>
Co-authored-by: baxingpiaochong <771405853@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-12 16:09:10 -07:00
Nicolò Lucchesi 770e9bd6b3 [Nixl][PD] Lease renewal TTL KV blocks on P (#41383)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-05-11 09:27:30 +00:00
Abhishek Gupta 27d3bac272 docs: clarify Gemma 4 assistant speculative decoding (#42180)
Signed-off-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
Co-authored-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
2026-05-09 20:08:44 -07:00
Simon Danielsson f9b9bf3bbb [CI][ROCm] Ship RIXL with vllm/vllm-openai-rocm (#41634)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
2026-05-08 07:05:17 +00:00
wang.yuqi 1d694e78c9 [Examples][last/6] Resettle examples. (#41084)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-07 19:42:12 -07:00
wang.yuqi 51c1ee9b7c [Examples] Resettle Disaggregated examples. (#40759)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-06 01:20:38 -07:00
Taneem Ibrahim 54dc64d5d3 [Doc] Add Qwen3-30B-A3B-Thinking-2507-FP8 to batch invariance verified models (#41513)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
2026-05-03 08:47:55 -04:00
Dong W 7198940b39 [Model] Add Moondream3 model support(only query and caption skills) (#32325)
Signed-off-by: Dong Wang <dongw2019@gmail.com>
2026-05-01 10:06:48 +08:00
Luis 🚀 14043dfecd feat: Enable prompt_embeds Content Part Support in vLLM Chat Completions API (#40720)
Signed-off-by: Luis Robaina <luis@protopia.ai>
Signed-off-by: Luis Robaina 🚀 <luisfabian1545@gmail.com>
Signed-off-by: LuisRobaina <luis@protopia.ai>
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com>
2026-05-01 10:05:55 +08:00
Chauncey faab189554 [Feature]: IndexCache support for DSA models (#37735)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-29 15:15:35 -04:00
Walter Beller-Morales 1312f07531 [Feature] add cohere reasoning and tool parsers (#40422)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
2026-04-28 21:07:53 -07:00
wang.yuqi a8208e6a81 [Examples] Resettle features examples. (#40995)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-28 00:33:41 -07:00
wang.yuqi 8d8062d0a7 [Examples] Resettle generate examples. (#36464)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-27 07:48:37 +00:00
Zhanda Zhu 5d5c776444 [Perf] FP8 FlashInfer Attn for ViT (#38065)
Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>
Co-authored-by: Yubo Gao <ybgao-nvidia@users.noreply.github.com>
2026-04-27 13:44:15 +08:00
labAxiaoming f768b4473e [Docs] Add docs for context extension using the yarn method (#37430)
Signed-off-by: xiaoming <1259730330@qq.com>
Signed-off-by: labAxiaoming <34019940+labAxiaoming@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-04-24 08:26:09 -07:00
xiao ecbe42e991 [Doc] Clarify supported keys for --speculative-config (#40455)
Signed-off-by: Wangxiaoxiaoa <Wangxiaoxiaoa@users.noreply.github.com>
Co-authored-by: Wangxiaoxiaoa <Wangxiaoxiaoa@users.noreply.github.com>
2026-04-22 04:36:17 -07:00
storyicon 6aa057c9d7 [Multimodal] Support custom video metadata for pre-extracted frame sequences (#40133)
Signed-off-by: storyicon <storyicon@foxmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-22 15:50:04 +08:00
Yusuf Mohammad ec5ef0ac73 [Doc] Add Qwen3 AWQ models to documentation (#40034)
Signed-off-by: Yusuf <yusufmohammad@live.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-21 09:37:41 -04:00
Vasiliy Kuznetsov 38fa87caca mxfp8 online quant move to new frontend (#40152)
Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com>
2026-04-20 06:26:12 -07:00
milesial 6d8b80802b [Docs] Fix thinking_token_budget docs (#40316)
Signed-off-by: milesial <milesial@users.noreply.github.com>
2026-04-20 08:09:44 +00:00
Nick Cao 153ba7f0f3 [Refactor] Drop direct dependency on librosa (#39079)
Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-18 06:55:38 +00:00
Vasiliy Kuznetsov 5e5afafa21 [Doc] add docs for online quant frontend (#39736)
Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com>
2026-04-16 07:52:58 -07:00
Nicolò Lucchesi 3244a2ebf2 [KVConnector][NIXL] Organize NIXL connector into its own directory (#39354)
The number of features supported by the connector has grown substantially
and the `nixl_connector.py` file has accumulated a lot of code. Creates a separate
directory and isolates connector/scheduler code in the hope of improving clarity
and maintainability.

Further refactor of components aimed at improving clarity and simplifying code
will follow soon.

Signed-off-by: NickLucche <nlucches@redhat.com>
2026-04-12 13:10:50 +00:00
Chauncey ecbfbb8d61 [Feature] Add auto-detection for reasoning_config when only reasoning_parser is set (#38214)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-04-10 01:36:26 +00:00
Chauncey cbe7d18096 [Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-04-01 09:56:45 -07:00
Nicolò Lucchesi 7337ff7f03 [Docs] PD with Nixl compat matrix (#38628)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-31 15:01:21 +00:00
Flora Feng 3e802e8786 [Mypy] Fix adjust_request typing (#38264)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-31 04:21:18 +00:00
Cyrus Leung ba2f0acc2d [Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-25 10:22:54 -07:00
Sungjae Lee 4731884796 [Feature] limit thinking tokens (hard limit) (#20859)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-24 09:53:07 -07:00