Daoyuan Li
f6a708ab2b
[Doc] Add Llama-3.2-3B-Instruct to batch-invariance tested models (#44435)
...
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
2026-06-05 16:04:32 -07:00
Harry Mellor
a80af24356
Speed up docs build (#44635)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-06-05 14:51:44 +00:00
Nicolò Lucchesi
d98b8f371c
[NixlConnector] Initiate deprecation cycle for kv_both role (#43874)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-06-05 11:08:17 +02:00
Fadi Arafeh
3da29aa4a5
[DOC] Add INT8 W4A8 docs and Arm's supported quantization schemes (#34894)
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-06-04 16:27:17 +00:00
Luciano Martins
a248b45d05
[Model] Add Gemma4 Unified (encoder-free) support (#44429)
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2026-06-03 12:01:39 -07:00
Isotr0py
1fd8bd02a4
[Docs] Replace broken video url in examples (#44159)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-06-01 06:01:10 +00:00
JINO ROHIT
e1814f822d
minor docs: fix incorrect example path (#43830)
...
Signed-off-by: JINO-ROHIT <find.jinorohit@gmail.com>
2026-05-27 22:58:43 -07:00
Ashwin Giridharan
52a31ccecc
[Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)
...
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-05-27 05:39:49 -07:00
Simon Danielsson
d565357a90
[Docs][ROCm] MoRI-IO Connector Usage Guide (#43603)
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-26 21:52:30 +08:00
Dao007forever
0902d8e62f
[KV Connector] Keep MooncakeStore full hits block-aligned (#43494)
...
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-23 23:15:03 -07:00
Benjamin Chislett
4e2eba28be
[Perf] Optimize hidden state extraction logic (#37374)
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-22 18:23:08 -04:00
Kebe
19cf334207
[Feature] Support manually enabling the cumem allocator (#33648)
...
Signed-off-by: Kebe <mail@kebe7jun.com>
2026-05-20 08:58:30 -04:00
Nicolò Lucchesi
40651c0207
[Docs][PD][NIXL] Bidirectional kv-cache transfer (#43097)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-05-20 09:02:36 +02:00
Dao007forever
aed2eb355a
[Docs] Fix MooncakeStoreConnector role in disaggregated example (#42994)
...
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-19 11:14:43 -07:00
wang.yuqi
257af77bc2
[Docs] Reorganize online serving docs. (#41907)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-19 14:43:18 +08:00
Blanc Swan
4a39b4f553
[Model] Add Apertus Tool Parser (#41154)
...
Signed-off-by: Blanc <swan.blanc@infomaniak.com>
2026-05-18 11:20:04 -04:00
Jee Jee Li
7d5b033782
[LoRA] Support 2D and 3D MoE LoRA adapter at the same time (#42242)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-05-18 15:22:26 +08:00
Zhewen Li
36e74c9ea4
[KV Connector] Support disk offloading in MooncakeStoreConnector (#42689)
...
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 13:34:15 -07:00
Cyrus Leung
2676ab1e0b
[Deprecation] Remove old locations of get_tokenizer and resolve_hf_chat_template (#35024)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-05-15 00:13:32 -07:00
Michael Goin
8efd508204
[Quantization] Rework quantization_config to use QuantKey and allow for activation override (#41566)
2026-05-13 16:58:32 -04:00
CynicDora
256dbcaabf
[Feature] Support custom callable proposer backend for speculative decoding (#39487)
...
Signed-off-by: 524031910363 <hyzhyzsh@sjtu.edu.cn>
Signed-off-by: CynicDora <hyzhyzsh@sjtu.edu.cn>
2026-05-13 16:53:01 +00:00
Chao Lei
ebeb09d822
[KV Transfer] Add MooncakeStoreConnector for KV cache offloading via Mooncake distributed store (#40900)
...
Signed-off-by: leichao.lc <leichao.lc@antgroup.com>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: leichao.lc <leichao.lc@antgroup.com>
Co-authored-by: ivanium <yifanqiao@inferact.ai>
Co-authored-by: aoshen524 <aoshen@inferact.ai>
Co-authored-by: Dao007forever <daole@inferact.ai>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
Co-authored-by: foraxe <1055696449@qq.com>
Co-authored-by: Skywalker-EP <173423846@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: jianzs <zheng.shoujian@outlook.com>
Co-authored-by: baxingpiaochong <771405853@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-12 16:09:10 -07:00
Nicolò Lucchesi
770e9bd6b3
[Nixl][PD] Lease renewal TTL KV blocks on P (#41383)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-05-11 09:27:30 +00:00
Abhishek Gupta
27d3bac272
docs: clarify Gemma 4 assistant speculative decoding (#42180)
...
Signed-off-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
Co-authored-by: AbhiOnGithub <abhiOnGithub@users.noreply.github.com>
2026-05-09 20:08:44 -07:00
Simon Danielsson
f9b9bf3bbb
[CI][ROCm] Ship RIXL with vllm/vllm-openai-rocm (#41634)
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
2026-05-08 07:05:17 +00:00
wang.yuqi
1d694e78c9
[Examples][last/6] Resettle examples. (#41084)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-07 19:42:12 -07:00
wang.yuqi
51c1ee9b7c
[Examples] Resettle Disaggregated examples. (#40759)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-06 01:20:38 -07:00
Taneem Ibrahim
54dc64d5d3
[Doc] Add Qwen3-30B-A3B-Thinking-2507-FP8 to batch invariance verified models (#41513)
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
2026-05-03 08:47:55 -04:00
Dong W
7198940b39
[Model] Add Moondream3 model support(only query and caption skills) (#32325)
...
Signed-off-by: Dong Wang <dongw2019@gmail.com>
2026-05-01 10:06:48 +08:00
Luis 🚀
14043dfecd
feat: Enable prompt_embeds Content Part Support in vLLM Chat Completions API (#40720)
...
Signed-off-by: Luis Robaina <luis@protopia.ai>
Signed-off-by: Luis Robaina 🚀 <luisfabian1545@gmail.com>
Signed-off-by: LuisRobaina <luis@protopia.ai>
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com>
2026-05-01 10:05:55 +08:00
Chauncey
faab189554
[Feature]: IndexCache support for DSA models (#37735)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-29 15:15:35 -04:00
Walter Beller-Morales
1312f07531
[Feature] add cohere reasoning and tool parsers (#40422)
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
2026-04-28 21:07:53 -07:00
wang.yuqi
a8208e6a81
[Examples] Resettle features examples. (#40995)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-28 00:33:41 -07:00
wang.yuqi
8d8062d0a7
[Examples] Resettle generate examples. (#36464)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-27 07:48:37 +00:00
Zhanda Zhu
5d5c776444
[Perf] FP8 FlashInfer Attn for ViT (#38065)
...
Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>
Co-authored-by: Yubo Gao <ybgao-nvidia@users.noreply.github.com>
2026-04-27 13:44:15 +08:00
labAxiaoming
f768b4473e
[Docs] Add docs for context extension using the yarn method (#37430)
...
Signed-off-by: xiaoming <1259730330@qq.com>
Signed-off-by: labAxiaoming <34019940+labAxiaoming@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-04-24 08:26:09 -07:00
xiao
ecbe42e991
[Doc] Clarify supported keys for --speculative-config (#40455)
...
Signed-off-by: Wangxiaoxiaoa <Wangxiaoxiaoa@users.noreply.github.com>
Co-authored-by: Wangxiaoxiaoa <Wangxiaoxiaoa@users.noreply.github.com>
2026-04-22 04:36:17 -07:00
storyicon
6aa057c9d7
[Multimodal] Support custom video metadata for pre-extracted frame sequences (#40133)
...
Signed-off-by: storyicon <storyicon@foxmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-22 15:50:04 +08:00
Yusuf Mohammad
ec5ef0ac73
[Doc] Add Qwen3 AWQ models to documentation (#40034)
...
Signed-off-by: Yusuf <yusufmohammad@live.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-21 09:37:41 -04:00
Vasiliy Kuznetsov
38fa87caca
mxfp8 online quant move to new frontend (#40152)
...
Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com>
2026-04-20 06:26:12 -07:00
milesial
6d8b80802b
[Docs] Fix thinking_token_budget docs (#40316)
...
Signed-off-by: milesial <milesial@users.noreply.github.com>
2026-04-20 08:09:44 +00:00
Nick Cao
153ba7f0f3
[Refactor] Drop direct dependency on librosa (#39079)
...
Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-18 06:55:38 +00:00
Vasiliy Kuznetsov
5e5afafa21
[Doc] add docs for online quant frontend (#39736)
...
Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com>
2026-04-16 07:52:58 -07:00
Nicolò Lucchesi
3244a2ebf2
[KVConnector][NIXL] Organize NIXL connector into its own directory (#39354)
...
The number of features supported by the connector has grown substantially
and the `nixl_connector.py` file has accumulated a lot of code. Creates a separate
directory and isolates connector/scheduler code in the hope of improving clarity
and maintainability.
Further refactor of components aimed at improving clarity and simplifying code
will follow soon.
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-04-12 13:10:50 +00:00
Chauncey
ecbfbb8d61
[Feature] Add auto-detection for reasoning_config when only reasoning_parser is set (#38214)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-04-10 01:36:26 +00:00
Chauncey
cbe7d18096
[Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-04-01 09:56:45 -07:00
Nicolò Lucchesi
7337ff7f03
[Docs] PD with Nixl compat matrix (#38628)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-31 15:01:21 +00:00
Flora Feng
3e802e8786
[Mypy] Fix adjust_request typing (#38264)
...
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-31 04:21:18 +00:00
Cyrus Leung
ba2f0acc2d
[Misc] Reorganize inputs (#35182)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-25 10:22:54 -07:00
Sungjae Lee
4731884796
[Feature] limit thinking tokens (hard limit) (#20859)
...
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-24 09:53:07 -07:00