2334 Commits

Author SHA1 Message Date
Daoyuan Li f6a708ab2b [Doc] Add Llama-3.2-3B-Instruct to batch-invariance tested models (#44435)
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
2026-06-05 16:04:32 -07:00
Harry Mellor ef0df7dbd6 [CI] Bump mypy version 1.19.1 -> 1.20.2 (#44647)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-06-05 14:56:27 +00:00
Harry Mellor a80af24356 Speed up docs build (#44635)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-06-05 14:51:44 +00:00
Chunyang Wen efc347f1b2 docs: fix tokenizer optimization typo (#44066)
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
2026-06-05 02:12:49 -07:00
Nicolò Lucchesi d98b8f371c [NixlConnector] Initiate deprecation cycle for kv_both role (#43874)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-06-05 11:08:17 +02:00
Fadi Arafeh 3da29aa4a5 [DOC] Add INT8 W4A8 docs and Arm's supported quantization schemes (#34894)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-06-04 16:27:17 +00:00
Zvi Kons b21443e23c Add model support for granite speech plus (#43519)
Signed-off-by: Zvi Kons[WSL] <zvi@il.ibm.com>
Signed-off-by: Zvi Kons (BlueVela) <zvi@il.ibm.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 14:47:48 +00:00
Yongye Zhu b5235fca2e [DSv4] Adding TRTLLM gen attention kernel (#43827)
2026-06-04 07:35:09 -07:00
Harry Mellor f35b557239 Add GH token to docs build pre run check (#44534)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-06-04 05:43:49 -07:00
Oğuzhan KIR f25952e59b [MM][Perf][CG] Support ViT full CUDA graph for InternVL (#41759)
Signed-off-by: oguz <oguzhankir17@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-06-04 10:24:25 +08:00
Luciano Martins a248b45d05 [Model] Add Gemma4 Unified (encoder-free) support (#44429)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2026-06-03 12:01:39 -07:00
Shanshan Shen 0e2b13103b [Doc] Update ViT CUDA graph interfaces (#44388)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-06-03 01:20:59 -07:00
Daoyuan Li bd98e97557 [Misc] Remove dead VLLM_RPC_TIMEOUT env var and fix profiling doc that references it (#44128)
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
2026-06-03 00:22:10 +00:00
Siddharth Bedekar 0917a009d3 Fix sparse NCCL weight transfer test construction (#44345)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
2026-06-02 21:51:21 +00:00
XiaoZ 53fa09d085 [Misc] Support local image encoding in benchmarks (#43843)
Signed-off-by: xiaoz <Sukra1@outlook.com>
2026-06-02 15:15:06 +00:00
wang.yuqi b623f7ea95 [Frontend] Consolidate dev entrypoints. (#44170)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-02 06:30:21 -07:00
Siddharth Bedekar 266b9d9c64 [Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
2026-06-01 15:37:30 -04:00
Madeesh Kannan 023808c23d [Feature] Add support for JetBrains' Mellum v2 code generation model (#43992)
Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-06-01 10:11:35 -04:00
Isotr0py 1fd8bd02a4 [Docs] Replace broken video url in examples (#44159)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-06-01 06:01:10 +00:00
Bugen Zhao 50c80d7923 [Governance] Add @BugenZhao as Rust frontend code owner (#44047)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
2026-05-30 22:23:54 +08:00
Xiaoran 3becc5db40 [ROCm] Add attention sink support to AITer flash attention backend (#43817)
Signed-off-by: Xiaoran Chen <xiaoran@fb.com>
Co-authored-by: Xiaoran Chen <xiaoran@fb.com>
2026-05-30 18:13:18 +08:00
Ilya Markov 4aaba00f92 [EPLB] Make async EPLB default (#43219)
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-05-29 18:07:16 +00:00
Chunyang Wen f191d5630e docs: clarify ITL acronym in optimization docs (#43922)
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
2026-05-29 07:40:05 -07:00
Harry Mellor 0585b5ba2e Skip docs build if PR doesn't affect docs (#43972)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-29 12:09:52 +00:00
Kunshang Ji 30c6289b8e [XPU] fix xpu install document triton-xpu version (#43947)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-05-29 02:05:12 -07:00
ltd0924 b690b2bb67 [Model]Support Step-3.7-Flash (#43859)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-28 17:01:48 -07:00
Harry Mellor 085ac221a3 Deprecate JAISLMHeadModel (#43784)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-28 18:29:12 +00:00
MaciejBalaNV 9aa131f944 Add Cosmos3 Reasoner model (#43356)
Signed-off-by: Maciej Bala <mbala@nvidia.com>
Signed-off-by: MaciejBalaNV <mbala@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-05-28 09:43:55 -07:00
Animesh Trivedi bfb9ebc211 [Feature] Add support for timed trace replay in vllm bench serve to replay Moonshot and Alibaba workload traces (#39795)
Signed-off-by: Animesh Trivedi <Animesh.Trivedi@ibm.com>
2026-05-28 03:31:34 -07:00
JINO ROHIT e1814f822d minor docs: fix incorrect example path (#43830)
Signed-off-by: JINO-ROHIT <find.jinorohit@gmail.com>
2026-05-27 22:58:43 -07:00
Chunyang Wen 49a3510266 [Docs] Fix the duplicate doc icon issue (#43546)
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
2026-05-27 16:09:58 +00:00
Ashwin Giridharan 52a31ccecc [Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-05-27 05:39:49 -07:00
Mohammad Miadh Angkad 158289e0fc [Docs] Fix MLA prefill backend default docs (#43697)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
2026-05-27 10:13:22 +00:00
Aditya Singh ad464e16c0 [Doc] Add Ascend NPU tab to the quickstart installation guide (#43550)
Signed-off-by: Aditya Singh <adisin650@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-27 08:41:29 +00:00
Wentao Ye c02c758ea4 [Deprecation] Deprecate functions as scheduled for v0.21.0 (#43358)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-26 19:56:21 -07:00
Jee Jee Li 6e503868ca [Kernel] Porting fuse_minimax_qk_norm to manual fusion (#43410)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-26 13:16:03 -07:00
Simon Danielsson d565357a90 [Docs][ROCm] MoRI-IO Connector Usage Guide (#43603)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-26 21:52:30 +08:00
Thibault Castells 5d09f471f4 [Misc] Support interleaved custom image benchmark datasets (#43636)
Signed-off-by: ThibaultCastells <thib.castells@icloud.com>
2026-05-26 03:37:25 -07:00
Roy Wang 0c942c69d6 [Doc] Add section on escalating stalled contributions (#43568)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-05-25 14:11:01 +08:00
Nguyễn Thế Duy 3df1c7c43e [Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275)
Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-25 13:45:31 +08:00
wang.yuqi 1b26fa361e [Docs] Reorganize offline inference docs. (#43552)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-25 13:44:39 +08:00
Dao007forever 0902d8e62f [KV Connector] Keep MooncakeStore full hits block-aligned (#43494)
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-23 23:15:03 -07:00
Holegots 8737e4a857 [Docs] Fix stale version number in token_classify.md (#43489)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
2026-05-23 10:42:20 -07:00
Holegots 7c2ff1f819 [Docs] Fix stale version number in token_embed.md (#43488)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
2026-05-23 10:06:56 -07:00
Duncan Moss 552bbe6f4e [Attention] Add head_dim=512 support for FlashInfer trtllm attention backend (#38822)
2026-05-22 20:27:35 -04:00
Benjamin Chislett 4e2eba28be [Perf] Optimize hidden state extraction logic (#37374)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-22 18:23:08 -04:00
Yongye Zhu 843715739b [Refactor] Extract DeepSeek V4 sparse MLA impl into model folder (#43149)
2026-05-22 10:06:31 -07:00
wang.yuqi 2380bfc210 [Docs] Note image preprocessing difference between qwen_vl_utils and vllm. (#43393)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-22 01:43:14 -07:00
Isotr0py ba369b7eb5 [CI] Fix dockerfile dependency graph failure for pre-commit (#43378)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-05-22 10:26:05 +08:00
Nick Hill 0f66623b0d [Frontend] Rework fastokens integration (#43168)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-05-21 15:36:58 -07:00