Daoyuan Li
|
f6a708ab2b
|
[Doc] Add Llama-3.2-3B-Instruct to batch-invariance tested models (#44435)
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
|
2026-06-05 16:04:32 -07:00 |
|
Harry Mellor
|
ef0df7dbd6
|
[CI] Bump mypy version 1.19.1 -> 1.20.2 (#44647)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-05 14:56:27 +00:00 |
|
Harry Mellor
|
a80af24356
|
Speed up docs build (#44635)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-05 14:51:44 +00:00 |
|
Chunyang Wen
|
efc347f1b2
|
docs: fix tokenizer optimization typo (#44066)
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
|
2026-06-05 02:12:49 -07:00 |
|
Nicolò Lucchesi
|
d98b8f371c
|
[NixlConnector] Initiate deprecation cycle for kv_both role (#43874)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-06-05 11:08:17 +02:00 |
|
Fadi Arafeh
|
3da29aa4a5
|
[DOC] Add INT8 W4A8 docs and Arm's supported quantization schemes (#34894)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-06-04 16:27:17 +00:00 |
|
Zvi Kons
|
b21443e23c
|
Add model support for granite speech plus (#43519)
Signed-off-by: Zvi Kons[WSL] <zvi@il.ibm.com>
Signed-off-by: Zvi Kons (BlueVela) <zvi@il.ibm.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
2026-06-04 14:47:48 +00:00 |
|
Yongye Zhu
|
b5235fca2e
|
[DSv4] Adding TRTLLM gen attention kernel (#43827)
|
2026-06-04 07:35:09 -07:00 |
|
Harry Mellor
|
f35b557239
|
Add GH token to docs build pre run check (#44534)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-04 05:43:49 -07:00 |
|
Oğuzhan KIR
|
f25952e59b
|
[MM][Perf][CG] Support ViT full CUDA graph for InternVL (#41759)
Signed-off-by: oguz <oguzhankir17@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-06-04 10:24:25 +08:00 |
|
Luciano Martins
|
a248b45d05
|
[Model] Add Gemma4 Unified (encoder-free) support (#44429)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
|
2026-06-03 12:01:39 -07:00 |
|
Shanshan Shen
|
0e2b13103b
|
[Doc] Update ViT CUDA graph interfaces (#44388)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-06-03 01:20:59 -07:00 |
|
Daoyuan Li
|
bd98e97557
|
[Misc] Remove dead VLLM_RPC_TIMEOUT env var and fix profiling doc that references it (#44128)
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
|
2026-06-03 00:22:10 +00:00 |
|
Siddharth Bedekar
|
0917a009d3
|
Fix sparse NCCL weight transfer test construction (#44345)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
|
2026-06-02 21:51:21 +00:00 |
|
XiaoZ
|
53fa09d085
|
[Misc] Support local image encoding in benchmarks (#43843)
Signed-off-by: xiaoz <Sukra1@outlook.com>
|
2026-06-02 15:15:06 +00:00 |
|
wang.yuqi
|
b623f7ea95
|
[Frontend] Consolidate dev entrypoints. (#44170)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-06-02 06:30:21 -07:00 |
|
Siddharth Bedekar
|
266b9d9c64
|
[Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
|
2026-06-01 15:37:30 -04:00 |
|
Madeesh Kannan
|
023808c23d
|
[Feature] Add support for JetBrains' Mellum v2 code generation model (#43992)
Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-06-01 10:11:35 -04:00 |
|
Isotr0py
|
1fd8bd02a4
|
[Docs] Replace broken video url in examples (#44159)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-06-01 06:01:10 +00:00 |
|
Bugen Zhao
|
50c80d7923
|
[Governance] Add @BugenZhao as Rust frontend code owner (#44047)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
|
2026-05-30 22:23:54 +08:00 |
|
Xiaoran
|
3becc5db40
|
[ROCm] Add attention sink support to AITer flash attention backend (#43817)
Signed-off-by: Xiaoran Chen <xiaoran@fb.com>
Co-authored-by: Xiaoran Chen <xiaoran@fb.com>
|
2026-05-30 18:13:18 +08:00 |
|
Ilya Markov
|
4aaba00f92
|
[EPLB] Make async EPLB default (#43219)
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-05-29 18:07:16 +00:00 |
|
Chunyang Wen
|
f191d5630e
|
docs: clarify ITL acronym in optimization docs (#43922)
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
|
2026-05-29 07:40:05 -07:00 |
|
Harry Mellor
|
0585b5ba2e
|
Skip docs build if PR doesn't affect docs (#43972)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-05-29 12:09:52 +00:00 |
|
Kunshang Ji
|
30c6289b8e
|
[XPU] fix xpu install document triton-xpu version (#43947)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-05-29 02:05:12 -07:00 |
|
ltd0924
|
b690b2bb67
|
[Model]Support Step-3.7-Flash (#43859)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-28 17:01:48 -07:00 |
|
Harry Mellor
|
085ac221a3
|
Deprecate JAISLMHeadModel (#43784)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-05-28 18:29:12 +00:00 |
|
MaciejBalaNV
|
9aa131f944
|
Add Cosmos3 Reasoner model (#43356)
Signed-off-by: Maciej Bala <mbala@nvidia.com>
Signed-off-by: MaciejBalaNV <mbala@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-05-28 09:43:55 -07:00 |
|
Animesh Trivedi
|
bfb9ebc211
|
[Feature] Add support for timed trace replay in vllm bench serve to replay Moonshot and Alibaba workload traces (#39795)
Signed-off-by: Animesh Trivedi <Animesh.Trivedi@ibm.com>
|
2026-05-28 03:31:34 -07:00 |
|
JINO ROHIT
|
e1814f822d
|
minor docs: fix incorrect example path (#43830)
Signed-off-by: JINO-ROHIT <find.jinorohit@gmail.com>
|
2026-05-27 22:58:43 -07:00 |
|
Chunyang Wen
|
49a3510266
|
[Docs] Fix the duplicate doc icon issue (#43546)
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
|
2026-05-27 16:09:58 +00:00 |
|
Ashwin Giridharan
|
52a31ccecc
|
[Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-05-27 05:39:49 -07:00 |
|
Mohammad Miadh Angkad
|
158289e0fc
|
[Docs] Fix MLA prefill backend default docs (#43697)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
|
2026-05-27 10:13:22 +00:00 |
|
Aditya Singh
|
ad464e16c0
|
[Doc] Add Ascend NPU tab to the quickstart installation guide (#43550)
Signed-off-by: Aditya Singh <adisin650@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-27 08:41:29 +00:00 |
|
Wentao Ye
|
c02c758ea4
|
[Deprecation] Deprecate functions as scheduled for v0.21.0 (#43358)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-26 19:56:21 -07:00 |
|
Jee Jee Li
|
6e503868ca
|
[Kernel] Porting fuse_minimax_qk_norm to manual fusion (#43410)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-26 13:16:03 -07:00 |
|
Simon Danielsson
|
d565357a90
|
[Docs][ROCm] MoRI-IO Connector Usage Guide (#43603)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-26 21:52:30 +08:00 |
|
Thibault Castells
|
5d09f471f4
|
[Misc] Support interleaved custom image benchmark datasets (#43636)
Signed-off-by: ThibaultCastells <thib.castells@icloud.com>
|
2026-05-26 03:37:25 -07:00 |
|
Roy Wang
|
0c942c69d6
|
[Doc] Add section on escalating stalled contributions (#43568)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-05-25 14:11:01 +08:00 |
|
Nguyễn Thế Duy
|
3df1c7c43e
|
[Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275)
Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-05-25 13:45:31 +08:00 |
|
wang.yuqi
|
1b26fa361e
|
[Docs] Reorganize offline inference docs. (#43552)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-25 13:44:39 +08:00 |
|
Dao007forever
|
0902d8e62f
|
[KV Connector] Keep MooncakeStore full hits block-aligned (#43494)
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-05-23 23:15:03 -07:00 |
|
Holegots
|
8737e4a857
|
[Docs] Fix stale version number in token_classify.md (#43489)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
|
2026-05-23 10:42:20 -07:00 |
|
Holegots
|
7c2ff1f819
|
[Docs] Fix stale version number in token_embed.md (#43488)
Signed-off-by: holegots <ikun3.1415927@gmail.com>
|
2026-05-23 10:06:56 -07:00 |
|
Duncan Moss
|
552bbe6f4e
|
[Attention] Add head_dim=512 support for FlashInfer trtllm attention backend (#38822)
|
2026-05-22 20:27:35 -04:00 |
|
Benjamin Chislett
|
4e2eba28be
|
[Perf] Optimize hidden state extraction logic (#37374)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-22 18:23:08 -04:00 |
|
Yongye Zhu
|
843715739b
|
[Refactor] Extract DeepSeek V4 sparse MLA impl into model folder (#43149)
|
2026-05-22 10:06:31 -07:00 |
|
wang.yuqi
|
2380bfc210
|
[Docs] Note image preprocessing difference between qwen_vl_utils and vllm. (#43393)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-22 01:43:14 -07:00 |
|
Isotr0py
|
ba369b7eb5
|
[CI] Fix dockerfile dependency graph failure for pre-commit (#43378)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-05-22 10:26:05 +08:00 |
|
Nick Hill
|
0f66623b0d
|
[Frontend] Rework fastokens integration (#43168)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-05-21 15:36:58 -07:00 |
|