obscura/vllm

mirror of https://github.com/vllm-project/vllm.git synced 2026-06-06 00:16:14 +00:00

Author	SHA1	Message	Date
Jee Jee Li	559d6710bf	[PERF]MiniMax-M2 gate kernel (#38445) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: qianlihuang <91178480+qianlihuang@users.noreply.github.com> Co-authored-by: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com>	2026-05-29 18:28:34 -07:00
Wentao Ye	64e1218673	[Perf] Optimize moe permute by pre-allocate buffer, 9~14% kernel performance improvement (#43014) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-05-28 06:18:26 -07:00
Jee Jee Li	ec5de7fa7d	[LoRA] Add one shot triton kernel For MoE LoRA (#42290) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-05-25 19:47:04 -07:00
Jee Jee Li	d4004455d2	[Kernel] Remove NormGateLinear (#43554) Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>	2026-05-25 09:49:19 +00:00
danisereb	d56285c747	Tuning script and configs for Triton Mamba SSU kernel (#43083) Signed-off-by: Banani Ghosh <bg2502@nyu.edu> Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com> Co-authored-by: Banani Ghosh <bg2502@nyu.edu>	2026-05-24 20:12:44 +03:00
Benjamin Chislett	4e2eba28be	[Perf] Optimize hidden state extraction logic (#37374) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-05-22 18:23:08 -04:00
Viktor Pus	87a2adcb43	[Misc] Add common random prefix option to structured-output serving benchmark (#41632) Signed-off-by: Viktor Pus <viktorpus@tenstorrent.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-05-16 00:44:48 +00:00
lyd1992	f351455f0f	[CPU][RISC-V] Add RVV-optimized attention kernels for RISC-V Vector Extension (#40119) Signed-off-by: liuyudong <liuyudong@iscas.ac.cn> Co-authored-by: Claude <noreply@anthropic.com>	2026-05-15 12:08:23 +08:00
Matthew Bonanni	9898f94abe	[Attention] Remove deprecated MLA prefill arguments (#42555) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-05-14 10:34:06 -07:00
Jee Jee Li	0a65d46628	[DSV4] Fuse norm and router for low latency scenario (#41263) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: jeejeelee <jeejeelee@verda-b300-05.datacrunch.io> Co-authored-by: jeejeelee <jeejeelee@verda-b300-05.datacrunch.io> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-05-14 05:11:02 -07:00
Yongye Zhu	0d2732dd91	[MLA Attention Backend] Add TOKENSPEED_MLA backend for DSR1/Kimi K25 prefill + decode on Blackwell (#41778) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-05-13 23:48:02 -07:00
bnellnm	6427603ae8	[MoE Refactor] Move remaining experts classes to experts directory (#42334) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-05-12 09:19:46 -04:00
Dao007forever	4845aee6b7	[Benchmark] Add --trust-remote-code flag to multi-turn benchmark (#41661) Signed-off-by: Dao Le <daole@inferact.ai> Signed-off-by: Dao Le <Dao007forever@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-05-05 01:00:37 -07:00
snadampal	3179e53135	[P/D] Prefill compute optimizations with bi-directional KV cache transfers between P and D nodes (#32553) Signed-off-by: Sunita Nadampalli <nadampal@amazon.com>	2026-04-30 10:14:20 +00:00
Zhanda Zhu	5d5c776444	[Perf] FP8 FlashInfer Attn for ViT (#38065) Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com> Co-authored-by: Yubo Gao <ybgao-nvidia@users.noreply.github.com>	2026-04-27 13:44:15 +08:00
Ignacio Sica	f88763efc3	[Bugfix] add seq_lens_cpu_upper_bound to CommonAttentionMetadata in mla_runner.py (#40844) Signed-off-by: ignaciosica <mignacio.sica@gmail.com>	2026-04-24 23:13:52 +00:00
Jackmin801	079a4cf399	[MoE] Move cutlass moe to fused_moe/experts/ (#40574) Signed-off-by: Jackmin801 <ongjackm@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-04-24 06:05:49 +00:00
Yanan Cao	fe5c115ee4	[vLLM IR] Add IR op testing and benchmarking infrastructure (#40167) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Theresa Shan <Theresa.Shan@amd.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 00:23:03 +00:00
Nicolò Lucchesi	8625ec267b	[Misc] Multi-turn benchmark output performance json (#39572) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-04-13 18:15:23 +00:00
Yan Ma	ec68d53b2b	Add platform manual_seed_all API (#38468) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-04-10 13:43:50 +08:00
Maral	2e9034c998	[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892) Signed-off-by: maral <maralbahari.98@gmail.com> Signed-off-by: Maral <maralbahari.98@gmail.com>	2026-04-09 08:50:39 +08:00
Jackmin801	a776a48b1c	[MoE] Move DEEP_GEMM into experts/ subdirectory (#39005) Signed-off-by: Jackmin801 <ongjackm@gmail.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-08 19:23:08 +00:00
Carl Y	3bc2734dd0	[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518) Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>	2026-04-03 01:47:04 +00:00
Xin Yang	9bd7231106	Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-04-01 22:02:32 -07:00
Monishver	c09ad767cd	Feature/silu block quant fusion v1 (#32996) Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>	2026-04-01 18:50:43 +00:00
Zhanda Zhu	c75a313824	[Perf] triton bilinear_pos_embed kernel for ViT (#37948) Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>	2026-04-01 01:52:02 -07:00
whyiug	58c959a767	[Misc]: clean up non-core lint issues (#37049) Signed-off-by: whyiug <whyiug@hotmail.com>	2026-03-28 10:28:16 -04:00
Liwen	171775f306	Fix Device Index for ROCm Ray Workers in MoE Benchmark (#38108) Signed-off-by: Liwen <53441624+li-liwen@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-28 08:27:11 +00:00
Jee Jee Li	2bfbdca23c	[Bugfix] Fix benchmark_fused_collective.py (#38082) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-25 23:51:00 -07:00
Harry Mellor	d215d1efca	[Mypy] Better fixes for the `mypy` issues in `vllm/config` (#37902) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 06:14:43 -07:00
Kyle Sayers	38364a7e32	[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-03-23 16:03:29 -04:00
Harry Mellor	572b432913	Stop bench CLI from recursively casting all configs to `dict` (#37559) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 14:04:03 +00:00
Wentao Ye	0ef7f79054	[Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E throughput improvement (#37340) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 14:18:34 -04:00
Xin Yang	b1169d7be8	[Kernel] Add gpt-oss Router GEMM kernel (#37205) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-18 08:15:56 -07:00
Andrey Talman	68f783a727	[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673) Signed-off-by: atalman <atalman@fb.com>	2026-03-17 18:47:59 +00:00
Wei Zhao	a3a51d20e7	[Benchmark] Improvements to attention benchmark script (#37115) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-16 22:22:40 +00:00
Kunshang Ji	747b068136	[Hardware] Replace memory related torch.cuda APIs (#37031) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>	2026-03-16 10:24:48 +00:00
Matthew Bonanni	f444c05c32	[Attention] Use FA4 for MLA prefill (#34732) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-12 12:10:17 -04:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Yan Ma	894843eb25	replace `with torch.cuda.device` with `with torch.accelerator.device_index` (#36144) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-03-11 23:12:57 -07:00
Roberto L. Castro	580864d81e	[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>	2026-03-09 09:50:36 -07:00
Roberto L. Castro	2b28b9b269	[Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 (#35290) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-03-09 09:46:57 -07:00
Harry Mellor	a0f44bb616	Allow `markdownlint` to run locally (#36398) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:05:24 -07:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Cyrus Leung	792a74b973	[Doc] Improve UX of `--enable-log-requests` (#35723) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-02 08:24:09 -08:00
Wentao Ye	05970c772c	[Refactor] Remove dead code for attention benchmark script (#35418) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-26 09:53:46 -08:00

623 Commits