obscura/vllm - vllm

mirror of https://github.com/vllm-project/vllm.git synced 2026-06-06 00:16:14 +00:00

Author	SHA1	Message	Date
Tushar Jain	38fd2405f3	use split_group for pytorch process group creation (#41980 ) Signed-off-by: Tushar Jain <tushar00jain@users.noreply.github.com> Co-authored-by: Tushar Jain <tushar00jain@users.noreply.github.com>	2026-06-04 14:36:07 -04:00
Ilya Markov	4f423bd5bc	[EPLB] Nixl communicator optimization. Zero-copy transfers (#41633 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Markov Ilya <markovilya19@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-06-04 03:40:34 +00:00
Siddharth Bedekar	0917a009d3	Fix sparse NCCL weight transfer test construction (#44345 ) Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>	2026-06-02 21:51:21 +00:00
Nick Hill	cab5c9a2a9	[Core] Move `max_concurrent_batches` to `VllmConfig` (#44274 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-06-02 08:57:25 -07:00
Siddharth Bedekar	266b9d9c64	[Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096 ) Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com> Co-authored-by: OpenAI Codex <codex@openai.com>	2026-06-01 15:37:30 -04:00
Ilya Markov	4aaba00f92	[EPLB] Make async EPLB default (#43219 ) Signed-off-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-05-29 18:07:16 +00:00
Nick Hill	7e53283b1c	[Core] Cleanup KVConnector handling with PP + fix MRV2 (#43732 ) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-05-28 13:12:03 -07:00
Harry Mellor	085ac221a3	Deprecate `JAISLMHeadModel` (#43784 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-05-28 18:29:12 +00:00
Andreas Karatzas	445ded18c1	[ROCm][CI] Extend ROCm quick reduce coverage (#40990 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-05-26 21:57:13 +08:00
Wentao Ye	33d7cbe02c	[Model Runner v2] Force v1 runner for tests (#43233 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-05-23 16:37:24 -07:00
Sumanth R Hegde	3cb83c9592	Add `model` to `WeightTransferEngine.__init__` (#42922 ) Signed-off-by: SumanthRH <sumanthrh99@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-05-22 17:52:15 -07:00
akii96	bde560ed6e	[ROCm] Add QuickReduce min-size override and codec threshold (#41675 ) Signed-off-by: <>	2026-05-20 17:46:51 -05:00
Aaron Hao	73dd2f33b7	[bug] fix WeightTransferConfig.backend to allow for all strings (#43121 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-05-19 21:01:29 -04:00
tomeras91	f54721bcc3	[Bugfix][MoE] FlashInfer one-sided: workspace union across heterogeneous layers (#42976 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2026-05-19 14:43:04 -04:00
Aaron Hao	e0a45f1455	[Feat][RL] IPC weight sync optimizations: multigpu support and chunked packed tensors (#37476 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com>	2026-05-15 22:53:06 +08:00
bnellnm	d9b4990783	[MoE Refactor] EPLB refactoring for FusedMoE (#41055 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-05-12 14:16:31 -04:00
Yan Ru Pei	bcb9c133ba	feat(kv-events): emit KV cache metadata (#40984 ) Signed-off-by: PeaBrane <yanrpei@gmail.com>	2026-05-12 15:58:48 +00:00
bnellnm	206eaed08d	[MoE Refactor] Move expert map related code into ExpertMapManager class (#41046 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>	2026-05-12 09:18:27 -04:00
sungsoo ha	4f7bde572a	[Kernel] Pack output and LSE in DCP A2A (#41160 )	2026-05-01 09:01:17 -04:00
Rishi Puri	ccfb620c62	Create tests/distributed/test_mnnvl_alltoall.py (#35241 ) Signed-off-by: Rishi Puri <riship@nvidia.com> Signed-off-by: Claude <claude@anthropic.com> Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>	2026-04-29 21:56:56 +00:00
wang.yuqi	a8208e6a81	[Examples] Resettle features examples. (#40995 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-04-28 00:33:41 -07:00
Sage Moore	62b1bbe470	[EPLB] Remove asyncio infrastructure from Async EPLB (#40730 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-04-24 00:21:15 +00:00
liuzhenwei	4a79262e0f	[UT][Hardware] let torchrun example tests use the default backend (#39879 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2026-04-23 16:22:28 +08:00
Matthew Bonanni	96a85c5750	[Startup][UX] Enable CUDAGraph memory profiling by default (#38284 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-04-21 18:16:59 -04:00
Sage Moore	3173441b0f	[EPLB] Consolidate is_unchanged/is_received_locally into TransferMetadata (#37341 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-04-20 21:12:42 +00:00
Nicolò Lucchesi	304d5ba1a0	[Bugfix][CI] Fix `tests/distributed/test_torchrun_example_moe.py` (#40349 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-04-20 11:05:44 -07:00
Sage Moore	3461c8b027	[EPLB] Refactor Async EPLB synchronization logic (#37601 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-04-20 17:05:41 +00:00
Ilya Markov	50dd4cb427	[EPLB] Add nixl-based eplb communicator (#36276 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Markov Ilya <markovilya19@gmail.com>	2026-04-20 10:24:23 +00:00
Sumanth R Hegde	adf9bb3c57	[CI] Add weight transfer tests to CI (#39821 ) Signed-off-by: SumanthRH <sumanthrh99@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-04-16 15:51:45 -04:00
Martin Hickey	cc07dad789	[HMA] [KVEvent] Enable GPU-side KV events for HMA (#37688 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: Or Ozeri <or@ozery.com>	2026-04-12 10:01:02 +03:00
Jeffrey Wang	ab79863e6c	Remove MQ multi-node tests (#38934 ) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2026-04-03 20:00:08 +00:00
Jeffrey Wang	de5e6c44c6	[Feat][Executor] Introduce RayExecutorV2 (#36836 ) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2026-04-01 14:34:29 -07:00
wliao2	4dfad17ed1	replace cuda_device_count_stateless() to current_platform.device_count() (#37841 ) Signed-off-by: Liao, Wei <wei.liao@intel.com> Signed-off-by: wliao2 <wei.liao@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 22:32:54 +08:00
Ilya Markov	abdbb68386	[EPLB] Add alternative communication for EPLB weight exchange (#33176 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Markov Ilya <markovilya19@gmail.com>	2026-03-31 08:17:12 -04:00
Sage Moore	497e234d38	[EPLB] Cleanup the transfer logic for the various eplb maps (#34520 ) Signed-off-by: Sage Moore <sagmoore@redhat.com> Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-27 10:18:46 +01:00
Flora Feng	9040151fe1	[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-20 11:31:43 +08:00
Sage Moore	c32a58cc2a	[EPLB] Simplify EPLB rearrange by only returning one map (#36267 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-18 20:34:00 -04:00
Isotr0py	a836524d20	[Chore] Replace all base64 usages with faster pybase64 package (#37290 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-17 14:44:19 +00:00
Flora Feng	384dc7f77b	[Refactor] Relocate completion and chat completion tests (#37125 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-17 11:31:23 +08:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Harry Mellor	5efa206a8c	Fix `ExaoneMoeMTP` test that never ran in Transformers v4 (#36792 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 17:10:23 +00:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Yongye Zhu	86e1060b17	[Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-03-05 22:04:44 -08:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Simon Mo	f678c3f61a	[RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag (#35928 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-03-04 17:05:32 -05:00
sungsoo ha	6cb901093f	[Core] Add All-to-All communication backend for DCP (#34883 ) Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com> Signed-off-by: sungsoo ha <hasungsoo@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:01:57 -05:00
Joe Runde	6f0dd93801	[Core] Remove busy loop from idle buffer readers (#28053 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-04 07:44:20 +00:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861 ) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Aaron Hao	2ce6f3cf67	[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171 ) Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-02-27 13:45:21 -07:00
Lucia Fang	0f2f24c8b2	[Bugfix] Fix MessageQueue connect_ip for cross-node data parallelism (#35429 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-02-26 22:08:16 +00:00

297 Commits