297 Commits

Author SHA1 Message Date
Tushar Jain 38fd2405f3 use split_group for pytorch process group creation (#41980)
Signed-off-by: Tushar Jain <tushar00jain@users.noreply.github.com>
Co-authored-by: Tushar Jain <tushar00jain@users.noreply.github.com>
2026-06-04 14:36:07 -04:00
Ilya Markov 4f423bd5bc [EPLB] Nixl communicator optimization. Zero-copy transfers (#41633)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-06-04 03:40:34 +00:00
Siddharth Bedekar 0917a009d3 Fix sparse NCCL weight transfer test construction (#44345)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
2026-06-02 21:51:21 +00:00
Nick Hill cab5c9a2a9 [Core] Move max_concurrent_batches to VllmConfig (#44274)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-06-02 08:57:25 -07:00
Siddharth Bedekar 266b9d9c64 [Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
2026-06-01 15:37:30 -04:00
Ilya Markov 4aaba00f92 [EPLB] Make async EPLB default (#43219)
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-05-29 18:07:16 +00:00
Nick Hill 7e53283b1c [Core] Cleanup KVConnector handling with PP + fix MRV2 (#43732)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-28 13:12:03 -07:00
Harry Mellor 085ac221a3 Deprecate JAISLMHeadModel (#43784)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-28 18:29:12 +00:00
Andreas Karatzas 445ded18c1 [ROCm][CI] Extend ROCm quick reduce coverage (#40990)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-26 21:57:13 +08:00
Wentao Ye 33d7cbe02c [Model Runner v2] Force v1 runner for tests (#43233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-23 16:37:24 -07:00
Sumanth R Hegde 3cb83c9592 Add model to WeightTransferEngine.__init__ (#42922)
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-22 17:52:15 -07:00
akii96 bde560ed6e [ROCm] Add QuickReduce min-size override and codec threshold (#41675)
Signed-off-by: <>
2026-05-20 17:46:51 -05:00
Aaron Hao 73dd2f33b7 [bug] fix WeightTransferConfig.backend to allow for all strings (#43121)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-05-19 21:01:29 -04:00
tomeras91 f54721bcc3 [Bugfix][MoE] FlashInfer one-sided: workspace union across heterogeneous layers (#42976)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2026-05-19 14:43:04 -04:00
Aaron Hao e0a45f1455 [Feat][RL] IPC weight sync optimizations: multigpu support and chunked packed tensors (#37476)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
2026-05-15 22:53:06 +08:00
bnellnm d9b4990783 [MoE Refactor] EPLB refactoring for FusedMoE (#41055)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-05-12 14:16:31 -04:00
Yan Ru Pei bcb9c133ba feat(kv-events): emit KV cache metadata (#40984)
Signed-off-by: PeaBrane <yanrpei@gmail.com>
2026-05-12 15:58:48 +00:00
bnellnm 206eaed08d [MoE Refactor] Move expert map related code into ExpertMapManager class (#41046)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
2026-05-12 09:18:27 -04:00
sungsoo ha 4f7bde572a [Kernel] Pack output and LSE in DCP A2A (#41160)
2026-05-01 09:01:17 -04:00
Rishi Puri ccfb620c62 Create tests/distributed/test_mnnvl_alltoall.py (#35241)
Signed-off-by: Rishi Puri <riship@nvidia.com>
Signed-off-by: Claude <claude@anthropic.com>
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
2026-04-29 21:56:56 +00:00
wang.yuqi a8208e6a81 [Examples] Resettle features examples. (#40995)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-28 00:33:41 -07:00
Sage Moore 62b1bbe470 [EPLB] Remove asyncio infrastructure from Async EPLB (#40730)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2026-04-24 00:21:15 +00:00
liuzhenwei 4a79262e0f [UT][Hardware] let torchrun example tests use the default backend (#39879)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2026-04-23 16:22:28 +08:00
Matthew Bonanni 96a85c5750 [Startup][UX] Enable CUDAGraph memory profiling by default (#38284)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-04-21 18:16:59 -04:00
Sage Moore 3173441b0f [EPLB] Consolidate is_unchanged/is_received_locally into TransferMetadata (#37341)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2026-04-20 21:12:42 +00:00
Nicolò Lucchesi 304d5ba1a0 [Bugfix][CI] Fix tests/distributed/test_torchrun_example_moe.py (#40349)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-04-20 11:05:44 -07:00
Sage Moore 3461c8b027 [EPLB] Refactor Async EPLB synchronization logic (#37601)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-04-20 17:05:41 +00:00
Ilya Markov 50dd4cb427 [EPLB] Add nixl-based eplb communicator (#36276)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
2026-04-20 10:24:23 +00:00
Sumanth R Hegde adf9bb3c57 [CI] Add weight transfer tests to CI (#39821)
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-04-16 15:51:45 -04:00
Martin Hickey cc07dad789 [HMA] [KVEvent] Enable GPU-side KV events for HMA (#37688)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2026-04-12 10:01:02 +03:00
Jeffrey Wang ab79863e6c Remove MQ multi-node tests (#38934)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
2026-04-03 20:00:08 +00:00
Jeffrey Wang de5e6c44c6 [Feat][Executor] Introduce RayExecutorV2 (#36836)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
2026-04-01 14:34:29 -07:00
wliao2 4dfad17ed1 replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-31 22:32:54 +08:00
Ilya Markov abdbb68386 [EPLB] Add alternative communication for EPLB weight exchange (#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
2026-03-31 08:17:12 -04:00
Sage Moore 497e234d38 [EPLB] Cleanup the transfer logic for the various eplb maps (#34520)
Signed-off-by: Sage Moore <sagmoore@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2026-03-27 10:18:46 +01:00
Flora Feng 9040151fe1 [V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 11:31:43 +08:00
Sage Moore c32a58cc2a [EPLB] Simplify EPLB rearrange by only returning one map (#36267)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-03-18 20:34:00 -04:00
Isotr0py a836524d20 [Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-17 14:44:19 +00:00
Flora Feng 384dc7f77b [Refactor] Relocate completion and chat completion tests (#37125)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-17 11:31:23 +08:00
Kunshang Ji 53ec16a705 [Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-12 07:57:47 -07:00
Harry Mellor 5efa206a8c Fix ExaoneMoeMTP test that never ran in Transformers v4 (#36792)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 17:10:23 +00:00
lif 00b814ba5a [V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
2026-03-07 22:09:55 +08:00
Yongye Zhu 86e1060b17 [Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2026-03-05 22:04:44 -08:00
Kunshang Ji 66a2209645 [Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-05 10:36:39 +00:00
Simon Mo f678c3f61a [RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag (#35928)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-03-04 17:05:32 -05:00
sungsoo ha 6cb901093f [Core] Add All-to-All communication backend for DCP (#34883)
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Signed-off-by: sungsoo ha <hasungsoo@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-04 10:01:57 -05:00
Joe Runde 6f0dd93801 [Core] Remove busy loop from idle buffer readers (#28053)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-03-04 07:44:20 +00:00
Itay Alroy dea268336f [1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
2026-02-28 04:46:42 +00:00
Aaron Hao 2ce6f3cf67 [Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-02-27 13:45:21 -07:00
Lucia Fang 0f2f24c8b2 [Bugfix] Fix MessageQueue connect_ip for cross-node data parallelism (#35429)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-02-26 22:08:16 +00:00