Tushar Jain
|
38fd2405f3
|
use split_group for pytorch process group creation (#41980)
Signed-off-by: Tushar Jain <tushar00jain@users.noreply.github.com>
Co-authored-by: Tushar Jain <tushar00jain@users.noreply.github.com>
|
2026-06-04 14:36:07 -04:00 |
|
Ilya Markov
|
4f423bd5bc
|
[EPLB] Nixl communicator optimization. Zero-copy transfers (#41633)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-06-04 03:40:34 +00:00 |
|
Siddharth Bedekar
|
0917a009d3
|
Fix sparse NCCL weight transfer test construction (#44345)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
|
2026-06-02 21:51:21 +00:00 |
|
Nick Hill
|
cab5c9a2a9
|
[Core] Move max_concurrent_batches to VllmConfig (#44274)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-06-02 08:57:25 -07:00 |
|
Siddharth Bedekar
|
266b9d9c64
|
[Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096)
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
|
2026-06-01 15:37:30 -04:00 |
|
Ilya Markov
|
4aaba00f92
|
[EPLB] Make async EPLB default (#43219)
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-05-29 18:07:16 +00:00 |
|
Nick Hill
|
7e53283b1c
|
[Core] Cleanup KVConnector handling with PP + fix MRV2 (#43732)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-28 13:12:03 -07:00 |
|
Harry Mellor
|
085ac221a3
|
Deprecate JAISLMHeadModel (#43784)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-05-28 18:29:12 +00:00 |
|
Andreas Karatzas
|
445ded18c1
|
[ROCm][CI] Extend ROCm quick reduce coverage (#40990)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-05-26 21:57:13 +08:00 |
|
Wentao Ye
|
33d7cbe02c
|
[Model Runner v2] Force v1 runner for tests (#43233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-05-23 16:37:24 -07:00 |
|
Sumanth R Hegde
|
3cb83c9592
|
Add model to WeightTransferEngine.__init__ (#42922)
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-22 17:52:15 -07:00 |
|
akii96
|
bde560ed6e
|
[ROCm] Add QuickReduce min-size override and codec threshold (#41675)
Signed-off-by: <>
|
2026-05-20 17:46:51 -05:00 |
|
Aaron Hao
|
73dd2f33b7
|
[bug] fix WeightTransferConfig.backend to allow for all strings (#43121)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-05-19 21:01:29 -04:00 |
|
tomeras91
|
f54721bcc3
|
[Bugfix][MoE] FlashInfer one-sided: workspace union across heterogeneous layers (#42976)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2026-05-19 14:43:04 -04:00 |
|
Aaron Hao
|
e0a45f1455
|
[Feat][RL] IPC weight sync optimizations: multigpu support and chunked packed tensors (#37476)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
|
2026-05-15 22:53:06 +08:00 |
|
bnellnm
|
d9b4990783
|
[MoE Refactor] EPLB refactoring for FusedMoE (#41055)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-05-12 14:16:31 -04:00 |
|
Yan Ru Pei
|
bcb9c133ba
|
feat(kv-events): emit KV cache metadata (#40984)
Signed-off-by: PeaBrane <yanrpei@gmail.com>
|
2026-05-12 15:58:48 +00:00 |
|
bnellnm
|
206eaed08d
|
[MoE Refactor] Move expert map related code into ExpertMapManager class (#41046)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
|
2026-05-12 09:18:27 -04:00 |
|
sungsoo ha
|
4f7bde572a
|
[Kernel] Pack output and LSE in DCP A2A (#41160)
|
2026-05-01 09:01:17 -04:00 |
|
Rishi Puri
|
ccfb620c62
|
Create tests/distributed/test_mnnvl_alltoall.py (#35241)
Signed-off-by: Rishi Puri <riship@nvidia.com>
Signed-off-by: Claude <claude@anthropic.com>
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
|
2026-04-29 21:56:56 +00:00 |
|
wang.yuqi
|
a8208e6a81
|
[Examples] Resettle features examples. (#40995)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-28 00:33:41 -07:00 |
|
Sage Moore
|
62b1bbe470
|
[EPLB] Remove asyncio infrastructure from Async EPLB (#40730)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-04-24 00:21:15 +00:00 |
|
liuzhenwei
|
4a79262e0f
|
[UT][Hardware] let torchrun example tests use the default backend (#39879)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2026-04-23 16:22:28 +08:00 |
|
Matthew Bonanni
|
96a85c5750
|
[Startup][UX] Enable CUDAGraph memory profiling by default (#38284)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-04-21 18:16:59 -04:00 |
|
Sage Moore
|
3173441b0f
|
[EPLB] Consolidate is_unchanged/is_received_locally into TransferMetadata (#37341)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-04-20 21:12:42 +00:00 |
|
Nicolò Lucchesi
|
304d5ba1a0
|
[Bugfix][CI] Fix tests/distributed/test_torchrun_example_moe.py (#40349)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-04-20 11:05:44 -07:00 |
|
Sage Moore
|
3461c8b027
|
[EPLB] Refactor Async EPLB synchronization logic (#37601)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-04-20 17:05:41 +00:00 |
|
Ilya Markov
|
50dd4cb427
|
[EPLB] Add nixl-based eplb communicator (#36276)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
|
2026-04-20 10:24:23 +00:00 |
|
Sumanth R Hegde
|
adf9bb3c57
|
[CI] Add weight transfer tests to CI (#39821)
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-04-16 15:51:45 -04:00 |
|
Martin Hickey
|
cc07dad789
|
[HMA] [KVEvent] Enable GPU-side KV events for HMA (#37688)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>
|
2026-04-12 10:01:02 +03:00 |
|
Jeffrey Wang
|
ab79863e6c
|
Remove MQ multi-node tests (#38934)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-03 20:00:08 +00:00 |
|
Jeffrey Wang
|
de5e6c44c6
|
[Feat][Executor] Introduce RayExecutorV2 (#36836)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-01 14:34:29 -07:00 |
|
wliao2
|
4dfad17ed1
|
replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-31 22:32:54 +08:00 |
|
Ilya Markov
|
abdbb68386
|
[EPLB] Add alternative communication for EPLB weight exchange (#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
|
2026-03-31 08:17:12 -04:00 |
|
Sage Moore
|
497e234d38
|
[EPLB] Cleanup the transfer logic for the various eplb maps (#34520)
Signed-off-by: Sage Moore <sagmoore@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-27 10:18:46 +01:00 |
|
Flora Feng
|
9040151fe1
|
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 11:31:43 +08:00 |
|
Sage Moore
|
c32a58cc2a
|
[EPLB] Simplify EPLB rearrange by only returning one map (#36267)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-18 20:34:00 -04:00 |
|
Isotr0py
|
a836524d20
|
[Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-17 14:44:19 +00:00 |
|
Flora Feng
|
384dc7f77b
|
[Refactor] Relocate completion and chat completion tests (#37125)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-17 11:31:23 +08:00 |
|
Kunshang Ji
|
53ec16a705
|
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-12 07:57:47 -07:00 |
|
Harry Mellor
|
5efa206a8c
|
Fix ExaoneMoeMTP test that never ran in Transformers v4 (#36792)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 17:10:23 +00:00 |
|
lif
|
00b814ba5a
|
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
|
2026-03-07 22:09:55 +08:00 |
|
Yongye Zhu
|
86e1060b17
|
[Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2026-03-05 22:04:44 -08:00 |
|
Kunshang Ji
|
66a2209645
|
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-05 10:36:39 +00:00 |
|
Simon Mo
|
f678c3f61a
|
[RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag (#35928)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
|
2026-03-04 17:05:32 -05:00 |
|
sungsoo ha
|
6cb901093f
|
[Core] Add All-to-All communication backend for DCP (#34883)
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Signed-off-by: sungsoo ha <hasungsoo@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:01:57 -05:00 |
|
Joe Runde
|
6f0dd93801
|
[Core] Remove busy loop from idle buffer readers (#28053)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-04 07:44:20 +00:00 |
|
Itay Alroy
|
dea268336f
|
[1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
|
2026-02-28 04:46:42 +00:00 |
|
Aaron Hao
|
2ce6f3cf67
|
[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-02-27 13:45:21 -07:00 |
|
Lucia Fang
|
0f2f24c8b2
|
[Bugfix] Fix MessageQueue connect_ip for cross-node data parallelism (#35429)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-02-26 22:08:16 +00:00 |
|