SeongJun Lee
|
3099de3617
|
[Kernel][MoE] Add GELU_TANH to CPU, CUTLASS, and WNA16 MoE backends (#42027)
Signed-off-by: lesj0610 <lesj0610@users.noreply.github.com>
Co-authored-by: lesj0610 <lesj0610@users.noreply.github.com>
|
2026-06-02 17:12:08 -04:00 |
|
Rukhaiya2004
|
689b0eeb9e
|
[HARDWARE][POWER] Enable SHM communicator support for PowerPC (#43754)
Signed-off-by: Rukhaiya <rukhaiya@c643n08aix1-lp1.pok.stglabs.ibm.com>
Signed-off-by: Rukhaiya <bibirukhaiya123@gmail.com>
Co-authored-by: Rukhaiya <rukhaiya@c643n08aix1-lp1.pok.stglabs.ibm.com>
Co-authored-by: Akash kaothalkar <61960177+Akashcodes732@users.noreply.github.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-06-02 18:06:32 +08:00 |
|
Fadi Arafeh
|
0b25cf4419
|
[CPU][Perf] Enable fused kernels for GDN's gated delta rules (#43534)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-06-02 08:00:48 +00:00 |
|
wcy
|
98f1279815
|
[CPU][RISC-V] Add missing RVV cpu_types helpers for WNA16 (#42730)
Signed-off-by: wcy <233313160abc@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-06-01 14:56:41 +08:00 |
|
zhao, zhenhui
|
771e1e48b1
|
[CPU] Enable non-divisible GQA for decode workitems in mixed batches (#43032)
Signed-off-by: zhejiangxiaomai <zhenhui.zhao@intel.com>
|
2026-05-26 14:15:47 +08:00 |
|
velonica0
|
c68c55d43e
|
[CPU][RISC-V] Add VLEN=256 support to RVV attention kernels (#42943)
Signed-off-by: velonica0 <like@mail.nankai.edu.cn>
Signed-off-by: velonica0 <47554626+velonica0@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-05-21 04:50:49 -07:00 |
|
Yuwen Zhou
|
88a860d754
|
[CPU] Add MXFP4 W4A16 MoE support (#41922)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Yuwen Zhou <yuwen.zhou@intel.com>
|
2026-05-18 03:04:45 -07:00 |
|
Tianmu Li
|
cac81b6eda
|
[CPU Backend] Improve cpu thread utilization (#42666)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-18 03:04:41 -07:00 |
|
Li, Jiang
|
b4601ad43f
|
[CPU] Add fused GDN support for AMX CPU platform (#42707)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-05-18 03:04:36 -07:00 |
|
lyd1992
|
f351455f0f
|
[CPU][RISC-V] Add RVV-optimized attention kernels for RISC-V Vector Extension (#40119)
Signed-off-by: liuyudong <liuyudong@iscas.ac.cn>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-05-15 12:08:23 +08:00 |
|
Li, Jiang
|
b3945cc316
|
[CPU] Bump up to the latest CPU kernels (#41924)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-05-07 05:45:59 -07:00 |
|
Tianmu Li
|
e87e09a50a
|
[Feat] dnnl build for AVX2 W8A8 Int8 (#41318)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-05-06 15:28:02 +08:00 |
|
Yuwen Zhou
|
809b98e5b7
|
[CPU] Add FP8 W8A16 linear support (#41186)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
|
2026-05-06 07:05:27 +00:00 |
|
Tianmu Li
|
e47c98ef7a
|
[Fix] Add missing stubs from cpu fp8 attention changes (#41387)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-05-06 12:16:27 +08:00 |
|
Akash kaothalkar
|
420b0a5c95
|
[Hardware][Power] Add Power VSX Attention Backend and fix L2 Cache Crash (#40451)
Signed-off-by: Akash Kaothalkar <akashkaothalkar@akashs-mbp.bl1-in.ibm.com>
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akashkaothalkar@akashs-mbp.bl1-in.ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-05-04 20:51:09 -07:00 |
|
Tianmu Li
|
22524f7a92
|
[Feat] CPU fp8 attn for AMX/AVX-512 (#39445)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-04-29 20:43:21 +08:00 |
|
Yifan Qiao
|
4d51588e23
|
[Feat] DeepSeek V4 Rebased (#40860)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: qizixi <zixi@inferact.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Yongye Zhu <yongye@inferact.ai>
Co-authored-by: Simon Mo <simon@inferact.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: qizixi <zixi@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
|
2026-04-26 18:31:08 -07:00 |
|
almayne
|
2f314bc5e6
|
[CPU] Added faster exp routine for lower precision data types. (#38112)
Signed-off-by: Anna Mayne <anna.mayne@arm.com>
Co-authored-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-04-23 13:14:44 +00:00 |
|
lyd1992
|
04eac6ba24
|
[Bugfix][CPU][RISC-V] Clamp exp() input to prevent NaN (#40428)
Signed-off-by: liuyudong <liuyudong@iscas.ac.cn>
|
2026-04-22 09:38:18 +00:00 |
|
velonica0
|
ec7aafc02a
|
[CPU][RISC-V] Support multiple RVV VLEN targets via compile-time dispatch (#39478)
Signed-off-by: velonica0 <like@mail.nankai.edu.cn>
|
2026-04-20 14:36:59 +08:00 |
|
Li, Jiang
|
d02421a7db
|
[CPU] Refactor CPU affinity and memory management (#39781)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-04-17 21:01:08 +08:00 |
|
R3hankhan
|
4b7ca37bd4
|
[CPU][IBM Z][Dockerfile][Docs] Fix s390x builds for torch 2.11 and update docs for s390x (#39910)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2026-04-15 22:26:21 -07:00 |
|
Fadi Arafeh
|
445b7093fd
|
[perf][cpu] Accelerate BF16 GELU with LUT impl on Arm CPUs (#37469)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-04-15 22:26:17 -07:00 |
|
Ganesh R
|
445a2a4d1a
|
feat(cpu): add CPU support for draft model speculative decoding (#32662)
Signed-off-by: R <Ganesh.R@amd.com>
|
2026-04-10 11:49:52 +08:00 |
|
Andrey Talman
|
2111997f96
|
[release 2.11] Update to torch 2.11 (#34644)
|
2026-04-07 18:55:48 -07:00 |
|
Kyle Mylonakis
|
7b9de7c892
|
[Bugfix] Correct mistake in chained comparison in static assert logic (#38699)
Signed-off-by: Kyle Mylonakis <kyle@protopia.ai>
|
2026-04-07 18:24:39 +08:00 |
|
Anton Ivanov
|
abebd9323d
|
[CPU] Replace OMP initialization (#36487)
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
|
2026-04-03 18:42:43 +08:00 |
|
Li, Jiang
|
c6f722b93e
|
[CPU] Support gelu act in cpu_fused_moe (#38770)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-04-02 14:14:32 +08:00 |
|
Li, Jiang
|
36d7f19897
|
[CPU] Support head_size 512 in cpu_attn (#38676)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-04-01 05:42:27 +00:00 |
|
Yintong Lu
|
f09daea261
|
[CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
|
2026-03-31 15:27:37 +08:00 |
|
Li, Jiang
|
352b90c4a4
|
[Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU (#37987)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-24 07:00:20 -07:00 |
|
yassha
|
199f914183
|
fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369)
Signed-off-by: yassha <50112520+yassha@users.noreply.github.com>
|
2026-03-19 17:45:06 +08:00 |
|
typer-J
|
4184653775
|
feat: add RISC-V support for CPU backend (v2) (#36578)
Signed-off-by: typer-J <2236066784@qq.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-03-10 21:51:39 -07:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Tianmu Li
|
8e7820131e
|
[Perf] Use dummy M for weight prepacking on x86 (#35890)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
|
2026-03-05 04:56:49 +00:00 |
|
Ma Jian
|
90805ff464
|
[CI/Build] CPU release supports both of AVX2 and AVX512 (#35466)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: jiang1.li <jiang1.li@intel.com>
|
2026-02-28 04:35:21 +00:00 |
|
R3hankhan
|
34ce0ffd1f
|
[CPU][Perf] Accelerate Attention head for s390x using vector intrinsics (#34434)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-02-24 07:25:39 -08:00 |
|
Li, Jiang
|
05339a7b20
|
[Bugfix][CPU] Fix llama4 inference on CPU (#34321)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-02-11 19:07:23 +08:00 |
|
R3hankhan
|
d1b837f0ae
|
[CPU] Enable FP16 (Half dtype) support for s390x (#34116)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2026-02-11 14:41:42 +08:00 |
|
Nikhil Gupta
|
caad9f1e01
|
[Fix] [CPU Backend] : Prepack weights for w8a8 oneDNN matmul (#33901)
Signed-off-by: nikhil-arm <nikhil.gupta2@arm.com>
|
2026-02-09 18:04:41 +08:00 |
|
ihb2032
|
5a5c43511a
|
fix(cpu): fix mla_decode compilation on x86 without AVX512 (#34052)
Signed-off-by: ihb2032 <hebome@foxmail.com>
Co-authored-by: root <root@LAPTOP-FKNHV411.localdomain>
|
2026-02-09 08:55:41 +00:00 |
|
Gassan Salama
|
1363e3d6d5
|
[cpu][performance] CPU Paged Attention NEON BFMMLA BF16 Implementation (#32263)
Signed-off-by: Gassan <gassan.salama@arm.com>
|
2026-02-06 15:01:48 +08:00 |
|
R3hankhan
|
ac04dd374f
|
[CPU] Add BF16 Kernel type for s390x (#33788)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2026-02-06 04:57:02 +00:00 |
|
R3hankhan
|
4dffc5e044
|
[CPU] Split attention dispatch by head_dim alignment (#32161)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2026-02-03 19:37:15 -08:00 |
|
Radu Salavat
|
e69c990c21
|
[Feature][CPU Backend]: Optimize ARM vectorization backend (#30329)
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
|
2026-02-02 20:17:56 -08:00 |
|
linhaifeng
|
fedf64332e
|
[Bugfix]: Fix display errors in TORCH_CHECK messages (#32942)
Signed-off-by: linhaifeng <1371675203@qq.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-31 09:48:48 -08:00 |
|
Li, Jiang
|
8311f083bd
|
[Bugfix][CPU] Fix thread num for shared memory communication (#33317)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 03:26:58 -08:00 |
|
Didier Durand
|
31b25f6516
|
[Doc]: fixing multiple typos in diverse files (#33256)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 16:52:03 +08:00 |
|
dolpm
|
58a05b0ca1
|
[fix] CPUDNNLGEMMHandler pointer baked into inductor artifact (#32913)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-26 16:59:44 -05:00 |
|
Li, Jiang
|
5da4c7d789
|
[CI/Build][CPU] Fix failed pooling tests and macos smoke test (#32907)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-23 10:48:20 +00:00 |
|