ENH: Re-enable VXE from build targets for sin/cos #27665
  "SIMD in general.");
  npy_uint64 simd_maski;
- hn::StoreMaskBits(f32, simd_mask, (uint8_t*)&simd_maski);
+ hn::StoreMaskBits(f32, simd_mask, (uint8_t *)&simd_maski);
A better approach is needed for the libc fallback to properly handle scalable extensions. One potential improvement would be a new intrinsic, such as bool TestBit(MASK, int pos), which would also be endian-friendly. For now, I applied a quick fix below to address a big-endian bug: passing a uint64_t as a uint8_t array leads to reading garbage bytes, because the stored mask bytes do not land in the low-order bytes of the integer.
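To make the endianness issue concrete, here is a minimal sketch that does not use Highway at all. The names `store_mask_bits`, `test_bit`, and `bits_to_u64` are hypothetical and are not part of Highway or NumPy. The bit layout (lane i goes to bit i % 8 of byte i / 8) is an assumption about what a byte-wise mask store produces. On a big-endian machine, byte 0 of a uint64_t is its most significant byte, so reading the buffer back as an integer puts the mask bits in the wrong place. Reading the bytes directly avoids that.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a byte-wise mask store (assumed layout):
// lane i is written to bit (i % 8) of byte (i / 8). Returns bytes written.
inline std::size_t store_mask_bits(const bool *lanes, std::size_t n,
                                   std::uint8_t *bits) {
    const std::size_t nbytes = (n + 7) / 8;
    std::memset(bits, 0, nbytes);
    for (std::size_t i = 0; i < n; ++i) {
        if (lanes[i]) {
            bits[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        }
    }
    return nbytes;
}

// Hypothetical endian-friendly query in the spirit of TestBit(MASK, pos):
// it only ever indexes bytes, so host byte order does not matter.
inline bool test_bit(const std::uint8_t *bits, std::size_t pos) {
    return ((bits[pos / 8] >> (pos % 8)) & 1u) != 0;
}

// Portable way to get an integer: assemble it from the bytes explicitly
// instead of casting a uint64_t* to uint8_t* and reading the integer back.
inline std::uint64_t bits_to_u64(const std::uint8_t *bits, std::size_t nbytes) {
    std::uint64_t out = 0;
    for (std::size_t i = 0; i < nbytes && i < 8; ++i) {
        out |= static_cast<std::uint64_t>(bits[i]) << (8 * i);
    }
    return out;
}
```

With a zero-initialized byte buffer and either `test_bit` or `bits_to_u64`, the result is the same on little-endian and big-endian hosts. This is only an illustration of the idea, not the fix applied in this PR.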
@r-devulap, Oh, my editor automatically reformatted the entire source code based on the NumPy clang-format configuration. Would you prefer that I revert these changes, or is this formatting acceptable for you?
All CI tests for s390x have passed.
Force-pushed from a18b34b to b7ae869.
@r-devulap, if you are happy with the changes, please just put it in (I don't think the reformatting matters too much; it's not that much).
Force-pushed from b7ae869 to bda2921.
Yeah, no worries. This looks better anyways.
Yeah, makes sense. I had to fix some rebase problems, but once CI passes it should be good to go in.
Let's put this in then, thanks @r-devulap and @seiko2plus.
Not sure if this is having any impact here. google/highway#2409
I can confirm that trying to install numpy>=2.0.0 for Python 3.10 breaks on the UBI-8 image.
As discussed in the optimization meeting, @seiko2plus wanted to split VSX and VXE into two separate PRs.
For clarification, SIMD optimizations for sine and cosine functions on both ppc64 and z/Architecture (IBM Z) were disabled by #25781 to bypass CI tests. This PR aims to re-enable optimizations for z/Architecture after addressing the following runtime errors, while #27627 re-enabled ppc64 optimizations.