ENH: Vectorize np.partition and np.argpartition using AVX-512#24201
ENH: Vectorize np.partition and np.argpartition using AVX-512#24201mattip merged 23 commits intonumpy:mainfrom
Conversation
|
Looks like all the CI failures are related to 2 tests:
where both the tests expect the output of |
|
I guess that problem solved itself when we vectorize both partition and argpartition :) |
|
Benchmarks for |
| inline bool quickselect_dispatch(T* v, npy_intp num, npy_intp kth) | ||
| { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Windows will never get the faster version?
There was a problem hiding this comment.
The quicksort patch ran into some problem with WIN32 builds and I never quite figured it out. Let me try running the partition patch through the CI, may be this one has better luck on windows.
There was a problem hiding this comment.
Looks like this one had no trouble. Windows will get the faster version too :) Might be worth checking if quicksort can be enabled, but will do it in a separate PR.
There was a problem hiding this comment.
It seems to work. Should we circle back to enabling AVX512 quicksort for windows?
There was a problem hiding this comment.
Yes, sorry, separate PR would be good.
There was a problem hiding this comment.
AVX512 quicksort, argsort, partition and argpartition are all enabled on windows (64-bit only though).
|
LGTM. I did not review the subreo. @seiko2plus any thoughts? |
|
Rebasing with main. |
|
Something went wrong with the Travis CI, I think unrelated to this patch. |
Yes, unrelated. That failure happens now and then, the give away is "and 1 action required checks". |
|
bah, that Travis CI failed again. |
0029799 to
1cd3d8b
Compare
|
friendly ping :) |
89f58fc to
eefc2b6
Compare
29cd42b to
97e6714
Compare
|
Phew! Ready for review now. |
seiko2plus
left a comment
There was a problem hiding this comment.
Great performance improvement, Thank you! Just need to fix the float16 dispatching. Although our current tests should catch it, unfortunately, none of our CI workers support AVX512/ICL at least, also intel SDE still broken.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
kth = np.float16(0.1558), k = np.int64(29)
arr_part = array([ 0.001953, 0.007324, 0.01172 , 0.012695, 0.0249 , 0.0381 ,
0.03857 , 0.07666 , 0.07764 , 0.07... , -0.2622 , -0.2915 , -0.3003 ,
-0.3047 , -0.3323 , -0.3682 , -0.4297 , -0.4526 ],
dtype=float16)
def assert_arr_partitioned(kth, k, arr_part):
> assert_equal(arr_part[k], kth)
E AssertionError:
E Items are not equal:
E ACTUAL: np.float16(-0.2046)
E DESIRED: np.float16(0.1558)
arr_part = array([ 0.001953, 0.007324, 0.01172 , 0.012695, 0.0249 , 0.0381 ,
0.03857 , 0.07666 , 0.07764 , 0.07... , -0.2622 , -0.2915 , -0.3003 ,
-0.3047 , -0.3323 , -0.3682 , -0.4297 , -0.4526 ],
dtype=float16)
k = np.int64(29)
kth = np.float16(0.1558)
numpy/core/tests/test_multiarray.py:51: AssertionError
================ short test summary info ================
FAILED numpy/core/tests/test_multiarray.py::test_partition_fp[float16-N0]
FAILED numpy/core/tests/test_multiarray.py::test_partition_fp[float16-N1]
....1087f45 to
6cd06c0
Compare
|
@seiko2plus Fixed the bug and passes all the tests on a Tigerlake. We should have an SDE version with the bug fix out soon and we should run CI on TGL and SPR. |
|
Thanks @r-devulap |
Provides a speed up of 25x for 16-bit, 17x for 32-bit dtypes and about 8x speed up for 64-bit dtypes. Benchmark numbers on an array of size 100000 for varying values of k (10, 100 and 1000).
Benchmark for random data on Intel TGL:
Benchmark numbers on other kinds of array can be seen here.