ENH: Vectorize np.sort and np.partition with AVX2#25045
ENH: Vectorize np.sort and np.partition with AVX2#25045seiko2plus merged 22 commits intonumpy:mainfrom
Conversation
| (_add_keepdims(np.min), _add_keepdims(np.argmin), dict()), | ||
| (_add_keepdims(np.max), _add_keepdims(np.argmax), dict()), | ||
| (np.partition, np.argpartition, dict(kth=2)), | ||
| #(np.partition, np.argpartition, dict(kth=2)), |
There was a problem hiding this comment.
I wasn't sure what to do with this test. When there is no unique defined output to np.partition or np.argpartition, we obviously cannot expect their outputs to match with each other, right?
There was a problem hiding this comment.
we obviously cannot expect their outputs to match with each other, right?
I agree since the order is supposed to be undefined.
|
Commit f92e3b3 updates test |
|
Rebased after #24018. Also split highway dispatch to a separate file. |
Mousius
left a comment
There was a problem hiding this comment.
Just scrolled over this, think the split looks good but maybe we can avoid the last pointer check.
|
Cygwin failure seems unrelated. |
|
Rebased with main to fix the cygwin CI failure. |
This reverts commit 76d5534.
Perf improvements to AVX2 sorting: see numpy/x86-simd-sort#104
Mousius
left a comment
There was a problem hiding this comment.
This looks good to me, but it'd be good to restart CI with this month's credits. @ngoldbaum, do you have buttons for that?
seiko2plus
left a comment
There was a problem hiding this comment.
LGTM, thank you Raghuveer!. Could you please make a few changes as suggested?
|
Thank you @r-devulap. |
|
Does this need a release note? |
You could add an improvement release note, won't hurt. |
Adds AVX2 implementations of np.sort and np.partition for 32-bit and 64-bit dtypes. Speeds up 32-bit sort by up to 13x and 64-bit sort by up to 7x.
This patch includes commits from #24924 (makes it easier to rebase later).
EDIT: updated benchmark numbers after we made a fewmore improvments to 64-bit AVX2 sorting in numpy/x86-simd-sort#99.
Detailed benchmarks are here. https://gist.github.com/r-devulap/4d55b5c1909a7ed0743746cf719bf9d2