Conversation
That's strange, historically it has been about 15-30% faster, improving with recent hardware. I wonder if something has changed. Seems about the same as before. What hardware are you running on? |
|
@charris Thanks for the response. I can reproduce both your result and mine, on the development branch or stable 1.15. We set up the array differently (Shuffle). After some digging in the assembly code, I think the reason for this is some heavy compiler optimization for the mergesort of int64. For integers, the following core mergesort operation is translated to: Thanks to the low cost of integer comparison ( I didn't expect the numeric data type to have such a huge impact on sorting efficiency... Benchmarking becomes more and more difficult 😭. BTW, if you are interested, I'm running on a Linux virtual machine on Windows. CPU info: EDIT: I wrote a blog post on these interesting phenomena: Mergesort is faster than Quicksort, including my tests on multiple CPUs and compilers. |
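A rough way to see the dtype effect discussed above from Python is to time the same values sorted under different dtypes. This is a hypothetical micro-benchmark sketch (sizes and dtypes are arbitrary), not the benchmark used in this PR:

```python
# Hypothetical micro-benchmark: how much does the element dtype affect
# mergesort speed in NumPy? Same values, different dtypes.
import timeit
import numpy as np

rng = np.random.RandomState(0)

for dtype in ("int64", "float64"):
    data = rng.randint(0, 10**6, size=100_000).astype(dtype)
    t = timeit.timeit(lambda: np.sort(data, kind="mergesort"), number=20)
    print(f"{dtype}: {t:.3f}s for 20 mergesorts of 100k elements")
```

Results will vary with CPU and compiler, which is exactly the point of the blog post.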
|
@liwt31 I'll get back to this once the 1.16 rc is out. |
|
It's all right 😄. Take your time. |
|
I'm back. Your blog post was very interesting. |
|
The benchmarks are impressive, timsort certainly looks like a good replacement for mergesort, and in many cases for quicksort as well. I guess there might be some discussion as to whether it should silently replace mergesort in the same way as introsort has replaced quicksort, but however that goes, it looks like it is worth pursuing. The sorting benchmarks need updating per your issue on the matter, so that might be the next step here. I find it surprising that an algorithm so complex does so well. |
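As a rough illustration of the "nearly sorted" case where a run-aware stable sort wins, one can perturb a sorted array slightly and compare sort kinds. This is a sketch only; `kind='stable'` dispatches to timsort only in NumPy versions that include this work:

```python
# Sketch: "nearly sorted" input where a run-aware stable sort shines.
import timeit
import numpy as np

rng = np.random.RandomState(42)
data = np.arange(1_000_000, dtype=np.int64)
# perturb ~0.1% of positions to get "nearly sorted" input
idx = rng.choice(data.size, size=1000, replace=False)
data[idx] = rng.randint(0, data.size, size=1000)

for kind in ("quicksort", "stable"):
    t = timeit.timeit(lambda: np.sort(data, kind=kind), number=5)
    print(f"{kind}: {t:.3f}s")
```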
|
There is another PR for radix sorting, #12586. It might be helpful to coordinate, at least for the benchmarks and tests. |
|
@charris Thanks for your reply! So it seems I can move on to the sibling algorithms then. This can probably be done in 1 or 2 weeks.
I see that he has updated the NumPy sorting benchmark and has noticed my little benchmark work here. If he finds any parts of my work interesting, he can probably include them in his PR. I think that PR can also close #12368, which proposes adding a benchmark for mergesort.
Well, the idea behind timsort isn't very complex (just like other great ideas), and I've removed some "over-complex" parts of it (primarily the gallop mode during merging) for the NumPy use case. |
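To make "isn't very complex" concrete, the first phase of timsort can be sketched as below: scan the array for natural runs, keeping ascending runs and reversing strictly descending ones in place. This is a toy model of the idea, not NumPy's C implementation:

```python
# Toy model of timsort's first phase: detect natural runs.
def find_runs(a):
    """Return half-open (start, end) runs; mutates a so each run ascends."""
    runs = []
    i, n = 0, len(a)
    while i < n:
        j = i
        if j + 1 < n and a[j + 1] < a[j]:        # strictly descending run
            while j + 1 < n and a[j + 1] < a[j]:
                j += 1
            a[i:j + 1] = a[i:j + 1][::-1]        # reverse in place -> ascending
        else:                                     # non-descending run
            while j + 1 < n and a[j + 1] >= a[j]:
                j += 1
        runs.append((i, j + 1))
        i = j + 1
    return runs

a = [5, 4, 3, 7, 8]
print(find_runs(a))  # → [(0, 3), (3, 5)]
print(a)             # → [3, 4, 5, 7, 8]
```

The full algorithm then extends short runs to a minimum length with insertion sort and merges runs under stack invariants; the gallop mode mentioned above is an optimization inside that merge step.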
Yes, please do. |
|
PS, you might look to publish your blog post somewhere noticeable. |
Glad you like the post! I'd also like to have this piece of "new" knowledge heard, but I can't think of a proper place. Do you have any suggestions? |
|
Finished numeric argsort, string sorts and generic sort. New benchmarks here. I think the next thing to decide is whether to replace mergesort with timsort or not. Once that's decided I can move on to adding docs for the new algorithm. EDIT: Sorry for the force-push, there was a typo in the commit message. |
|
@mattip I'm not sure whether timsort should replace mergesort. The decision matters for the docs. OTOH, I can't add the docstring without rebasing, because when I forked there was no 1.17 release note file. I suppose it's OK for me to rebase?
#12368 suggests at least 2 ways to address the issue and as far as I know we haven't decided yet, but this PR can probably be safely merged without updates to the benchmark suite, because it's already benchmarked (although elsewhere) and the issue can be solved in another PR. |
|
Rebasing is fine. This is not a backport candidate to 1.16 as it is a new feature. I don't think we should replace |
|
Rebased & added docs, assuming mergesort is not being replaced. If I missed something, or mergesort should be replaced after all, please let me know. |
|
@hameerabbasi any thoughts? |
|
I've been through this a few times rather quickly. I agree with @charris that we should make timsort replace merge sort... Also, this will have the side benefit of speeding up quicksort, as that falls back to merge sort as well. Another (kind of selfish) factor is that my PR will need fewer rebase steps. |
|
In our last weekly developer conversation, we decided that timsort should become the stable sort. I added a few comments where things need updating. |
|
Why are the azure builds (32-bit and macOS) failing? Rerunning them failed again, so it is not a one-off. |
|
To me, it seems |
|
Azure is doing that for everyone at the moment, it is an azure problem. |
|
Thanks @liwt31 |
|
Thank you all for your comments and advice. |
|
Hmm, I just noticed that this is going to break forwards compatibility of extensions due to PyArray_ArrFuncs containing function pointer arrays whose size depends on |
|
Let's call the third function EDIT: Actually, we just need to change the initialization of the pointers depending on the type. The various sorts should still be independent for functions that do not call through the function pointers. |
|
Since this is most likely going into 1.17, and that breaks 2.7 compat either way, wouldn't it be okay to push it in? |
|
Anyway, I'd be okay with replacing mergesort with radix/timsort based on the dtype as well. |
No, since it will move the offset of all the following functions, which will break too many things. We should have a test for the offset of the last element in each of the exported structs to make sure this is spotted earlier ... |
|
Not sure I understand how and why a change in the size of PyArray_ArrFuncs affects forwards compatibility. Can I simply take it that mergesort should be replaced with timsort to keep the total number of sorting algorithms at 3? |
|
@liwt31 The offset of the elements in the structure changes, so code compiled for the old structure will access the wrong elements in the new structure. The idea of forward compatibility is that C extension modules compiled against an older NumPy will run with newer NumPy versions. What we don't guarantee is that extension modules compiled against a newer NumPy will run on an older NumPy. If new variables are added to the end of a structure, there is no problem, because old code will never access them and the offsets of the old elements don't change. Unfortunately, the arrays containing the sort function pointers are in the middle of the |
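The offset problem described above can be demonstrated from Python with `ctypes`. The structs below are illustrative stand-ins, not NumPy's actual `PyArray_ArrFuncs` layout:

```python
# Illustrative sketch: growing an array embedded in the MIDDLE of a
# struct shifts every later member, which is what breaks forward
# compatibility. These structs are stand-ins, not NumPy's real layout.
import ctypes

class OldFuncs(ctypes.Structure):
    _fields_ = [
        ("cast", ctypes.c_void_p),
        ("sort", ctypes.c_void_p * 3),   # 3 sort kinds
        ("getitem", ctypes.c_void_p),    # old code reads it at this offset
    ]

class NewFuncs(ctypes.Structure):
    _fields_ = [
        ("cast", ctypes.c_void_p),
        ("sort", ctypes.c_void_p * 4),   # a 4th kind inserted mid-struct
        ("getitem", ctypes.c_void_p),    # ...so this offset has moved
    ]

print(OldFuncs.getitem.offset, NewFuncs.getitem.offset)
# appending new members at the END instead would keep the old offsets intact
```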
|
Note that the |
Changed my mind. We should leave the enum types as they were, maybe with some added aliases, and just change the structure initializations. |
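The "added aliases" idea can be sketched as follows: existing enum values keep their numbers, and a stable-sort alias shares the value of the slot whose initializer gets swapped to timsort. Names here are illustrative, not NumPy's actual definitions:

```python
# Sketch of aliasing a sort-kind enum without renumbering anything.
from enum import IntEnum

class SortKind(IntEnum):
    QUICKSORT = 0
    HEAPSORT = 1
    MERGESORT = 2
    STABLESORT = 2   # alias: same value, so existing code sees no change

# the alias resolves to the canonical member
print(SortKind.STABLESORT is SortKind.MERGESORT)  # → True
```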
|
@liwt31 Do you want to do this? I was going to do it myself but would be happy if you wanted to pursue it. |
|
Another option is to add additional function tables to the end of the structure. |
|
Thanks for the explanation, I get the idea now. I'd love to make a PR for this, but I'm on vacation at the moment, so please go ahead.
|
|
Looking at this again, I am not sure exactly where the change manifests itself. We use
Note that pickling or np.save/np.load across numpy versions will not be problematic since the dtypes are rebuilt (or the internal singletons reused). |
|
xref PR #12945 |
This is part of #12186 .
This PR is still under development, in that only the numeric timsort is implemented. For argsort and the sorting algorithms for string types and the generic type, a dummy mergesort is pasted as a placeholder. I wish to hear more feedback before moving on, because implementing the sibling sorting algorithms appears to require lots of copy and paste.
Overview
Currently what I've done:
- Added 't' (timsort) for the `kind` argument in the `sort` function. This is primarily for benchmark convenience.
The algorithm
The implementation is primarily based on Tim's little article, with some simplifications:
The reason for these simplifications is their large overhead. Tim's algorithm is specially designed for CPython, which has a very expensive comparison function. The direction of Tim's optimization is to reduce the number of comparisons whenever possible. However, this is not necessary for most NumPy use cases. Maybe for the generic sort these further optimizations could be included; however, IMHO that should be done in another PR.
I haven't prepared organized materials to show their "large overhead" I found in my experiments. If anyone is interested I'd be happy to elaborate on this.
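One way to make the "expensive comparison" point concrete without assembly dumps is to sort the same values as native int64 versus Python objects, where every comparison goes through CPython. A rough sketch, not the measurements behind this PR:

```python
# Sketch: cheap native comparisons vs CPython object comparisons.
import timeit
import numpy as np

vals = np.random.RandomState(1).randint(0, 10**6, size=50_000)
as_int64 = vals.astype(np.int64)
as_obj = vals.astype(object)   # every comparison goes through CPython

t_int = timeit.timeit(lambda: np.sort(as_int64, kind="mergesort"), number=10)
t_obj = timeit.timeit(lambda: np.sort(as_obj, kind="mergesort"), number=10)
print(f"int64: {t_int:.3f}s  object: {t_obj:.3f}s")
```

The gap between the two timings is the room Tim's comparison-minimizing tricks were designed to exploit, and it is much narrower for native NumPy dtypes.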
Benchmark
The benchmark includes a benchmark script, a Jupyter notebook to show the shape of the arrays involved in the benchmark for an intuitive picture, and a readme file with benchmark results and comments. Simply put, timsort performs well on "nearly sorted" data and shows minimal overhead on random data. Besides, I noticed the following 2 interesting phenomena:
I haven't had time to dig into these, and I'm more than happy to hear from you if they are actually perfectly normal.
Though timsort is designed to be stable and I've tried to make it so, I haven't done any benchmark or test for stability, because I only have the numeric sort at hand.
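A stability check that doesn't need the string/generic paths could look like the sketch below: argsort many duplicate keys and verify that equal keys keep their original order. This is a hypothetical test written against the existing stable `kind='mergesort'` interface:

```python
# Sketch of a stability test: equal keys must keep their original order.
import numpy as np

rng = np.random.RandomState(0)
keys = rng.randint(0, 10, size=10_000)       # many duplicate keys
order = np.argsort(keys, kind="mergesort")   # the stable kind under test

# within each group of equal keys, the original indices must be increasing
for k in range(10):
    idx = order[keys[order] == k]
    assert (np.diff(idx) > 0).all(), f"not stable for key {k}"
print("stable")
```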
Feedback I wish to hear
Todo