BUG: Add timsort without breaking the API.#12945
Conversation
In order to maintain forward compatibility it is necessary to keep the size of PyArray_ArrFuncs struct fixed. The usual trick of adding new elements to the end of the structure is not available in this case because the struct may be instanciated by user types and we have no way to know whether the new or old struct is in play. The solution adopted here is the reuse the (a)mergesort slots for stable sorts of all kinds, with the actual kind set when the struct is initialized. The '(a)mergesort' option thus becomes an alias for 'stable', but we keep it for backwards compatibility.
|
There will be another commit with documentation updates, but I thought it would get this up for discussion of the approach taken to maintain forward compatibiity. |
|
Github testing seems to be down. EDIT: Looks like a clock problem. |
|
Repeating comment from #12418 Looking at this again, I am not sure exactly where the change manifests itself. We use Can we perhaps increment Note that pickling or np.save/np.load across numpy versions will not be problematic since the dtypes are rebuilt (or the internal singletons reused). |
|
Note the same problem exists in PR #12585 where we change the |
It is possible user types could have either the old or new versions, so that doesn't really tell you what version is in use. Best would be if the structure was versioned, but it isn't. We do need to keep this problem in mind when designing the new types, we might want to make sure everything is versioned or has spare slots if we stick to anything close to the current implementation. Either that, or have the user supply a table of getter functions rather than a table of functions. Any way, needs thought. I did start a PR adding pointers to function tables to the end of the I think this PR does supply a usable workaround that doesn't break the API, but it is here for discussion :) |
|
Another possibility is to have Numpy supply the instances to the user through a function call and the user could then fill them in, that would be like the current setup and new elements could be added to the end of the structures as we currently do. The trick here is to ensure that NumPy is the sole supplier of instances. EDIT: That could be a safe way forward, perhaps starting with a deprecation of the current functions after an alternative is available. Or we could just figure there aren't enough folks implementing user types to worry about, and just change things. In any case, I think we should set things up so we can modify the structures when we need to. |
Might be interesting to see which of the packages in Anaconda have user types. |
Update the sorting documentation to reflect the reality that 'mergesort' and 'stable' are aliases and may refer to any of several stable sorting algorithms depending on data type. [ci skip]
|
I tried a version of quaternion compiled against numpy.1.13 and a version of numpy with the changed
There may be more hiding in private reops somewhere.
This seems like good practice. Could we
We already hide the cast functions behind |
|
Let's move this discussion somewhere else, maybe to the dtype NEP and on the list. I'd really appreciate it if some of the CS types would participate. Anyway, I think the upshot is that we need this PR, which also allows for the addition of radixsort, and if we want all the different sorts user selectable, that can be done later when the machinery is there. The current sorts can be accessed by users, they are in a library. |
|
Thanks Chuck. |
In order to maintain forward compatibility it is necessary to keep the
size of PyArray_ArrFuncs struct fixed. The usual trick of adding new
elements to the end of the structure is not available in this case
because the struct may be instanciated by user types and we have no way
to know whether the new or old struct is in play.
The solution adopted here is the reuse the (a)mergesort slots for stable
sorts of all kinds, with the actual kind set when the struct is
initialized. The '(a)mergesort' option thus becomes an alias for
'stable', but we keep it for backwards compatibility.