MAINT: refactor: arraydescr_new now does not call PyArray_DescrNew #12430
MAINT: refactor: arraydescr_new now does not call PyArray_DescrNew #12430mattip wants to merge 3 commits intonumpy:masterfrom
Conversation
|
Datetime cleanup looks good too, but completely unrelated to your PR title - can you split that off into a separate PR? |
3fc5f48 to
585d4fe
Compare
There was a problem hiding this comment.
Are these bug fixes, or could this previously never return NULL?
There was a problem hiding this comment.
As PyArray_DESCR_REPLACE calls PyArray_DescrNew which has various failure modes, it is a bugfix
585d4fe to
1993561
Compare
| /* dtype */ base, | ||
| /* align */ Py_False, | ||
| /* copy */ Py_True); | ||
| PyObject *ret = PyObject_CallObject((PyObject*)Py_TYPE(base), tuple); |
There was a problem hiding this comment.
PyObject_CallFunctionObjArgs((PyObject *)Py_TYPE(base), base, Py_False, Py_True) would be shorter here - or am I missing something?
| newdescr->subarray = PyArray_malloc(sizeof(PyArray_ArrayDescr)); | ||
| if (newdescr->subarray == NULL) { | ||
| Py_DECREF(newdescr); | ||
| return (PyArray_Descr *)PyErr_NoMemory(); |
There was a problem hiding this comment.
This PyErr_NoMemory() needs to remain
|
|
||
| /* Initialize the PyArray_Descr fields */ | ||
| static int | ||
| _initialize_descr(PyArray_Descr *newdescr, PyArray_Descr *base) |
There was a problem hiding this comment.
I'd prefer this function to be:
static PyObject *descr_copy(PyArray_Descr *base) {
newdescr = PyObject_New(PyArray_Descr, Py_TYPE(base));
// as below
return newdescr
}
or if for some reason, we want to be able to change the type of the copy.
// classmethod
static PyObject *descr_from_descr(PyTypeObject *cls, PyArray_Descr *base) {
newdescr = PyObject_New(PyArray_Descr, cls);
// as below
return newdescr
}Is there a erason that's a bad idea?
There was a problem hiding this comment.
Adopting the second option, so we can, in the future, subclass dtype
I'm not sure this is where we want to end up. The model that I think we should be aiming for is: # dtype is a metaclass
>>> issubclass(np.dtype, type)
True
# dtypes instances are still just that - dtype instances
>>> i = np.dtype('int32')
>>> type(i)
np.dtype
# but they're also now real python types
>>> isinstance(i, type)
True
# meaning they can be used as constructors
>>> type(i(1.0)) is i
True
# and eventually, probably become one with the scalar types
>>> i is np.int32
TrueI think that matches up with your dtype task list too.
With that in mind, I'm not sure this patch really makes any sense - I can't think of a good reason to subclass dtype right now. Sorry for not making this remark earlier in the review. Would be great if you could split off the bug fixes here so that those can make 1.16 |
This is one step toward enabling proper subclassing of dtypes, needed for projects like units and categorical dtypes. The first step is allowing creation of other dtype types (not objects), while keeping the array.dtype attribute an object. The final goal is indeed to make While this PR is only a minor refactoring, it fixes the order of |
As I read the task document, the intent was to subclass I cannot think of a single use case right now for subclasses of
It's not clear that this is obviously true. The python analogy to what we currently have would be a def __new__(cls, arg):
if isinstance(arg, dtype):
return dtype.from_dtype(arg)
elif isinstance(arg, str):
return dtype.from_descr(arg)
@staticmethod
def from_dtype(arg):
# This is PyArray_DescrNew, written as python
self = object.__new__(dtype)
self.elsize = arg.elsize # etc
return selfWhere The analogy to your patch is a patch that:
|
|
Is subclassing an instance a common python idiom? I thought subclasses must inherit from a type, not an instance. Are there restrictions on methods called on a class that inherits from an instance? |
No, you're correct. My point is that subclasses of |
|
well, Edit: clarify "it" |
I understand this - it's that next step specifically that I think is in the wrong direction, and something we'll end up undoing. Once dtype becomes a metatype, then we'll end up reverting back to using Could you maybe update your document with the proposed steps here? Perhaps I'm not seeing the bigger picture.
It's worth looking at the immutable types in CPython for precedence here. I think it's better to view |
I am kind of working it out as I go. I think I know what I want, but I am not sure how to get there. Any help or critique is helpful, thanks for the insights. |
|
Closing this in favor of other approach |
I started making dtype classes so that the
type(np.dtype('int32'))is no longernumpy.dtyperather a newnumpy.int_typetype that inherits fromnumpy.dtype, the first step in the plan to refactor dtypes. Rather than a single large PR, I will try to do this in a set of smaller PRs, with an "opt-out"#ifdef...#else...#endifaround the code.This PR is the first in the series - a refactoring of the dependency between
arraydescr_newandPyArray_DescrNewso that the latter now calls the former instead of the other way around.I also refactored the function to createdatetime64metadata and moved it to a different place in the file.Benchmarking seemed to show no real difference in performance, although there is alot of variance in the
bench_core.CountNonzero.time_count_nonzerotests