Skip to content

Comments

MAINT: refactor: arraydescr_new now does not call PyArray_DescrNew #12430

Closed
mattip wants to merge 3 commits intonumpy:masterfrom
mattip:dtype-refactor
Closed

MAINT: refactor: arraydescr_new now does not call PyArray_DescrNew #12430
mattip wants to merge 3 commits intonumpy:masterfrom
mattip:dtype-refactor

Conversation

@mattip
Copy link
Member

@mattip mattip commented Nov 20, 2018

I started making dtype classes so that the type(np.dtype('int32')) is no longer numpy.dtype rather a new numpy.int_type type that inherits from numpy.dtype, the first step in the plan to refactor dtypes. Rather than a single large PR, I will try to do this in a set of smaller PRs, with an "opt-out" #ifdef ... #else ... #endif around the code.

This PR is the first in the series - a refactoring of the dependency between arraydescr_new and PyArray_DescrNew so that the latter now calls the former instead of the other way around. I also refactored the function to create datetime64 metadata and moved it to a different place in the file.

Benchmarking seemed to show no real difference in performance, although there is alot of variance in the bench_core.CountNonzero.time_count_nonzero tests

@eric-wieser
Copy link
Member

eric-wieser commented Nov 20, 2018

Datetime cleanup looks good too, but completely unrelated to your PR title - can you split that off into a separate PR?

@mattip mattip changed the title MAINT: refactor dtype so that arraydescr_new does not call PyArray_De… MAINT: refactor: arraydescr_new now does not call PyArray_DescrNew Nov 22, 2018
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these bug fixes, or could this previously never return NULL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As PyArray_DESCR_REPLACE calls PyArray_DescrNew which has various failure modes, it is a bugfix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored these into PR #12546

/* dtype */ base,
/* align */ Py_False,
/* copy */ Py_True);
PyObject *ret = PyObject_CallObject((PyObject*)Py_TYPE(base), tuple);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyObject_CallFunctionObjArgs((PyObject *)Py_TYPE(base), base, Py_False, Py_True) would be shorter here - or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adopted

newdescr->subarray = PyArray_malloc(sizeof(PyArray_ArrayDescr));
if (newdescr->subarray == NULL) {
Py_DECREF(newdescr);
return (PyArray_Descr *)PyErr_NoMemory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PyErr_NoMemory() needs to remain

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert removal


/* Initialize the PyArray_Descr fields */
static int
_initialize_descr(PyArray_Descr *newdescr, PyArray_Descr *base)
Copy link
Member

@eric-wieser eric-wieser Nov 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this function to be:

static PyObject *descr_copy(PyArray_Descr *base) {
    newdescr = PyObject_New(PyArray_Descr, Py_TYPE(base));
    // as below
    return newdescr
}

or if for some reason, we want to be able to change the type of the copy.

// classmethod
static PyObject *descr_from_descr(PyTypeObject *cls, PyArray_Descr *base) {
    newdescr = PyObject_New(PyArray_Descr, cls);
    // as below
    return newdescr
}

Is there a erason that's a bad idea?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopting the second option, so we can, in the future, subclass dtype

@eric-wieser
Copy link
Member

eric-wieser commented Nov 25, 2018

I started making dtype classes so that the type(np.dtype('int32')) is no longer numpy.dtype rather a new numpy.int_type type that inherits from numpy.dtype,

I'm not sure this is where we want to end up. The model that I think we should be aiming for is:

# dtype is a metaclass
>>> issubclass(np.dtype, type)
True

# dtypes instances are still just that - dtype instances
>>> i = np.dtype('int32')
>>> type(i)
np.dtype

# but they're also now real python types
>>> isinstance(i, type)
True

# meaning they can be used as constructors
>>> type(i(1.0)) is i
True

# and eventually, probably become one with the scalar types
>>> i is np.int32
True

I think that matches up with your dtype task list too.

np.int_type shouldn't be a subclass of dtype in my mind - it should be an instance of dtype.

With that in mind, I'm not sure this patch really makes any sense - I can't think of a good reason to subclass dtype right now.

Sorry for not making this remark earlier in the review.


Would be great if you could split off the bug fixes here so that those can make 1.16

@mattip
Copy link
Member Author

mattip commented Nov 25, 2018

With that in mind, I'm not sure this patch really makes any sense - I can't think of a good reason to subclass dtype right now.

This is one step toward enabling proper subclassing of dtypes, needed for projects like units and categorical dtypes. The first step is allowing creation of other dtype types (not objects), while keeping the array.dtype attribute an object. The final goal is indeed to make array.dtype a type object. Even better would be that making an instance of the dtype would be a scalar.

While this PR is only a minor refactoring, it fixes the order of arraydecr_new and the API function PyArray_DescrNew. The internal tp_new should not be calling out to an API function, so the PR is valuable on its own.

@eric-wieser
Copy link
Member

eric-wieser commented Nov 25, 2018

This is one step toward enabling proper subclassing of dtypes

As I read the task document, the intent was to subclass dtype instance s, not dtype itself.

I cannot think of a single use case right now for subclasses of dtype itself.

The internal tp_new should not be calling out to an API function, so the PR is valuable on its own.

It's not clear that this is obviously true. The python analogy to what we currently have would be a __new__ implemented as:

def __new__(cls, arg):
    if isinstance(arg, dtype):
        return dtype.from_dtype(arg)
    elif isinstance(arg, str):
        return dtype.from_descr(arg)

@staticmethod
def from_dtype(arg):
    # This is PyArray_DescrNew, written as python
    self = object.__new__(dtype)
    self.elsize = arg.elsize  # etc
    return self

Where dtype.__new__ is just a dispatcher that picks between a bunch of API functions based on the type of the argument.

The analogy to your patch is a patch that:

  • Removes all callers of dtype.from_dtype, and make them go through type dispatching even though they have no need to.
  • Changes the staticmethod to a classmethod - it's not clear to me that this buys anything, but I suppose it's not harmful.

@mattip
Copy link
Member Author

mattip commented Nov 25, 2018

Is subclassing an instance a common python idiom? I thought subclasses must inherit from a type, not an instance. Are there restrictions on methods called on a class that inherits from an instance?

@eric-wieser
Copy link
Member

eric-wieser commented Nov 25, 2018

Is subclassing an instance a common python idiom? I thought subclasses must inherit from a type

No, you're correct. My point is that subclasses of dtype doesn't seem like a stepping stone to where we want to end up. I think the first step should be to make dtype a subclass of type, not the last step.

@mattip
Copy link
Member Author

mattip commented Nov 25, 2018

well, it making dtype a subclass of type is the next step, and will not be the last step. I need the change to use subtype in PyObject_New. I guess I could redo this PR to do that with a more minimal change, but it seems unnatural to me that arraydescr_new, which is type->tp_new is calling out to an API function to actually allocate, not the other way around.

Edit: clarify "it"

@eric-wieser
Copy link
Member

I need the change to use subtype in PyObject_New.

I understand this - it's that next step specifically that I think is in the wrong direction, and something we'll end up undoing. Once dtype becomes a metatype, then we'll end up reverting back to using PyObject_New called on dtype, not subclasses.

Could you maybe update your document with the proposed steps here? Perhaps I'm not seeing the bigger picture.

but it seems unnatural to me that arraydescr_new, which is type->tp_new is calling out to an API function to actually allocate, not the other way around.

It's worth looking at the immutable types in CPython for precedence here. PyFrozenSet_New does not call frozenset->tp_new at all. Nor is PyLongObject->tp_new called by any of the C api constructors.

I think it's better to view __new__ as the python-only overloaded entry point to the construction, and not something the C api should go through. Extracting a common helper function for both to go through is fine (the descr_from_descr function of this patch), but since CPython doesn't do it either, I don't think it makes sense for the C api to round trip through the python api.

@mattip
Copy link
Member Author

mattip commented Nov 28, 2018

Could you maybe update your document with the proposed steps here? Perhaps I'm not seeing the bigger picture.

I am kind of working it out as I go. I think I know what I want, but I am not sure how to get there. Any help or critique is helpful, thanks for the insights.

@mattip
Copy link
Member Author

mattip commented Dec 23, 2018

Closing this in favor of other approach

@mattip mattip closed this Dec 23, 2018
@mattip mattip deleted the dtype-refactor branch June 8, 2020 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants