WIP, NEP: add dtype design NEP - DtypeMeta, Dtype, and hierarchy#12660
WIP, NEP: add dtype design NEP - DtypeMeta, Dtype, and hierarchy#12660mattip wants to merge 2 commits intonumpy:masterfrom
Conversation
doc/neps/nep-0029-dtype-as-type.rst
Outdated
| is a class object. Instantiating that class object ``a.type(3)`` produces a | ||
| Num`py `scalar <http://www.numpy.org/devdocs/reference/arrays.scalars.html>`_. | ||
|
|
||
| This NEP proposes a class heirarchy for dtypes. The ``np.dtype`` class will |
There was a problem hiding this comment.
hierarchy spelled wrong (here and below), also abstrace->abstract
| # subclass of IntDescr | ||
| return f"dtype('{_kind_to_stem[self.kind]}{self.itemsize:d}')" | ||
|
|
||
| def get_format_function(self, data, **options): |
There was a problem hiding this comment.
@mhvk noted: Still feel this would logically be on dtype.type.format; maybe that can be the default in the upper class?
Or, perhaps better, just omit it altogether and focus on a method on a dtype that currently exists, overriding that in Color.
There was a problem hiding this comment.
formatting exists on dtypes, it is based on a switch statement to get the proper formatting function in arrayprint. I am proposing to make that a method-based lookup on the dtype.
There was a problem hiding this comment.
You mean the format function including the part where it determines, e.g., for integer the maximum size the string will be? I guess that is indeed logical to have on the dtype. Still, for the purpose of the NEP, it seems too complicated; if the point is to be able to override a dtype, I feel it would be better to have an example that is closer to what a dtype normally does, like interpreting a bit of memory.
| itemsize = 8 | ||
| colors = ['red', 'green', 'blue'] | ||
| def get_format_function(self, data, **options): | ||
| class Format(): |
There was a problem hiding this comment.
@mhvk noted: Instead of formatting, I'd pick something that actually exists, perhaps the interpretation of the memory (either setting or getting).
There was a problem hiding this comment.
The purpose of a categorical dtype, as far as I can tell, is to give textual meaning to the values in the ndarray. The only place that has meaning is when printing them.
There was a problem hiding this comment.
Presumably, should be able to set that type with a string as well? Indeed, even accessing it should give, I would think, an ColorType scalar that is just an int with a different __str__ (heck, scalars come back to haunt us immediately!).
There was a problem hiding this comment.
Overriding any of the PyArray_Descr functions or ufuncs requires C code. I was looking for an easy way to demonstrate a subclass, but it seems I only confused reviewers.
|
Simplified by removing the metaclass and extending |
| # Dtype('int8') or Dtype('S10') or record descr | ||
| return create_new_descr(cls, *args, **kwargs) | ||
|
|
||
| class GenericDtype(Dtype): |
There was a problem hiding this comment.
There seems to be no benefit to having GenericDtype any more. I thought the argument before that you don't necessarily want to override what int means, but want to be sure that itemsize exists, was logical, so I would move some of the __new__ above here (or stick with the metaclass, it really wasn't bad and could be used to auto-fill methods based on itemsize, etc.)
There was a problem hiding this comment.
@mattip - a few more comments, though I must admit I'm confused. Mostly, I feel the examples are too abstract; is the goal actually to be able to subclass np.dtype with the NEP work in place? How would one actually put in the cast functions?
Maybe most relevant: I think it should be closer to an abstract base class, where the presence of itemsize only gets checked for the instance, not for the class.
| pass | ||
|
|
||
| class IntDtype(GenericDtype): | ||
| def __repr__(self): |
There was a problem hiding this comment.
This __repr__ could move to GenericDtype, as it would be the same for float, string (and that gives it some purpose!)
| from numpy.core.arrayprint import IntegerFormat | ||
| return IntegerFormat(data) | ||
|
|
||
| class UInt8Dtype(IntDtype): |
There was a problem hiding this comment.
Looking at this again, it seems strange that if arr.dtype is meant to be an instance, we are still creating a class here, with actual instantiation of the singleton doing nothing. It does seem one might then just as well just the class as the dtype and keep instantiation in reserve for something more interesting...
Or, more in the spirit of the PR (though with the metaclass, as the check for itemsize needs to come after initialization), could this be something like:
class UIntDtype(IntDtype):
kind = 'u'
def __init__(cls, itemsize):
self.itemsize = itemsize
self.minimum = 0
self.maximum = (1 << itemsize) - 1
uint8dtype = UIntDtype(itemsize=8)
There was a problem hiding this comment.
though with the metaclass, as the check for itemsize needs to come after initialization
Dtypes will be immutable, so I'd expect this to go through __new__
There was a problem hiding this comment.
Note with the issubclass(np.dtype, type) approach from the outdate NEP, this last line could look like:
class uint8dtype(metaclass=UIntDtype, itemsize=8):
pass| import numpy as np | ||
|
|
||
| class Dtype(): | ||
| def __new__(cls, *args, **kwargs): |
There was a problem hiding this comment.
Thinking more, the metaclass does in fact seem the best idea, but it should check for the presence of itemsize on any instances not on the class, i.e., it should let type do its work, and then check that itemsize exists. This, of course, is just what ABCmeta does...
|
In #12630 (comment), it was noted:
I think this goes to the crux of what I've been wondering about: how would this be done if I subclass Note that I think it is absolutely fine for the actual work to proceed in steps, i.e., keep using |
|
In the long run we certainly want to support defining new dtypes in either C or Python, so it makes sense for a spec or prototype to discuss both. But if the plan is to implement this in stages, then it makes sense for the first stage to only support subclassing in C, since that's where all the current dtypes are defined. Then stage 2 might be adding slot wrappers so that dtype methods are callable from python, and stage 3 would be adding the glue to actually define the methods in Python. |
|
If we are going with the approach that Dtype should be a metaclass, this PR should be closed in favor of #12630 |
|
Closing, it seems starting over will be easier than resuscitating this. Hopefully the discussion will have been useful as a starting point for a replacement. |
Replaces PR #12630 after comments there.