Skip to content

BUG: allow registration of hard-coded structured dtypes#17320

Merged
mattip merged 4 commits intonumpy:masterfrom
seberg:relax-object-dtype-with-ref
Oct 7, 2020
Merged

BUG: allow registration of hard-coded structured dtypes#17320
mattip merged 4 commits intonumpy:masterfrom
seberg:relax-object-dtype-with-ref

Conversation

@seberg
Copy link
Member

@seberg seberg commented Sep 15, 2020

It is probably fine if user dtypes have a hardcoded structure (although I am not sure we can allow them to support byte-swapping). In that case, it is even possible for them to include an object field.

One reason for this error was to find out if someone uses this type of thing, and xpress does create "typed" object dtypes in this manner, so relax it to allow xpress to keep working.

xpress runs into the error message I tried to make a bit clearer (not filling in PyArrayDescr_Type before registration). That is unfortunate, but I am not sure I want to check for it at every single np.dtype() call (or similar), so that I think we may just need to ask them to move the code and release a new version.


I tested this using xpress, not sure how easy it is to add another user dtype to test these error paths.

This also includes `NPY_ITEM_IS_POINTER`.  A previous changed
raised an error when this happened, but some downstream libraries
use it to create a custom dtype with a single object field.

It seems acceptable to create such a dtype if (and only if) that
dtype hardcodes names and fields at creation time, so this change
allows that, but otherwise keeps the error intact.

This should work fine, although some care may be required
This allows running `xpress` on current master. `xpress` copies the
type from `np.dtype(object)` rather than using `&PyArrayDescr_Type`.
That seems overall fine, we can just replace it.
The only reason for this check is to ensure that the user does not
override our updated `type(user_dtype)` and thus corrupting it.
@seberg
Copy link
Member Author

seberg commented Sep 15, 2020

@octachrome this PR should make xpress work in current master. Its a bit of a hack (but that is life), basically the way xpress registers its dtypes seems fine, and NumPy can just make it work, I think.

I am slightly worried about missing reference counting corner cases with a dtype such as xpress uses it, but I don't see any serious issues.

@charris
Copy link
Member

charris commented Sep 15, 2020

Does this need a backport?

@seberg
Copy link
Member Author

seberg commented Sep 15, 2020

No need to backport, this is just fighting with more exotic user-dtypes in master. Moving where descr->typenum is set might be nice, but really not necessary.

@octachrome
Copy link
Contributor

Thanks again for taking the time to look at our module. I will pass this on to the lead dev.

@octachrome
Copy link
Contributor

not filling in PyArrayDescr_Type before registration

Do we need to address this in xpress? How can I get hold of a master build of NumPy to test with?

@mattip
Copy link
Member

mattip commented Sep 17, 2020

How can I get hold of a master build of NumPy to test with?

We upload weekly builds to https://anaconda.org/scipy-wheels-nightly/numpy, so you can install them with

pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy

@seberg
Copy link
Member Author

seberg commented Sep 17, 2020

Right now there is hopefully nothing you need to do in xpress. This PR needs to go in, before you can test anything.

Testing against the pip install matti gave would be very appreciated though. The quirkiest thing I have found until now is that some casts may change (i.e. allow additional casts to and from object dtype). The change there, is that I was current planning on effectively always using getitem/setitem to implement that cast. It doesn't seem like typical users would run into it, but I guess you never quite now.

@octachrome
Copy link
Contributor

Importing xpress against the latest nightly NumPy, 1.20.0.dev0+9dd6c60, raises TypeError: invalid type number. No further information. I would need to set up a debug build to find out exactly where this is being raised.

@seberg
Copy link
Member Author

seberg commented Sep 17, 2020

@octachrome no need to try to debug it right now, it won't work unless you install from source with this PR in.

What you could do maybe is that errors happening during dtype registration seem to be ignored. The cuase of the error you see is actually that registering the new dtype failed earlier.

@octachrome
Copy link
Contributor

You're right, we aren't handling the case where PyArray_RegisterDataType returns -1. I'll fix this and check for other similar problems. Thanks.

@mattip
Copy link
Member

mattip commented Sep 17, 2020

Codecov seems to think the new code is not covered in a test. If that is correct, it would be nice to have a test that hits this path.

@seberg
Copy link
Member Author

seberg commented Sep 17, 2020

Yes, tests added. I didn't have the idea to just copy a full blueprint to make the test, but as is, its probably OK and not super long.

Interesting side issue, because xpress does not subclass from np.generic (I appreciate this fact!), it is in an unnecessary disadvantage because np.dtype(dtype.type) != dtype. We could change that, and I am tempted to just do it, but its orthogonal to this.

This includes a few error paths. Note that the test creates
a new dtype ever time it is run, and dtypes in NumPy are persistent
so that this leaks references.
@seberg seberg force-pushed the relax-object-dtype-with-ref branch from 6601a10 to e5f2ce3 Compare September 17, 2020 18:09
@merraksh
Copy link

@seberg

I am slightly worried about missing reference counting corner cases with a dtype such as xpress uses it, but I don't see any serious issues.

Is this about placing the right INC/DECREF in the dtype functions? At the moment we do have a correct balance of these for some of our tests. But obviously I'm interested in any insight you have about this.

@seberg seberg added this to the 1.20.0 release milestone Sep 25, 2020
@mattip mattip merged commit 4ebbaae into numpy:master Oct 7, 2020
@mattip
Copy link
Member

mattip commented Oct 7, 2020

Thanks @seberg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments