BUG,ENH: fix pickling user-scalars by allowing non-format buffer export #17295
BUG,ENH: fix pickling user-scalars by allowing non-format buffer export #17295mattip merged 9 commits intonumpy:masterfrom
Conversation
374c38b to
c901a8d
Compare
numpy/core/src/multiarray/buffer.c
Outdated
There was a problem hiding this comment.
Hmm, this looks risky - are you sure it shouldn't be:
| if (a->format != NULL && b->format != NULL) { | |
| c = strcmp(a->format, b->format); | |
| if (c != 0) return c; | |
| } | |
| /* null format sorts before empty string */ | |
| c = (a->format != NULL) - (b->format != NULL); | |
| if (c != 0) return c; | |
| if (a->format != NULL && b->format != NULL) { | |
| c = strcmp(a->format, b->format); | |
| if (c != 0) return c; | |
| } |
Otherwise NULL is considered equal to arbitrary strings, and equality is not transitive
There was a problem hiding this comment.
Yes, should be correct. if the format is NULL it seems OK to replace it with any other format. Now if the old (first) format is NULL, we will replace the NULL with the actual (second) format. If the second format is NULL, format is ignored completely, so it is fine as well.
There was a problem hiding this comment.
An empty string would indicate that the itemsize is 0 in the exported buffer, I guess. A size change could seem problematic, but I do not think so.
In theory, changing the itemsize might be dangerous, but it is explicitly stored in the exported buffer info, so while it could be dangerous from a buffer user perspective, I do not think it is dangerous for having incorrect information inside the exported buffer information.
There was a problem hiding this comment.
Where does this replacing of NULL happen?
There was a problem hiding this comment.
numpy/core/src/multiarray/buffer.c
Outdated
There was a problem hiding this comment.
it's looking increasingly like maybe we should just pass in the full flags argument
There was a problem hiding this comment.
Hmmm, fair point, could pass in the flags.
There was a problem hiding this comment.
Changed it and rebased. The rebase means the code changed slightly here (basically, I undid a bit of fixup that I did in the other PR to make this one easier, because I got it wrong...).
Previously we had code that would allow exporting the buffer, but then fail for any reasonable subclass, because such a subclass should have its own user-dtype. The change is, that now a subclass without its own user-dtype will inherit the correct behaviour directly. This allows pickling of of such user-defined scalars (with user-defined dtype) if no FORMAT was requested in the buffer export. The latter allows the generic pickling code to succeed. Closes numpygh-17294
This is necessary to allow pickling of the type object, which is necessary to test pickling of the scalar (and in arrays)
This also tests pickling as a regression test, since at least at this time it is directly related to the buffer export.
|
Test failure on 32bit linux looks real. Going to restart out of curiosity if it is reproducible (since I have currently no idea why it would fail). Failure |
| * after a PyObject_HEAD | ||
| */ | ||
| memloc = (npy_intp)scalar; | ||
| memloc = (uintptr_t)scalar; |
There was a problem hiding this comment.
Seems like this casting to signed was the problem (and I first thought it can't be because the denominator is stored as a denom - 1 here making the 2 a 1...
If the value is too large, it would be a negative which breaks the rounding (and also means that a previously aligned value looks unaligned).
|
Should we aim to backport this? It does fix an older regression. |
|
Does this need any documentation changes? I don't think pickling a user-scalar is very common, but will this impact any of the serialization protocols? |
|
Hmm, this fixes two things:
|
|
Ah, I guess its important to note that allowing buffers to always be exported only affects datetime and user-dtypes, so is a pretty niche thing, could add a release notes anyway... |
|
I think this is niche enough to not have a release note. Thanks @seberg |
This should close gh-17294 by allowing to export buffers for user defined types if the buffer export does not ask for the FORMAT to be filled it.
The diff is based on top of gh-16936 (which touches the same code), so marking it as draft.