BUG,ENH: fix pickling user-scalars by allowing non-format buffer export by seberg · Pull Request #17295 · numpy/numpy

seberg · 2020-09-11T17:17:47Z

This should close gh-17294 by allowing buffers to be exported for user-defined types if the buffer export does not ask for the FORMAT to be filled in.

The diff is based on top of gh-16936 (which touches the same code), so marking it as draft.

eric-wieser · 2020-09-29T20:48:10Z

numpy/core/src/multiarray/buffer.c

Hmm, this looks risky - are you sure it shouldn't be:

Suggested change

-    if (a->format != NULL && b->format != NULL) {
-        c = strcmp(a->format, b->format);
-        if (c != 0) return c;
-    }
+    /* null format sorts before empty string */
+    c = (a->format != NULL) - (b->format != NULL);
+    if (c != 0) return c;
+    if (a->format != NULL && b->format != NULL) {
+        c = strcmp(a->format, b->format);
+        if (c != 0) return c;
+    }

Otherwise NULL is considered equal to arbitrary strings, and equality is not transitive.
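The transitivity concern can be checked with a small model. This is only a sketch in Python, not the code in buffer.c: `None` stands in for a NULL format pointer, and the helper names `cmp_wildcard` and `cmp_null_first` are hypothetical. Only the format part of the comparison is modelled.

```python
def cmp_wildcard(a, b):
    """Model of the original check: formats are compared only if both are set."""
    if a is not None and b is not None:
        return (a > b) - (a < b)  # sign of the comparison, like strcmp
    return 0  # a missing format compares equal to anything


def cmp_null_first(a, b):
    """Model of the suggested check: None sorts before any string, even ''."""
    c = (a is not None) - (b is not None)
    if c != 0:
        return c
    if a is not None and b is not None:
        return (a > b) - (a < b)
    return 0


# With the wildcard rule, "i" == None and None == "d", yet "i" != "d".
print(cmp_wildcard("i", None), cmp_wildcard(None, "d"), cmp_wildcard("i", "d"))

# With the null-first rule, None is strictly smaller than every string.
print(cmp_null_first(None, ""), cmp_null_first("i", None), cmp_null_first(None, None))
```

The first line prints `0 0 1`, which shows the broken transitivity; the second prints `-1 1 0`, a consistent ordering.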

Yes, should be correct. If the format is NULL, it seems OK to replace it with any other format. If the old (first) format is NULL, we replace the NULL with the actual (second) format. If the second format is NULL, the format is ignored completely, so that is fine as well.

An empty string would indicate that the itemsize is 0 in the exported buffer, I guess. A size change could seem problematic, but I do not think so.

In theory, changing the itemsize might be dangerous, but it is explicitly stored in the exported buffer info. So while it could be dangerous from a buffer user's perspective, I do not think it can lead to incorrect information inside the exported buffer info.

Where does this replacing of NULL happen?

https://github.com/numpy/numpy/pull/17295/files#diff-761c5d4c611d2cd411341182fcee883dR662

eric-wieser · 2020-09-29T20:48:49Z

numpy/core/src/multiarray/buffer.c

it's looking increasingly like maybe we should just pass in the full flags argument

Hmmm, fair point, could pass in the flags.

Is this still relevant?

Changed it and rebased. The rebase means the code changed slightly here (basically, I undid a bit of fixup that I did in the other PR to make this one easier, because I got it wrong...).

Previously we had code that would allow exporting the buffer, but would then fail for any reasonable subclass, because such a subclass should have its own user-dtype. The change is that a subclass without its own user-dtype now inherits the correct behaviour directly. This allows pickling of such user-defined scalars (with a user-defined dtype) if no FORMAT was requested in the buffer export. The latter allows the generic pickling code to succeed. Closes gh-17294

seberg · 2020-10-22T18:34:47Z

Test failure on 32-bit Linux looks real. Going to restart out of curiosity to see if it is reproducible (since I currently have no idea why it would fail).

Failure

=================================== FAILURES ===================================
_______ TestNewBufferProtocol.test_export_and_pickle_user_dtype[scalar] ________

self = <numpy.core.tests.test_multiarray.TestNewBufferProtocol object at 0xec9d1238>
obj = rational(1,2), error = <class 'TypeError'>

    @pytest.mark.parametrize(["obj", "error"], [
            pytest.param(np.array([1, 2], dtype=rational), ValueError, id="array"),
            pytest.param(rational(1, 2), TypeError, id="scalar")])
    def test_export_and_pickle_user_dtype(self, obj, error):
        # User dtypes should export successfully when FORMAT was not requested.
        with pytest.raises(error):
            _multiarray_tests.get_buffer_info(obj, ("STRIDED", "FORMAT"))
    
        _multiarray_tests.get_buffer_info(obj, ("STRIDED",))
    
        # This is currently also necessary to implement pickling:
        res = pickle.loads(pickle.dumps(obj))
>       assert_array_equal(res, obj)
E       AssertionError: 
E       Arrays are not equal
E       
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: rational(255,514)
E       Max relative difference: rational(255,257)
E        x: array(rational(1,257), dtype=rational)
E        y: array(rational(1,2), dtype=rational)

error      = <class 'TypeError'>
obj        = rational(1,2)
res        = rational(1,257)
self       = <numpy.core.tests.test_multiarray.TestNewBufferProtocol object at 0xec9d1238>

../../venv/lib/python3.8/site-packages/numpy/core/tests/test_multiarray.py:7159: AssertionError

seberg · 2020-10-22T20:13:28Z

numpy/core/src/multiarray/scalarapi.c

     * after a PyObject_HEAD
     */
-    memloc = (npy_intp)scalar;
+    memloc = (uintptr_t)scalar;


Seems like this cast to signed was the problem. (I first thought it couldn't be, because the denominator is stored as denom - 1 here, making the 2 a 1...)

If the address is too large, the cast yields a negative value, which breaks the rounding (and also means that a previously aligned value looks unaligned).
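The effect can be reproduced with plain integer arithmetic. The sketch below is a Python model, not the code in scalarapi.c. It assumes the usual round-up idiom `((value + align - 1) / align) * align` with C's truncating division, and it uses a made-up 32-bit address. All helper names are hypothetical.

```python
BITS = 32  # pointer width on the failing 32-bit Linux build


def c_div(a, b):
    """Integer division that truncates toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def as_signed(addr):
    """Reinterpret an unsigned BITS-wide address as a signed integer."""
    return addr - (1 << BITS) if addr >= 1 << (BITS - 1) else addr


def round_up(value, align):
    """Assumed round-up idiom: ((value + align - 1) / align) * align."""
    return c_div(value + align - 1, align) * align


def locate(addr, align, signed):
    """Return the rounded address, viewed again as an unsigned pointer."""
    value = as_signed(addr) if signed else addr
    return round_up(value, align) % (1 << BITS)


aligned = 0xEC9D1240  # hypothetical address, already a multiple of 4
print(hex(locate(aligned, 4, signed=False)))
print(hex(locate(aligned, 4, signed=True)))
```

In this model the unsigned path prints `0xec9d1240`, leaving the aligned address alone, while the signed path prints `0xec9d1244`, one alignment step too far. Under these assumptions, a shift of that kind would make a reader pick up the wrong fields, which would be consistent with `rational(1,2)` coming back as `rational(1,257)` in the failing test.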

seberg · 2020-10-27T18:50:29Z

Should we aim to backport this? It does fix an older regression.

mattip · 2020-11-03T13:07:36Z

Does this need any documentation changes? I don't think pickling a user-scalar is very common, but will this impact any of the serialization protocols?

seberg · 2020-11-03T14:59:51Z

Hmm, this fixes two things:

1. Exporting buffers without a requested format now always works.
2. The fix about user-scalars is a regression. If it was not, I am not sure I would worry about it.

seberg · 2020-11-03T15:33:29Z

Ah, I guess it's important to note that allowing buffers to always be exported only affects datetime and user-dtypes, so it is a pretty niche thing. I could add a release note anyway...

mattip · 2020-11-03T16:07:07Z

I think this is niche enough to not have a release note. Thanks @seberg

seberg mentioned this pull request Sep 11, 2020

__reduce__ no longer works on user defined types #17294

Closed

seberg force-pushed the issue-17294 branch 2 times, most recently from 374c38b to c901a8d Compare September 12, 2020 02:31

charris added 00 - Bug 01 - Enhancement component: numpy._core labels Sep 27, 2020

eric-wieser reviewed Sep 29, 2020

seberg mentioned this pull request Oct 22, 2020

ENH,API: Store exported buffer info on the array #16938

Merged

seberg added 5 commits October 22, 2020 11:45

ENH: allow exporting user-dtype as buffers without FORMAT

d92f151

TST: Fix tp_name of rational

fae94b7

This is necessary to allow pickling of the type object, which is necessary to test pickling of the scalar (and in arrays)

TST: Add test for non-FORMAT user dtype array/scalar export

1725f31

This also tests pickling as a regression test, since at least at this time it is directly related to the buffer export.

MAINT: Pass in flags instead of format and contig explicitly

2fc4dbb

seberg force-pushed the issue-17294 branch from c901a8d to 2fc4dbb Compare October 22, 2020 16:57

DOC: Add a comment to explain the format transfer

e9ce0f4

seberg force-pushed the issue-17294 branch from e9fe30b to e9ce0f4 Compare October 22, 2020 17:01

seberg marked this pull request as ready for review October 22, 2020 17:02

TST: Modify test to see if the error is in pickling or unpickling

63f90a8

seberg force-pushed the issue-17294 branch from 4480759 to 0c49d8a Compare October 22, 2020 19:51

seberg added 2 commits October 22, 2020 15:08

BUG: Fix memloc to be unsigned and add debug info to test

09934b2

MAINT: Simplify code path slightly

d02ca96

seberg force-pushed the issue-17294 branch from 0c49d8a to d02ca96 Compare October 22, 2020 20:09

seberg commented Oct 22, 2020

mattip requested a review from eric-wieser November 2, 2020 20:41

mattip merged commit d62b0ee into numpy:master Nov 3, 2020

seberg deleted the issue-17294 branch November 3, 2020 16:31

seberg mentioned this pull request Nov 11, 2020

BUG: Fix buffer export dtype references #17753

Merged
