
BUG: Fix memory leak of buffer-info cache due to relaxed strides #16936

Merged
mattip merged 4 commits into numpy:master from seberg:issue-16934
Oct 22, 2020

Conversation

@seberg
Member

@seberg seberg commented Jul 23, 2020

When relaxed strides checking is active (and has an effect), we recalculate
the strides to export "clean" strides in the buffer interface.
(Python and probably some other consumers of the buffer interface expect
this; i.e. NumPy has fully switched to and embraced relaxed strides, but
the buffer interface at large probably has not.)

However, the place where the strides were "fixed" meant that, whenever the
strides had to be fixed, the old, cached buffer-info was not reused when it
should have been.

This moves the "fixing" logic so that reuse will occur. It leaves one issue
open: an array shaped e.g. (1, 10) is both C- and F-contiguous. Thus, if it
is exported as C-contiguous, then as F-contiguous, and then again as
C-contiguous, this will work, but the last export will compare to the
F-contiguous export and thus still leak memory.

Addresses gh-16934 (but does leave a small hole).
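
For illustration only, here is a minimal C sketch (not the actual NumPy code or the regression test; the function name and loop count are made up) of the alternating export pattern described above, using the standard CPython buffer API:

```c
#include <Python.h>

/*
 * Sketch: alternately export a (1, 10) array, which is both C- and
 * F-contiguous, with C- and F-contiguous buffer requests.  Before this
 * fix, exports whose strides had to be "fixed" did not reuse the cached
 * buffer-info, so each request could append a new entry that is only
 * freed when the array itself is deallocated.
 */
static int
export_alternating(PyObject *arr)
{
    for (int i = 0; i < 100; i++) {
        Py_buffer view;
        int flags = (i % 2 == 0) ? PyBUF_C_CONTIGUOUS : PyBUF_F_CONTIGUOUS;
        if (PyObject_GetBuffer(arr, &view, flags) < 0) {
            return -1;
        }
        /* Releasing the view does not free the cached buffer-info. */
        PyBuffer_Release(&view);
    }
    return 0;
}
```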


@charris marking this as a backport candidate. But that code got some small changes, so it may be that the diff does not apply cleanly (although I think it should).

Member Author
@seberg seberg Jul 23, 2020

I have confirmed that this slightly strange modification of the test does cause it to fail when using pytest-leaks.

EDIT: Not with this fix of course.

Comment on lines +517 to +533
Member

Something for a follow-up - this is overly strict: if the user did not request a contiguous buffer at all, we still construct these replacement strides.
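
As a rough sketch of that follow-up idea (the helper name is hypothetical and this is not NumPy's actual code), the getbuffer flags already encode whether any form of contiguity was requested:

```c
#include <Python.h>

/*
 * Hypothetical helper: report whether the buffer request asked for any
 * kind of contiguity.  Each PyBUF_*_CONTIGUOUS flag is a mask that
 * includes PyBUF_STRIDES, so compare against the full mask.
 */
static int
requests_contiguity(int flags)
{
    return ((flags & PyBUF_C_CONTIGUOUS) == PyBUF_C_CONTIGUOUS ||
            (flags & PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS ||
            (flags & PyBUF_ANY_CONTIGUOUS) == PyBUF_ANY_CONTIGUOUS);
}
```

The replacement strides would then only be needed when this returns true - modulo the Cython caveat raised in the reply below.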

Member Author

IIRC there was a reason for that. Cython may sometimes request a buffer without specifying the flag, but then check for it anyway later on (potentially even raising an error). I somewhat expect Cython has caught up with our interpretation of relaxed strides over time, but I am not 100% sure.

Contributor

If a new buffer info is always constructed anyway, what's the benefit of having a cache at all? Also, it's a list but I only ever see the last item in the list referenced. Couldn't it be replaced with a single object per array? Then there is no memory growth at all.

Member Author

Yeah, it would be nice to skip part of that sometimes... and it would be a small speed enhancement. But we have to reuse the old one so that you do not leak memory, and we need to hold on to it to be able to free it again.

The problem is that we can only free these when the array is deleted, which is the reason for your memory leak. This is a fundamental issue around the buffer protocol, which actually also provides the answer to the issue (by providing a free function). But due to some backcompat issues, we can't just use that...

Times change, and the need for that backcompat is probably going down. Maybe one more thing that would be nice to try out if we had a major release; I doubt it would create a big issue in most cases.
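
A sketch of that protocol-level answer (illustrative only; the struct and function names are made up, and this is precisely what the back-compat concern prevents NumPy from doing today): per-export data can be stored in view->internal and freed in bf_releasebuffer.

```c
#include <Python.h>

/* Per-export data an exporter could free as soon as the view is released. */
typedef struct {
    Py_ssize_t *shape_and_strides;
} export_info;

static int
example_getbuffer(PyObject *self, Py_buffer *view, int flags)
{
    export_info *info = PyMem_Malloc(sizeof(export_info));
    if (info == NULL) {
        view->obj = NULL;
        PyErr_NoMemory();
        return -1;
    }
    info->shape_and_strides = NULL;  /* would hold the exported shape/strides */
    /* ... fill the remaining Py_buffer fields from the array here ... */
    view->internal = info;           /* exporter-private slot in Py_buffer */
    Py_INCREF(self);
    view->obj = self;
    return 0;
}

static void
example_releasebuffer(PyObject *self, Py_buffer *view)
{
    /* Called once per successful getbuffer: free the per-export data. */
    export_info *info = (export_info *)view->internal;
    PyMem_Free(info->shape_and_strides);
    PyMem_Free(info);
}
```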

Member Author

Sorry, the important part is that, even though nobody likes it, arrays are currently mutable in both shape/strides and dtype... So we can't just stick to a single exported buffer object :(.

Contributor

So to sum up, it isn't a cache in the usual sense but instead a store of all buffer objects.

I see this was 577dbbd, which is 11 years old. If the fix is 11 years old, perhaps that PyArg_ParseTuple issue has been fixed in the meantime?

Member Author

No, that issue is still in PyArg_ParseTuple, with an ugly comment there as well. It may have fewer users now, though. And yes, you are right, the name "cache" is a misnomer, since it is persistent.

@mattip mattip requested a review from eric-wieser August 13, 2020 08:36
Comment on lines 633 to 639
Member

This seems very easy to fix - just make the key the tuple PyTuple_Pack(2, PyLong_FromVoidPtr((void *)obj), PyBool_FromLong(f_contiguous)) (with error checking).
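
Spelled out with the error checking, that suggestion might look roughly like this (an illustrative sketch, not the merged code; note that PyTuple_Pack does not steal references):

```c
#include <Python.h>

/* Build a (pointer, f_contiguous) tuple to use as the storage key. */
static PyObject *
make_buffer_info_key(PyObject *obj, int f_contiguous)
{
    PyObject *ptr = PyLong_FromVoidPtr((void *)obj);
    if (ptr == NULL) {
        return NULL;
    }
    /* PyBool_FromLong always succeeds (returns Py_True or Py_False). */
    PyObject *flag = PyBool_FromLong(f_contiguous);
    /* PyTuple_Pack does not steal references, so drop ours afterwards. */
    PyObject *key = PyTuple_Pack(2, ptr, flag);
    Py_DECREF(ptr);
    Py_DECREF(flag);
    return key;  /* NULL (with an exception set) on failure */
}
```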

Member Author

Hmmm, it does mean you have to delete two elements on every single array deallocation, though.

Member Author

So it trades off something that should never happen for a small speed penalty on every deallocation right now.

Member Author

Sorry, I guess that is not true; it might happen, but it is unlikely to be common. The other alternative is to always probe 2 buffers into the past for "reuse" - and actually, I guess only when the array is both C- and F-contiguous, which is quick to check.
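
A rough sketch of that probe-two-entries idea (all names here are illustrative placeholders, not NumPy's internal structures):

```c
#include <Python.h>
#include <string.h>

/* Simplified stand-in for the cached per-export buffer info. */
typedef struct {
    int ndim;
    Py_ssize_t *shape_and_strides;  /* length 2 * ndim */
} buffer_info;

static int
buffer_info_equal(const buffer_info *a, const buffer_info *b)
{
    if (a->ndim != b->ndim) {
        return 0;
    }
    return memcmp(a->shape_and_strides, b->shape_and_strides,
                  2 * a->ndim * sizeof(Py_ssize_t)) == 0;
}

/*
 * Probe the most recent entry for reuse as before, but when the array is
 * both C- and F-contiguous also check the entry before it, since the two
 * contiguity flavours may have been exported alternately.
 */
static buffer_info *
find_reusable_info(buffer_info **stored, Py_ssize_t n_stored,
                   const buffer_info *fresh, int c_and_f_contiguous)
{
    Py_ssize_t n_probe = c_and_f_contiguous ? 2 : 1;
    for (Py_ssize_t i = 0; i < n_probe && i < n_stored; i++) {
        if (buffer_info_equal(stored[n_stored - 1 - i], fresh)) {
            return stored[n_stored - 1 - i];
        }
    }
    return NULL;  /* nothing matches; the caller stores `fresh` */
}
```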

Member

If you think using a (pointer, order) key is too expensive, then it's probably best to add that justification in a source comment.

Member Author

I think the issue was that I had neither of those ideas when writing that comment. And I do hate 98% solutions. I will have a quick look at whether inspecting two list elements is easy; if not, I will expand the comment.

Member Author

OK, I pushed a fix; too often, just staring at it and wondering takes more time than the fix :(. I have added a test which trips when run with pytest-leaks. I have confirmed that it fails before the fix and passes after.

Member
@eric-wieser eric-wieser Aug 13, 2020

As an aside for a future PR, I think we could get a reasonable speedup for the cache-hit case by using PyCapsule objects instead of PyLong_AsVoidPtr here - converting pointers to big integers and back seems silly when we could just keep the pointer around.
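
For illustration, the capsule variant might look roughly like this (a sketch under assumptions: the capsule name and helper functions are invented, and `info` stands for the exporter-owned buffer-info pointer):

```c
#include <Python.h>

/* Wrap the raw buffer-info pointer instead of converting it to an int. */
static PyObject *
wrap_info_pointer(void *info)
{
    /* No destructor: the exporter still owns and frees `info` itself. */
    return PyCapsule_New(info, "numpy._buffer_info", NULL);
}

static void *
unwrap_info_pointer(PyObject *capsule)
{
    /* Returns NULL and sets an exception if the name does not match. */
    return PyCapsule_GetPointer(capsule, "numpy._buffer_info");
}
```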

Member Author

Possibly (and probably nicer in any case). I would prefer the approach in gh-16938, but I don't have the mental capacity to figure out how annoying growing the struct is (I doubt it is very) and, mainly, what could be done to reduce the chance of issues.

@charris
Member

charris commented Sep 2, 2020

@seberg @eric-wieser There is a push to release 1.19.2; is this PR ready?

@seberg
Member Author

seberg commented Sep 4, 2020

Hmm, yeah, I think it is; I am not sure the last commit was ever reviewed, though.

@charris
Member

charris commented Sep 8, 2020

Looks like this is probably ready, but I'm going to push it off to 1.19.3 for the backport. We will need to make that release for Python 3.9 and (hopefully) the Windows 10 version 2004 fix.

@charris charris added this to the 1.19.3 release milestone Sep 8, 2020
@seberg
Member Author

seberg commented Sep 11, 2020

I pushed a tiny bit of code cleanup in preparation for a fix for gh-17294.

Member

why does this fail without relaxed strides?

Member Author

The array will be only C-contiguous, so you can't request an F-contiguous buffer.
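
For illustration (a sketch, not the actual test): an F-contiguous buffer request on an array that is only C-contiguous is simply refused by the exporter, so the code path the test exercises is never reached without relaxed strides.

```c
#include <Python.h>

/* Returns 1 if an F-contiguous export succeeds, 0 if it is refused. */
static int
try_f_contiguous_export(PyObject *arr)
{
    Py_buffer view;
    if (PyObject_GetBuffer(arr, &view, PyBUF_F_CONTIGUOUS) < 0) {
        /* The exporter raised an error: the array is not F-contiguous. */
        PyErr_Clear();
        return 0;
    }
    PyBuffer_Release(&view);
    return 1;
}
```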

Member

Makes sense, thanks. Am I right in thinking code coverage skips this test, based on the review comment it left in the diff?

Member Author

That seems strange; unfortunately, the comment is gone now due to the small comment change, so I will have to see later which branch is uncovered and whether it should be.

Member Author

Maybe the uncovered path was within the else block of #if NPY_RELAXED_STRIDES_CHECKING? That can't be hit, since codecov should be running with the default setup.

Member Author

Well, the code coverage didn't lie: the code got messed up in a cleanup attempt... The test was missing the strides check needed to notice it :(.

Anyway, fixed now...

Member
@eric-wieser eric-wieser left a comment

Sorry for the delay reviewing this - I think the code looks good; I just want to check that I understand the test.

@charris
Member

charris commented Oct 19, 2020

What is the status of this? The CircleCI failures are fixed on master and can be ignored.

Exporting these multiple times alternating would previously cause
a new buffer-info to be created each time.
@mattip mattip merged commit 84626f2 into numpy:master Oct 22, 2020
@mattip
Member

mattip commented Oct 22, 2020

Thanks @seberg
