BUG: Fix memory leak of buffer-info cache due to relaxed strides#16936
BUG: Fix memory leak of buffer-info cache due to relaxed strides#16936mattip merged 4 commits intonumpy:masterfrom
Conversation
numpy/core/tests/test_multiarray.py
Outdated
There was a problem hiding this comment.
I have confirmed that this slightly strange modification of the test does cause it to fail when using pytest-leaks.
EDIT: Not with this fix of course.
numpy/core/src/multiarray/buffer.c
Outdated
There was a problem hiding this comment.
Something for a follow-up - this is overly strict, if the user did not request a contiguous buffer at all we still construct these replacement strides.
There was a problem hiding this comment.
IIRC there was a reason for that. Maybe Cython may sometimes requested a buffer without specifying the flag, but then checking for it anyway later-on (potentially even raising an error). I somewhat expects cython caught up over time with our interpretation of relaxed strides, but I am not 100% sure.
numpy/core/src/multiarray/buffer.c
Outdated
There was a problem hiding this comment.
If a new buffer info is always constructed anyway, what's the benefit of having a cache at all? Also, it's a list but I only ever see the last item in the list referenced. Couldn't it be replaced with a single object per array? Then there is no memory growth at all.
There was a problem hiding this comment.
Yeah, it would be nice to skip part of that sometimes... and would be a small speed enhancement. But, we have to reuse the old one, so that you do not leak memory and we need to hold on to it to be able to free it again.
The problem is that we can only free these when the array is deleted, which is the reason for your memory leak. This is a fundamental issue around the buffer protocol, which actually does also provide the answer to the issue (by providing a free function). But due to some backcompat issues, we can't just use that...
Times change, and the need for that backcompat probably goes down. Maybe one more thing that would be nice to try out if we had a major release, I doubt that it would create a big issue in most cases.
There was a problem hiding this comment.
Sorry, the important part is that, even though nobody likes that, arrays are currently mutable both in shape/strides and dtype... So we can't just stick to a single exported buffer object :(.
There was a problem hiding this comment.
So to sum up, it isn't a cache in the usual sense but instead a store of all buffer objects.
I see this was 577dbbd which is 11 years old. If the fix 11 years old, perhaps that PyArgs_ParseTuple issue has been fixed in the mean time?
There was a problem hiding this comment.
No, that issue is still in PyArg_ParseTuple with an ugly comment as well there. It may be that users of it are less now though. And yes, you are right, the name "cache" is a misnomer, since it is persistent.
numpy/core/src/multiarray/buffer.c
Outdated
There was a problem hiding this comment.
This seems very easy to fix - just make key the tuple PyTuple_Pack(2, PyLong_FromVoidPtr((void*)obj), PyBool_FromLong(f_contiguous)) (with error checking).
There was a problem hiding this comment.
Hmmm, it does mean you have to delete two elements on every single array deallocation, though.
There was a problem hiding this comment.
So it trades of something that should never happen, with a small speed penalty on every deallocation right now.
There was a problem hiding this comment.
Sorry, I guess that is not true, it might happen, but its unlikely common. The other alternative is to always probe 2 buffers into the past for "reuse", and actually, I guess only when the array is both C- and F-contiguous, which is quick to check.
There was a problem hiding this comment.
If you think using a (pointer, order) key is two expensive, then probably best to add that justification to a source comment.
There was a problem hiding this comment.
I think the issue was that I had neither of those ideas when writing that comment. And I do hate 98% solutions. I will have a quick look whether inspecting two list elements is easy, if not, I will expand the comment.
There was a problem hiding this comment.
OK, I pushed a fix, too often just staring at it and wondering takes more time than the fix :(. I have added a test, which trips if run with pytest-leaks. I have confirmed that it fails before and passes after.
numpy/core/src/multiarray/buffer.c
Outdated
There was a problem hiding this comment.
As an aside for a future PR, I think we could get a reasonable speedup for the cache hit case by using Py_capsure objects instead of PyLong_AsVoidPtr here - converting pointers to big integers and back seems silly when we could just keep the pointer around.
There was a problem hiding this comment.
Possibly (and probably nicer in any case). I would prefer the approach in: gh-16938, but I don't have the mental capacity to figure out how annoying growing the struct is (I doubt its very) and mainly what could be done to reduce the chance of issues.
|
@seberg @eric-wieser There is a push to release a 1.19.2, is this PR ready? |
|
Hmm, yeah, I think it is, I am not sure the last commit was ever reviewed though. |
|
Looks like this is probably ready, but I'm going to push off to 1.19.3 for backport. We will need to make that release for python 3.9 and (hopefully) the Windows 10 version 2004 fix. |
|
I pushed a tiny bit of code cleanup in preparation of a fix for gh-17294. |
numpy/core/tests/test_multiarray.py
Outdated
There was a problem hiding this comment.
why does this fail without relaxed strides?
There was a problem hiding this comment.
The array will be only C contiguous, so you can't request an f-contiguous buffer.
There was a problem hiding this comment.
Makes sense, thanks. Am I right in thinking code-coverage skips this test, based on the review comment it made in the diff?
There was a problem hiding this comment.
That seems strange, unfortunately the comment is gone now due to the small comment change, so have to see which branch is uncovered and whether it should be later.
There was a problem hiding this comment.
Maybe the uncovered pass was within the else block of #if NPY_RELAXED_STRIDES_CHECKING? That can't be hit, since codecov should be running in the default setup.
There was a problem hiding this comment.
Well, the codecoverage didn't lie, the code got messed up in a cleanup try... The test was missing the strides check to notice it :(.
anyway, fixed now...
eric-wieser
left a comment
There was a problem hiding this comment.
Sorry for the delay reviewing this - I think the code looks good, just want to check I understand the test
|
What is the status of this? The CircleCI failures are fixed and master and can be ignored. |
When relaxed strides is active (and has an effect), we recalculate the strides to export "clean" strides in the buffer interface. (Python and probably some other exporters expect this, i.e. NumPy has fully switched to and embraced relaxed strides, but the buffer interface at large probably not.) The place where "fixing" the strides occured however meant that when the strides are fixed, the old, cached buffer-info was not reused when it should have been reused. This moves the "fixing" logic so that reuse will occur. It leaves one issue open in that an array shaped e.g. `(1, 10)` is both C- and F-contiguous. Thus, if it is exported as C-contiguous and then as F-contiguous, and then *again* as C-contiguous, this will work, but the last export will compare to the export as an F-contig buffer and thus still leak memory. Address numpygh-16934 (but does leave a small hole)
Exporting these multiple times alternating would previously cause a new buffer-info to be created each time.
|
Thanks @seberg |
When relaxed strides is active (and has an effect), we recalculate
the strides to export "clean" strides in the buffer interface.
(Python and probably some other exporters expect this, i.e. NumPy
has fully switched to and embraced relaxed strides, but the buffer
interface at large probably not.)
The place where "fixing" the strides occured however meant that
when the strides are fixed, the old, cached buffer-info was not
reused when it should have been reused.
This moves the "fixing" logic so that reuse will occur. It leaves
one issue open in that an array shaped e.g.
(1, 10)is bothC- and F-contiguous. Thus, if it is exported as C-contiguous and
then as F-contiguous, and then again as C-contiguous, this will
work, but the last export will compare to the export as an F-contig
buffer and thus still leak memory.
Address gh-16934 (but does leave a small hole)
@charris marking this as backport candidate. But, that code got some small changes, so it may be the diff does not apply (although I think it should).