perf: further avoid using proto-plus wrapper when unmarshalling entities#190
perf: further avoid using proto-plus wrapper when unmarshalling entities#190
Conversation
This is actually a big win: in my comparison, we go from 3x to 2x slower than the older/faster version.
Always take the raw protobuf message, rather than the proto-plus wrapper. This is another big win: we are back to within a few percent of the older/faster version on my comparison test.
craiglabenz
left a comment
There was a problem hiding this comment.
Nice! After a few read-throughs of the PR, I think I have pieced together how/why this improves performance. This LGTM pending your confirmation of my understanding (which is only important in that, if I am still misunderstanding, then I have not actually reviewed this PR as much as I think 😆).
| return value_pb.meaning | ||
|
|
||
| return meaning | ||
| return None |
There was a problem hiding this comment.
To make sure I'm following, do the edits to _get_meaning() impact performance, or is this just for readability?
There was a problem hiding this comment.
Not a big hit: the three-item getattr can be much faster than a try: ... except AttributeError:, but likely not the conditional assignment. OTOH, since in later commits we no longer risk handing the proto-plus wrapper into _get_meaning, we could likely pull that getattr out. The other changes to _get_meaning are micro-optimizaitons, which neither helped nor hurt on my comparison test.
google/cloud/datastore/helpers.py
Outdated
| values = ( | ||
| value_pb._pb.array_value.values | ||
| if hasattr(value_pb, "_pb") | ||
| else value_pb.array_value.values | ||
| ) | ||
| # Use 'raw' protobuf message if possible | ||
| value_pb = getattr(value_pb, "_pb", value_pb) |
There was a problem hiding this comment.
Changes like this (there are a few in the PR) seem be a pure refactor of the old implementation, unless I am missing something that further helps performance?
There was a problem hiding this comment.
Three-arg getattr does the test in C, versus in Python's own branch machinery. At least in some cases, it can be a speedup.
| :returns: The entity derived from the protobuf. | ||
| """ | ||
| if not isinstance(pb, entity_pb2.Entity): | ||
| proto_pb = entity_pb2.Entity.wrap(pb) |
There was a problem hiding this comment.
Is the removal of this line and focusing entirely on raw protobuf objects what delivers the final performance gains?
There was a problem hiding this comment.
Yes, indeed. The proto-plus wrappers are an enormous perf-suck. Most of the win in this PR is avoiding touching them anywhere except at the outermost entry point, and then unwrapping there ASAP.
There was a problem hiding this comment.
I had not pieced together that we could avoid .wrap() altogether. The last time I was in here, I'd mistakenly assumed calling that (which is at least the least-slow proto-plus entrypoint) was still required.
Very nice!
🤖 I have created a release \*beep\* \*boop\* --- ### [2.1.4](https://www.github.com/googleapis/python-datastore/compare/v2.1.3...v2.1.4) (2021-07-09) ### Performance Improvements * further avoid using proto-plus wrapper when unmarshalling entities ([#190](https://www.github.com/googleapis/python-datastore/issues/190)) ([d0481bf](https://www.github.com/googleapis/python-datastore/commit/d0481bf8caa84a829808e7f512fda8709f38d0cc)) ### Documentation * omit mention of Python 2.7 in 'CONTRIBUTING.rst' ([#1127](https://www.github.com/googleapis/python-datastore/issues/1127)) ([#181](https://www.github.com/googleapis/python-datastore/issues/181)) ([6efde70](https://www.github.com/googleapis/python-datastore/commit/6efde70db751bf708091b24a932ab8571bd981a6)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Follow-on to @craiglabenz's work in PR #155. These tweaks combine to get performance on my comparison with 1.5.3 script to within a few percentage points of the 1.5.3 implementation.
Closes #150.