ENH: Use syrk to compute certain dot products more quickly and accurately#6932
ENH: Use syrk to compute certain dot products more quickly and accurately#6932njsmith merged 4 commits intonumpy:masterfrom
syrk to compute certain dot products more quickly and accurately#6932Conversation
|
I think something might be wrong with the Also, I have tried to check to see if an array is a transpose view using an ad hoc mechanism I came up with. If somebody has a better way to check this, I am open to suggestions. |
b41860e to
95049a0
Compare
|
I tried building this with OpenBLAS locally and am not seeing this same error. Also, I am seeing an appropriate speed up here on transposed arrays (e.g. ~2x faster). |
95049a0 to
520e209
Compare
|
Ok, I still can't say that this is totally correct, but it doesn't seem to be complaining about the |
syrk to compute certain dot products more quickly and accuratelysyrk to compute certain dot products more quickly and accurately
|
Pinging: @njsmith @charris @argriffing |
|
|
There was a problem hiding this comment.
Ah, yeah, this check is definitely not right. The base pointer doesn't have any semantics.
I think the correct check would be something like: same data pointer, and shapes are transposes of each other, and one is C contiguous and the other is F contiguous (the Trans checks are probably sufficient for the last part).
There was a problem hiding this comment.
Sorry, not super familiar with the NumPy C-API. So, this was my first attempt doing something significant with it. I'll think about how to rework it.
|
@jakirkham Trying to figure out when one matrix is the transpose of the other is a bit tricky. Furthermore, these days we tend to design functions to operate on stacks of matrices. On account of that, I think it would be better to have explicit functions that receive a single argument and perform Other possibilities are two argument functions for |
|
@charris, regarding the first case (i.e. the one argument form) I think I still agree with you, which is why I had mentioned something similar earlier. ( #6794 (comment) ) That being said, it wasn't and still isn't clear to me how I was going to make traction on that problem. As the initial problem proposed by @njsmith seemed a bit simpler to get traction, I tried it here. Adding a new function or function(s) to do this may be better in the long run. I am aware of these two argument problems (including this |
|
R names those operations I think that's an orthogonal issue though. I don't see any reason not to optimize |
520e209 to
c52facb
Compare
This I definitely agree with, which is why I took the time to write this code at all. Regarding the NumPy optimization stuff, I believe both of you know more than me in this area so I'll let you guys hash it out. As for the checks, I think I have fixed them, but they probably should be fused into one conditional to reduce the rechecking. |
|
Might be good to add tests for the trickier cases like, |
c52facb to
6e4950a
Compare
|
Ok, sure that's a good idea. I'll try to work on the tests tomorrow. I've tried to combine the two checks a bit so that they are not repeated for each case. |
cb441e8 to
c4dff61
Compare
|
@njsmith, I've added some tests that you have suggested. They have checked out locally. Travis should be done soon. If there are any more you can think of, please let me know. |
|
Travis checks out. |
|
This is looking good to me. (@charris: do you still have any objections?) @jakirkham: One last thing: this is a pretty awesome user-visible feature, so we should advertise it in the release notes. Could you add a short note to |
|
...I guess if you're feeling inspired you could also usefully add some benchmarks to the |
Absolutely, I added a note under new features. Though I could see it going under improvements. Please let me know if I need to move it or change wording.
Sure, I have no problem doing this, but I may need a pointer on how to go about doing this. In the worst case, I can copy and tweak existing benchmarks to serve our ends here. |
44b3077 to
0197427
Compare
|
Updated the benchmarks so they would be sorted in the output next to each other to make it easy to compare. |
0197427 to
0576eb1
Compare
… accurately in certain special cases.
… @ A` and `A @ A.T`.
…ata, which should be optimized, versus the same configuration without shared data.
0576eb1 to
8d8a74d
Compare
|
Please let me know if there is anything else needed. Otherwise, I am assuming this is complete. Also, I have tried to rebase against master a few times, but there seem to a bunch of updates going in. Instead of clogging up the build queue with rebases of this PR, I am going to let it sit. If you would like to merge and want me to rebase beforehand, feel free to give me a nudge and I will do so. |
|
FYI as a general rule, there's no need to rebase at all unless master has changed in a way that breaks your patch, or else you want to rewrite history for some other reason (e.g. fix commit messages or squash commits together). Anyway, I think this is ready to go -- thanks @jakirkham! |
ENH: Use `syrk` to compute certain dot products more quickly and accurately
|
Ah ok, I think in previous PRs that I have done to numpy I had been encouraged to rebase on master if I was out of date. I was assuming this was to keep the git graph from being a giant indecipherable mess. If that is not necessary, I will avoid doing rebases strictly to move commits onto master unless explicitly asked. Cool, thanks for your help in guiding me in this effort @njsmith. Also, thank you everyone for your suggestions and feedback. If anything else comes up with regards to this, feel free to to let me know. |
|
Benchmarks show the 2x improvement now (for |
|
Oh wow, that's super cool. I didn't realize you had an awesome benchmark viewer for NumPy. Also, it is good to see it checks out. Thanks for sharing @rgommers. |
|
Yes we should advertise that better. I guess @pv didn't advertise because it's on his personal site now. But it's pretty cool indeed. |
|
For those who plan to use this on Mac with OpenBLAS, something to be aware of ( OpenMathLib/OpenBLAS#730 ). This does not affect Linux. |
|
Turns out this actually optimizes |
Fixes #6794
Related: #6930
Deals with the special cases of
a.T @ aanda @ a.Tusingsyrkinstead ofgemm.