Closed
Labels
33 - Question (Question about NumPy usage or development)
57 - Close? (Issues which may be closable unless discussion continued)
component: numpy.einsum
Description
I have two n-by-3 arrays and want to find the fastest way to compute their row-by-row dot products. einsum is faster than multiplying and summing, so I compared einsum row dot products on the n-by-3 arrays against einsum column dot products on their contiguous 3-by-n transposes. The results:

Apparently, column dot products of 3-by-n matrices are never slower than row dot products of n-by-3 matrices, and are up to 2.3 times as fast.
Any idea why that is? We're perhaps venturing into the area of cache hits/misses here.
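To make the layout difference concrete, here is a minimal sketch that prints the shapes and strides of the two arrangements. It assumes the default float64, C-contiguous output of numpy.random.rand; the small n is only for illustration. In the n-by-3 array each 3-element row is contiguous, so the sum runs over an axis of length 3; in the 3-by-n array each component is one contiguous run of n values, and the output index runs along that contiguous axis. Whether this accounts for the timing gap is a guess on my part, not something this snippet demonstrates.

```python
import numpy

n = 5  # small n, purely for illustration
a = numpy.random.rand(n, 3)        # shape (n, 3), C-contiguous float64
aT = numpy.ascontiguousarray(a.T)  # shape (3, n), C-contiguous copy

# Strides are in bytes; a float64 element takes 8 bytes.
print(a.shape, a.strides)    # (5, 3) (24, 8)
print(aT.shape, aT.strides)  # (3, 5) (40, 8)

# a.T on its own is only a view with swapped strides, not a contiguous array.
print(a.T.flags["C_CONTIGUOUS"], aT.flags["C_CONTIGUOUS"])  # False True
```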
Code to reproduce the plot:
import numpy
import perfplot


def setup(n):
    # Two n-by-3 arrays plus contiguous 3-by-n copies of their transposes.
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    aT = numpy.ascontiguousarray(a.T)
    bT = numpy.ascontiguousarray(b.T)
    return (a, b), (aT, bT)


perfplot.save(
    "rel.png",
    setup=setup,
    n_range=[2 ** k for k in range(1, 26)],
    kernels=[
        # Row dot products on the n-by-3 arrays.
        lambda data: numpy.einsum("ij, ij->i", data[0][0], data[0][1]),
        # Column dot products on the 3-by-n arrays.
        lambda data: numpy.einsum("ij, ij->j", data[1][0], data[1][1]),
    ],
    labels=["einsum", "einsum.T"],
    logx=True,
    xlabel="len(a), len(b)",
    relative_to=1,
    title=f"numpy {numpy.__version__}",
)
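As a sanity check that the two kernels in the benchmark compute the same values, here is a small sketch that compares both einsum calls against the plain multiply-and-sum version. The array size and the seeded generator are arbitrary choices for illustration and are not part of the benchmark.

```python
import numpy

rng = numpy.random.default_rng(0)  # seed chosen arbitrarily
a = rng.random((1000, 3))
b = rng.random((1000, 3))
aT = numpy.ascontiguousarray(a.T)
bT = numpy.ascontiguousarray(b.T)

# Row dot products on the n-by-3 arrays.
row_dots = numpy.einsum("ij, ij->i", a, b)
# Column dot products on the contiguous 3-by-n arrays.
col_dots = numpy.einsum("ij, ij->j", aT, bT)
# Plain multiply-and-sum as the reference.
reference = (a * b).sum(axis=1)

assert numpy.allclose(row_dots, reference)
assert numpy.allclose(col_dots, reference)
```

Both variants return an array of length n, so the comparison in the plot is between two ways of producing the same result from differently laid-out inputs.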