
[question] column-wise dot products faster than row-wise dot products. why? #14979

@nschloe

I've got two n-by-3 arrays and I'd like to find the fastest method of computing the row-by-row dot products. einsum is faster than multiplying and summing, so I compared row-wise dot products of the n-by-3 arrays with column-wise dot products of their contiguous transposes (3-by-n). The results:
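[Plot rel.png: runtime of "einsum" (row-wise, n-by-3) relative to "einsum.T" (column-wise, 3-by-n), plotted against len(a), len(b) on a logarithmic x-axis]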

Apparently, column dot products of 3-by-n matrices are never worse, and up to 2.3 times as fast as row dot products of n-by-3 matrices.

Any idea why that is? We're perhaps venturing into the area of cache hits/misses here.
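To make the two layouts concrete, here is a minimal sketch that checks both einsum calls return the same values and prints the strides of each array. The names `row_dots`, `col_dots`, `mul_sum` and the size `n = 5` are illustrative choices, not part of the benchmark. In the n-by-3 case each result sums over 3 adjacent elements, while in the 3-by-n case the contiguous axis has length n. That difference may be related to the timings, but the sketch does not establish the cause.

```python
import numpy

n = 5  # small illustrative size, not one of the benchmark sizes
numpy.random.seed(0)
a = numpy.random.rand(n, 3)
b = numpy.random.rand(n, 3)

# Contiguous transposed copies, as in the benchmark setup
aT = numpy.ascontiguousarray(a.T)
bT = numpy.ascontiguousarray(b.T)

# Row-wise dot products on the n-by-3 arrays
row_dots = numpy.einsum("ij,ij->i", a, b)
# Column-wise dot products on the 3-by-n arrays
col_dots = numpy.einsum("ij,ij->j", aT, bT)
# Plain multiply-and-sum for comparison
mul_sum = (a * b).sum(axis=1)

print(numpy.allclose(row_dots, col_dots), numpy.allclose(row_dots, mul_sum))

# Strides in bytes (float64): a steps 24 bytes per row and 8 per column,
# aT steps 8 * n bytes per row and 8 per column.
print(a.shape, a.strides)
print(aT.shape, aT.strides)
```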

Code to reproduce the plot:

import numpy
import perfplot


def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    aT = numpy.ascontiguousarray(a.T)
    bT = numpy.ascontiguousarray(b.T)
    return (a, b), (aT, bT)


perfplot.save(
    "rel.png",
    setup=setup,
    n_range=[2 ** k for k in range(1, 26)],
    kernels=[
        lambda data: numpy.einsum("ij, ij->i", data[0][0], data[0][1]),
        lambda data: numpy.einsum("ij, ij->j", data[1][0], data[1][1]),
    ],
    labels=["einsum", "einsum.T"],
    logx=True,
    xlabel="len(a), len(b)",
    relative_to=1,
    title=f"numpy {numpy.__version__}",
)
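If perfplot is not available, a rough cross-check can be done with the standard library's timeit. This is only a sketch: the helper `time_kernels`, its `repeat` and `number` defaults, and the three sizes are assumptions chosen for illustration, and the numbers it prints will vary by machine and NumPy version.

```python
import timeit

import numpy


def time_kernels(n, repeat=5, number=10):
    """Return the best per-call time (seconds) for both einsum variants."""
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    aT = numpy.ascontiguousarray(a.T)
    bT = numpy.ascontiguousarray(b.T)

    # Take the minimum over several repeats to reduce noise
    t_rows = min(
        timeit.repeat(
            lambda: numpy.einsum("ij,ij->i", a, b), repeat=repeat, number=number
        )
    )
    t_cols = min(
        timeit.repeat(
            lambda: numpy.einsum("ij,ij->j", aT, bT), repeat=repeat, number=number
        )
    )
    return t_rows / number, t_cols / number


for n in (2 ** 4, 2 ** 10, 2 ** 16):
    t_rows, t_cols = time_kernels(n)
    print(f"n={n:>6}: rows {t_rows:.2e} s, columns {t_cols:.2e} s")
```

Comparing the two printed times per size gives the same kind of ratio that the plot shows with `relative_to=1`.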

Labels: 33 - Question, 57 - Close?, component: numpy.einsum