SIMD: Add sum intrinsics for float/double.#17681

Merged
mattip merged 3 commits into numpy:master from Qiyu8:sum_intrinsic
Nov 3, 2020

Conversation

Member

@Qiyu8 Qiyu8 commented Oct 30, 2020

The original PR is too large to review, so I will split it into several small PRs. Here are the sum intrinsics that are about to be used in einsum; the intrinsics have been fully discussed and tested.

Member

mattip commented Nov 2, 2020

@seiko2plus looks ok?

Comment on lines 122 to 134

// Horizontal add: Calculates the sum of all vector elements.
NPY_FINLINE float npyv_sum_f32(float32x4_t a)
{
float32x2_t r = vadd_f32(vget_high_f32(a), vget_low_f32(a));
return vget_lane_f32(vpadd_f32(r, r), 0);
}
#ifdef __aarch64__
NPY_FINLINE double npyv_sum_f64(float64x2_t a)
{
return vget_lane_f64(vget_low_f64(a) + vget_high_f64(a), 0);
}
#endif
Member

@seiko2plus seiko2plus Nov 2, 2020

Suggested change
// Horizontal add: Calculates the sum of all vector elements.
NPY_FINLINE float npyv_sum_f32(float32x4_t a)
{
float32x2_t r = vadd_f32(vget_high_f32(a), vget_low_f32(a));
return vget_lane_f32(vpadd_f32(r, r), 0);
}
#ifdef __aarch64__
NPY_FINLINE double npyv_sum_f64(float64x2_t a)
{
return vget_lane_f64(vget_low_f64(a) + vget_high_f64(a), 0);
}
#endif
// Horizontal add: Calculates the sum of all vector elements.
#if NPY_SIMD_F64
#define npyv_sum_f32 vaddvq_f32
#define npyv_sum_f64 vaddvq_f64
#else
NPY_FINLINE float npyv_sum_f32(npyv_f32 a)
{
float32x2_t r = vadd_f32(vget_high_f32(a), vget_low_f32(a));
return vget_lane_f32(vpadd_f32(r, r), 0);
}
#endif

EDIT: brought vpadd_f32 back as it was; it should perform better than extracting two scalars.
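For illustration only (not part of the PR), a scalar Python sketch of what the NEON fallback path computes, mirroring the vadd_f32-on-halves followed by the vpadd_f32 pairwise add:

```python
# Scalar model of npyv_sum_f32 on NEON (illustrative sketch, not NumPy code).
def neon_sum_f32(a):
    # vadd_f32(vget_high_f32(a), vget_low_f32(a)): add the two 2-lane halves
    r = [a[0] + a[2], a[1] + a[3]]
    # vpadd_f32(r, r): pairwise add of adjacent lanes; lane 0 holds the total
    return r[0] + r[1]

print(neon_sum_f32([1.0, 2.0, 3.0, 4.0]))  # 10.0
```

On AArch64 the whole reduction collapses to a single vaddvq_f32 instruction, which is why the suggestion gates the #define on NPY_SIMD_F64.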

Comment on lines 122 to 126
NPY_FINLINE float npyv_sum_f32(npyv_f32 a)
{
return vec_extract(a, 0) + vec_extract(a, 1) +
vec_extract(a, 2) + vec_extract(a, 3);
}
Member

@seiko2plus seiko2plus Nov 2, 2020

Suggested change
NPY_FINLINE float npyv_sum_f32(npyv_f32 a)
{
return vec_extract(a, 0) + vec_extract(a, 1) +
vec_extract(a, 2) + vec_extract(a, 3);
}
NPY_FINLINE float npyv_sum_f32(npyv_f32 a)
{
npyv_f32 sum = vec_add(a, npyv_combineh_f32(a, a));
return vec_extract(sum, 0) + vec_extract(sum, 1);
}

EDIT: my bad, fixed the swapping of the highest half.
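A scalar sketch (illustrative, not part of the PR) of the suggested VSX reduction, assuming npyv_combineh_f32(a, a) yields a vector built from the high half of each operand:

```python
# Scalar model of the suggested npyv_sum_f32 on VSX (illustrative sketch).
def vsx_sum_f32(a):
    # npyv_combineh_f32(a, a): combine the high halves -> [a2, a3, a2, a3]
    high = [a[2], a[3], a[2], a[3]]
    # vec_add(a, high): lanes 0 and 1 now hold a0+a2 and a1+a3
    s = [a[i] + high[i] for i in range(4)]
    # vec_extract(sum, 0) + vec_extract(sum, 1): final scalar total
    return s[0] + s[1]

print(vsx_sum_f32([1.0, 2.0, 3.0, 4.0]))  # 10.0
```

This halves the number of vec_extract calls compared with extracting all four lanes.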

#endif // !NPY_HAVE_FMA3
#endif // _NPY_SIMD_AVX2_ARITHMETIC_H

// Horizontal add: Calculates the sum of all vector elements.
Member

Please move the intrinsics inside the header guard _NPY_SIMD_AVX2_ARITHMETIC_H,
and do the same for the other SIMD extensions.

Member

seiko2plus commented Nov 2, 2020

I prefer adding a unit test for any new intrinsics to keep things under control.
@Qiyu8, could you please add test cases for them?
If yes, then you have to add new methods for the new intrinsics within _simd.dispatch.c.src, for example:

// try to follow the current definitions when ordering the source
// this is how you should define the new Python methods
SIMD_IMPL_INTRIN_1(sum_f32, f32, vf32)
#if NPY_SIMD_F64
SIMD_IMPL_INTRIN_1(sum_f64, f64, vf64)
#endif
// and this is how we attach them
SIMD_INTRIN_DEF(sum_f32)
#if NPY_SIMD_F64
SIMD_INTRIN_DEF(sum_f64)
#endif

Once you are done bringing the new methods into the _simd module, go to test_simd.py and add test cases for them, for example:

def test_reduce(self):
    data = self._data()
    vdata = self.load(data)
    # reduce sum
    data_sum = sum(data)
    vsum = self.sum(vdata)
    assert vsum == data_sum
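One note on the exact equality in that assertion: it holds as long as the fixture yields small integer-valued floats, for which both the pairwise SIMD reduction and Python's sequential sum are exact. A quick sketch with hypothetical data (the real test uses self._data(), whose contents may differ):

```python
# Pairwise vs. sequential float summation agree exactly for small
# integer-valued data (hypothetical values; the real test uses self._data()).
data = [float(i) for i in range(4)]          # [0.0, 1.0, 2.0, 3.0]
pairwise = (data[0] + data[2]) + (data[1] + data[3])  # SIMD-style tree sum
sequential = sum(data)                        # Python's left-to-right sum
assert pairwise == sequential                 # exact: no rounding occurs
```

For large or non-representable values, the two association orders can round differently, so such tests generally pick data where the sum is exact.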

You can also have some fun and try to use the _simd module as a tool for designing and discovery, for example:

# 1- bring the baseline via dict `targets` or attribute `baseline`
from numpy.core._simd import targets, baseline
npyv = targets["baseline"] # or baseline
# you can also dump `targets` to get the supported SIMD extensions
# by default build option `--simd-test` contains the most common SIMD extensions
if not npyv.simd: # equivalent to C def `NPY_SIMD`
    print((
        "How is that possible? Changed the default build settings?\n"
        "Maybe you are running under armhf; then get targets['NEON']"
    ))
    raise SystemExit

a = npyv.load_f32(range(npyv.nlanes_f32))
print("sum of f32", npyv.sum_f32(a))

if npyv.simd_f64: # equivalent to C def `NPY_SIMD_F64`
    b = npyv.load_f64(range(npyv.nlanes_f64))
    print("sum of f64", npyv.sum_f64(b))

EDIT: improved the examples.

@charris charris added 01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery labels Nov 2, 2020
Member Author

Qiyu8 commented Nov 3, 2020

@seiko2plus Thanks for your detailed recommendations. The _simd.dispatch.c.src provides a perfect functional test interface for universal intrinsics; we should write a doc such as "how to write a SIMD test" because of the additional complexity compared to regular APIs.

Member

@seiko2plus seiko2plus left a comment

LGTM, Thank you.


data_sum = sum(data)
vsum = self.sum(vdata)
assert vsum == data_sum
Member

nice!

@mattip mattip merged commit 671e8a0 into numpy:master Nov 3, 2020
@mattip
Member

mattip commented Nov 3, 2020

Thanks @Qiyu8
